The authors apply low-precision (8-bit) LNS and adaptively assign bits to the integer and fractional parts depending on the data distribution, which enables near-FP16 accuracy/perplexity. They also co-design the LNS arithmetic and the accelerator architecture, which leads to 33% less energy than an FP8 (E4M3) accelerator with an area similar to an INT8 accelerator, while delivering 30% lower perplexity than FP8 (E4M3).
LogFlex: Flexible-bit Log Arithmetic Accelerator for Language Models on Edge
Yujin Kim, Faraz Tahmasebi, Gunjae Koo, Hyoukjun Kwon — Korea University, UC Irvine
IEEE Micro Special Issue (2026)
FP data types provide higher model performance (e.g., lower perplexity in the context of LLMs) than integer data types. Here, lower perplexity means the model is less “surprised” by the data and predicts it better; it reflects the model’s predictive uncertainty and is commonly computed as the exponentiated average cross-entropy over the test set.
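As a sketch of that definition (generic notation, not the paper's): for a held-out sequence x_1, ..., x_N scored by model p,

    \mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right)

so a lower PPL corresponds to a higher average log-likelihood on the test data.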
Challenge: LNS-to-FP number conversion is needed during accumulation of partial sums, since addition has no cheap closed form in the log domain.
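A minimal sketch of where that conversion comes from, using a plain (sign, log2 magnitude) encoding in Python; the function names and the FP accumulator here are illustrative, not LogFlex's datapath:

    import math

    def to_lns(x):
        # Encode a nonzero value as (sign, log2|x|).
        return (1 if x >= 0 else -1, math.log2(abs(x)))

    def lns_mul(a, b):
        # Multiplication in LNS is a sign multiply plus a log add (a cheap adder in hardware).
        return (a[0] * b[0], a[1] + b[1])

    def lns_to_fp(a):
        # Addition is not cheap in the log domain, so each partial product is
        # converted back to a linear (FP-like) value before accumulation.
        return a[0] * 2.0 ** a[1]

    def lns_dot(xs, ys):
        # Multiplies stay in LNS; the running sum lives in the linear domain.
        acc = 0.0
        for x, y in zip(xs, ys):
            acc += lns_to_fp(lns_mul(to_lns(x), to_lns(y)))
        return acc

    print(lns_dot([1.5, -2.0, 0.25], [4.0, 0.5, 8.0]))  # 1.5*4 - 2*0.5 + 0.25*8 ~= 7.0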
Challenge: the optimal bit distribution between the integer and fractional parts in LNS has not been studied thoroughly.
Proposed: the LogFlex accelerator architecture, in which flexible-bit 8-bit LNS arithmetic is co-designed with the accelerator hardware.
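A rough sketch of the flexible integer/fraction split, assuming the 8-bit word is one sign bit plus a fixed-point log2 magnitude; the encoding, rounding, and the adaptive rule in pick_int_bits are illustrative, not the paper's exact design:

    import math

    WORD_BITS = 8  # 1 sign bit + (int_bits + frac_bits) for the fixed-point log2 magnitude

    def quantize_lns(x, int_bits):
        # More integer bits -> wider dynamic range; more fraction bits -> finer precision.
        # (Nonzero x assumed; zero/special-value handling is omitted in this sketch.)
        frac_bits = WORD_BITS - 1 - int_bits
        sign = 1 if x >= 0 else -1
        code = round(math.log2(abs(x)) * (1 << frac_bits))
        # Clamp to the 7-bit two's-complement range of the log field.
        code = max(-(1 << (WORD_BITS - 2)), min((1 << (WORD_BITS - 2)) - 1, code))
        return sign * 2.0 ** (code / (1 << frac_bits))

    def pick_int_bits(values):
        # Illustrative adaptive rule: give the integer part enough bits to cover
        # the largest |log2 x| seen in the data (plus one bit for the log's sign).
        max_log = max(abs(math.log2(abs(v))) for v in values if v != 0.0)
        return min(WORD_BITS - 1, math.ceil(math.log2(max_log + 1)) + 1)

    vals = [0.013, -0.4, 3.7, 120.0]
    k = pick_int_bits(vals)
    print(k, [round(quantize_lns(v, k), 4) for v in vals])

The split only moves the binary point of the log field, so choosing it per tensor trades dynamic range against log-domain precision, which is the trade-off the adaptive bit assignment is meant to navigate.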
Follow-up:
LNS for hardware accelerators: 5, 6
Quantization strategies: 3, 7