The authors apply low-precision (8-bit) LNS and adaptively assign bits to the integer and fractional parts depending on the data distribution, which enables near-FP16 accuracy/perplexity. They also co-design the LNS arithmetic and the accelerator architecture, which leads to 33% less energy than an FP8 (E4M3) accelerator with an area similar to an INT8 accelerator, while delivering 30% lower perplexity than FP8 (E4M3).
LogFlex: Flexible-bit Log Arithmetic Accelerator for Language Models on Edge
Yujin Kim, Faraz Tahmasebi, Gunjae Koo, Hyoukjun Kwon — Korea University, UC Irvine
IEEE Micro Special Issue (2026)
FP data types provide higher model performance (e.g., lower perplexity in the context of LLMs) than integer data types. Here, lower perplexity means the model is less “surprised” by the data and predicts it better; it reflects the model’s predictive uncertainty and is commonly computed as the exponentiated average cross-entropy over the test set.
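As a sketch of that definition (generic notation, not the paper's): for a held-out sequence x_1, ..., x_N scored by model p,

    \mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right)

so a lower PPL corresponds to a higher average log-likelihood on the test data.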
Challenge: LNS-to-FP number conversion is needed during accumulation of partial sums, since addition has no cheap closed form in the log domain.
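A minimal sketch of where that conversion comes from, using a plain (sign, log2 magnitude) encoding in Python; the function names and the FP accumulator here are illustrative, not LogFlex's datapath:

    import math

    def to_lns(x):
        # Encode a nonzero value as (sign, log2|x|).
        return (1 if x >= 0 else -1, math.log2(abs(x)))

    def lns_mul(a, b):
        # Multiplication in LNS is a sign multiply plus a log add (a cheap adder in hardware).
        return (a[0] * b[0], a[1] + b[1])

    def lns_to_fp(a):
        # Addition is not cheap in the log domain, so each partial product is
        # converted back to a linear (FP-like) value before accumulation.
        return a[0] * 2.0 ** a[1]

    def lns_dot(xs, ys):
        # Multiplies stay in LNS; the running sum lives in the linear domain.
        acc = 0.0
        for x, y in zip(xs, ys):
            acc += lns_to_fp(lns_mul(to_lns(x), to_lns(y)))
        return acc

    print(lns_dot([1.5, -2.0, 0.25], [4.0, 0.5, 8.0]))  # 1.5*4 - 2*0.5 + 0.25*8 ~= 7.0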
Challenge: the optimal bit distribution between the integer and fractional parts in LNS has not been studied thoroughly.
Proposed: the LogFlex accelerator architecture, in which flexible-bit 8-bit LNS arithmetic is co-designed with the accelerator hardware.
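A rough sketch of the flexible integer/fraction split, assuming the 8-bit word is one sign bit plus a fixed-point log2 magnitude; the encoding, rounding, and the adaptive rule in pick_int_bits are illustrative, not the paper's exact design:

    import math

    WORD_BITS = 8  # 1 sign bit + (int_bits + frac_bits) for the fixed-point log2 magnitude

    def quantize_lns(x, int_bits):
        # More integer bits -> wider dynamic range; more fraction bits -> finer precision.
        # (Nonzero x assumed; zero/special-value handling is omitted in this sketch.)
        frac_bits = WORD_BITS - 1 - int_bits
        sign = 1 if x >= 0 else -1
        code = round(math.log2(abs(x)) * (1 << frac_bits))
        # Clamp to the 7-bit two's-complement range of the log field.
        code = max(-(1 << (WORD_BITS - 2)), min((1 << (WORD_BITS - 2)) - 1, code))
        return sign * 2.0 ** (code / (1 << frac_bits))

    def pick_int_bits(values):
        # Illustrative adaptive rule: give the integer part enough bits to cover
        # the largest |log2 x| seen in the data (plus one bit for the log's sign).
        max_log = max(abs(math.log2(abs(v))) for v in values if v != 0.0)
        return min(WORD_BITS - 1, math.ceil(math.log2(max_log + 1)) + 1)

    vals = [0.013, -0.4, 3.7, 120.0]
    k = pick_int_bits(vals)
    print(k, [round(quantize_lns(v, k), 4) for v in vals])

The split only moves the binary point of the log field, so choosing it per tensor trades dynamic range against log-domain precision, which is the trade-off the adaptive bit assignment is meant to navigate.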
Follow-up:
LNS for hardware accelerators: 5, 6
Quantization strategies: 3, 7