Energy-aware edge AI accelerator design for applications from CNNs to LLMs
Artificial intelligence algorithms have driven rapid evolution in CPU, GPU, NPU and memory architectures to support increasingly diverse workloads across cloud and edge platforms. While GPU and NPU clusters dominate large-scale training and inference, edge applications require specialised, energy-efficient and high-performance AI chips capable of low-latency, on-device processing. Modern AI models rely on large parameter counts and GEMM operations, resulting in substantial energy consumption that limits edge deployment. This motivates the development of highly energy-efficient edge AI accelerators that extend battery life, reduce power consumption and enable real-time operation.
However, existing accelerator paradigms face fundamental challenges. For example, spiking neural network accelerators suffer from fan-in limitations and quantisation-induced accuracy loss. Binary neural network accelerator designs must balance energy efficiency, performance and data movement. Attention-based and large language model (LLM) accelerators are constrained by memory capacity and bandwidth, as well as by the efficient execution of mixed-precision and non-linear operations. These challenges highlight the need for new AI chip architectures that jointly optimise energy efficiency, performance, area and system-level integration for edge AI.
This dissertation investigates four AI accelerator design directions, each validated through tapeout prototypes. Spiking Neural Network (SNN) accelerators are explored with in-network computing. Binary Neural Network (BNN) accelerators are studied using latch-based XOR logic for local computation. Attention-based accelerators are proposed with reconfigurable domino logic for in-memory multiplication and sign-group aggregation, supporting the dual FP8 formats E4M3 and E5M2, illustrated in the sketch below. Finally, large language model (LLM) accelerators are examined to support BitNet, with optimised execution of mixed-precision integer and BF16 computations as well as non-linear operations including Softmax, RoPE and RMS normalisation.
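For readers unfamiliar with the two FP8 encodings named above, the minimal Python sketch below decodes raw 8-bit patterns under the OCP E4M3 layout (4 exponent bits, bias 7, 3 mantissa bits) and the E5M2 layout (5 exponent bits, bias 15, 2 mantissa bits). It illustrates only the number formats themselves, not the accelerator's domino-logic datapath, and it deliberately omits the formats' NaN/Inf encodings.

    def decode_fp8(byte: int, exp_bits: int, mant_bits: int, bias: int) -> float:
        """Decode one 8-bit float from its bit fields (NaN/Inf cases omitted)."""
        sign = -1.0 if (byte >> 7) & 1 else 1.0
        exp = (byte >> mant_bits) & ((1 << exp_bits) - 1)
        mant = byte & ((1 << mant_bits) - 1)
        if exp == 0:  # subnormal: no implicit leading 1, fixed minimum exponent
            return sign * mant * 2.0 ** (1 - bias - mant_bits)
        return sign * (1.0 + mant / (1 << mant_bits)) * 2.0 ** (exp - bias)

    # E4M3 trades dynamic range for precision; E5M2 trades precision for range.
    print(decode_fp8(0b0_1111_110, exp_bits=4, mant_bits=3, bias=7))   # 448.0   (largest normal E4M3)
    print(decode_fp8(0b0_11110_11, exp_bits=5, mant_bits=2, bias=15))  # 57344.0 (largest normal E5M2)

The two layouts differ only in where the exponent/mantissa boundary falls within the byte, which is why a single multiplier datapath can plausibly be reconfigured to serve both.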
Speaker’s profile
Dongrui Li is a PhD candidate in the ISTD pillar at SUTD under the A*STAR Graduate Scholarship (AGS). His research focuses on AI chip design, including transformer-based LLM processors, neuromorphic processors, CNN/BNN accelerators, and NPU/compute-in-memory architectures, with a strong emphasis on extreme software–hardware co-design. He has completed four tapeouts in 28 nm and 40 nm technologies. He received his BEng in IC design from Nanyang Technological University under the NTU Science and Engineering Scholarship.