An FP8 format compute-in-memory macro with domino logic for energy-delay efficient matrix multiplications

Technology title

An FP8 format compute-in-memory macro with domino logic for energy-delay efficient matrix multiplications

Technology overview

This invention provides an efficient compute architecture for general AI models, enabling high-throughput vector and matrix operations in two FP8 formats (1-4-3 and 1-5-2) while accumulating results in FP16 for improved accuracy. It addresses the growing challenge of supporting low-precision arithmetic without compromising numerical quality, a key limitation of current AI accelerators as models grow larger and more power-intensive. The technology is valuable to companies developing AI chips, cloud and data-centre operators seeking higher compute efficiency, and edge-AI device manufacturers requiring low-power, accurate inference. By offering native dual-format FP8 computation with reliable FP16 accumulation, this IP meets an urgent market need for faster, more energy-efficient, and scalable hardware tailored to next-generation AI workloads.
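
For concreteness, the two FP8 formats are the 1-4-3 layout (1 sign, 4 exponent, 3 mantissa bits) and the 1-5-2 layout (1 sign, 5 exponent, 2 mantissa bits). The Python sketch below is a purely functional model of decoding such values and accumulating a dot product in FP16; the field splits, bias convention, and omission of the NaN/Inf special encodings are simplifying assumptions, and it does not describe the macro's circuitry.

```python
import numpy as np

def fp8_decode(byte, exp_bits, man_bits):
    """Decode an 8-bit FP8 value with the given field widths.

    Illustrative only: subnormals are handled, but the NaN/Inf special
    encodings (which differ between the 1-4-3 and 1-5-2 formats) are ignored.
    """
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp_field = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man_field = byte & ((1 << man_bits) - 1)
    if exp_field == 0:                          # subnormal: no implicit leading 1
        return sign * (man_field / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man_field / (1 << man_bits)) * 2.0 ** (exp_field - bias)

def dot_fp8_fp16(acts, wgts, act_fmt=(4, 3), wgt_fmt=(5, 2)):
    """FP8 x FP8 dot product with the running sum kept in FP16."""
    acc = np.float16(0.0)
    for a, w in zip(acts, wgts):
        prod = fp8_decode(a, *act_fmt) * fp8_decode(w, *wgt_fmt)
        acc = np.float16(acc + np.float16(prod))   # accumulate in FP16, not FP8
    return acc

# Example: 1.5 (1-4-3 format) times 2.0 (1-5-2 format), accumulated twice -> 6.0
print(dot_fp8_fp16([0b0_0111_100, 0b0_0111_100], [0b0_10000_00, 0b0_10000_00]))
```

Keeping the accumulator at FP16 rather than FP8 limits the rounding error that builds up over long dot products, which is why the results are retained in FP16.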

Technology specifications

The technology is tailored to modern AI applications, particularly transformer-based and CNN-based models. These models rely heavily on matrix-vector and matrix-matrix computations, which demand significant power and resources. To address this, we propose a compute-in-memory paradigm that minimises weight reads from SRAM and performs arithmetic at low power and high speed using custom dynamic (domino) logic. By avoiding 2's-complement computation throughout the datapath, we achieve significant savings in circuit area and power consumption. Finally, the results are maintained in FP16 format to ensure high accuracy.
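
Because floating-point operands are stored in sign-magnitude form, a multiply needs no 2's-complement conversion: the product sign is the XOR of the operand signs, the exponents add, and the mantissa magnitudes multiply as unsigned integers. The sketch below illustrates this at the bit level under simplifying assumptions (normal inputs only, no rounding or normalisation into FP16); the function and field names are illustrative and do not reflect the macro's domino-logic implementation.

```python
def fp8_fields(byte, exp_bits, man_bits):
    """Split an FP8 word into (sign, biased exponent, mantissa magnitude)."""
    sign = (byte >> 7) & 1
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    return sign, exp, man

def fp8_multiply_sign_magnitude(a, b, exp_bits=4, man_bits=3):
    """Multiply two FP8 values (same format) without any 2's-complement step.

    Assumes normal inputs; returns (sign, unbiased exponent, raw mantissa
    product) before normalisation and rounding into FP16.
    """
    bias = (1 << (exp_bits - 1)) - 1
    sa, ea, ma = fp8_fields(a, exp_bits, man_bits)
    sb, eb, mb = fp8_fields(b, exp_bits, man_bits)

    sign = sa ^ sb                           # sign handled by a single XOR
    exp = (ea - bias) + (eb - bias)          # exponents simply add
    mag = (ma | (1 << man_bits)) * (mb | (1 << man_bits))
    # ^ unsigned magnitude multiply with the implicit leading 1 restored
    return sign, exp, mag

# Example: (-1.5) * (+2.0) in the 1-4-3 format
s, e, m = fp8_multiply_sign_magnitude(0b1_0111_100, 0b0_1000_000)
value = (-1.0 if s else 1.0) * m * 2.0 ** (e - 2 * 3)   # 2*man_bits fraction bits
print(value)                                             # -3.0
```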

Sector

This technology falls within the field of integrated circuit design.

Market opportunity

The market opportunity for this technology is driven by the rapid adoption of low-precision computing in both training and inference for large AI models. As FP8 becomes a preferred format for reducing power, memory bandwidth, and cost, there is strong demand for hardware that can deliver high efficiency without compromising accuracy.

 

Data centres, cloud AI providers, and semiconductor companies are actively seeking advanced compute architectures that support multi-format FP8 execution with reliable FP16 accumulation. This creates a significant opportunity to supply a scalable and energy-efficient solution that aligns with the industry’s transition toward faster, more cost-effective, and more sustainable AI computation.

Applications

Key applications include edge-AI chip platforms running personal language models on-device.

Customer benefits

Potential competitors include NVIDIA’s FP8 transformer engines, Intel Habana’s mixed-precision compute units, and AMD’s low-precision matrix accelerators. Although these platforms support FP8, they typically handle only one format, rely on extensive software tuning to preserve accuracy, and often suffer lower energy efficiency on large and diverse AI workloads. In contrast, this technology delivers native support for both major FP8 formats with FP16 accumulation. It offers stronger numerical reliability, higher energy efficiency, and broader applicability across vector and matrix computations, making it a more capable solution for next-generation AI systems.

Technology readiness level

TRL 6

Ideal collaboration partner 

An ideal collaboration partner would be a company or research group developing next-generation AI accelerators, cloud inference hardware, or edge-AI SoCs, with a clear need for high-efficiency low-precision compute.

 

Strong candidates include semiconductor firms building custom AI chips, data-centre operators optimising large-scale model deployment, and system integrators seeking flexible FP8 support with high numerical accuracy.

 

Partners with expertise in compiler stacks, model quantisation, and hardware–software co-design would also be well aligned, as they can help fully leverage the technology’s multi-format FP8 capabilities and energy-efficient compute architecture.

Collaboration mode

This technology is suitable for multiple collaboration modes: R&D collaboration to co-develop and refine the architecture with industry or academic partners; licensing for companies seeking to integrate the design into their AI hardware products; IP acquisition for organisations aiming to incorporate the technology into their long-term accelerator roadmap; and test-bedding to validate performance, energy efficiency, and deployment readiness on real AI workloads.