Quadric Expands 3rd Gen Chimera GPNPU Product Family, Adds Automotive-Grade Safety Enhanced Versions

Quadric introduced the Chimera QC Series family of general-purpose neural processors (GPNPUs), a semiconductor intellectual property (IP) offering that blends the machine learning (ML) performance characteristics of a neural processing accelerator with the full C++ programmability of a modern digital signal processor (DSP). The third-generation implementation of the Chimera architecture, the QC family includes both single-core and multicore cluster offerings, as well as safety-enhanced versions of both.

Building upon the successful Chimera QB series GPNPUs introduced in late 2022, the evolutionary QC series adds more configurability, letting designers tailor the mix of performance characteristics to the ML inference workload expected for a particular system on chip (SoC) design. The QC series includes three configurable single-core processor options: the Chimera QC Nano processor delivering up to 7 TOPS of ML horsepower, the Chimera QC Perform processor packing up to 28 TOPS of performance, and the Chimera QC Ultra processor delivering up to 108 TOPS.

For systems demanding even higher performance, the new QC-M family of multicore GPNPUs offers pre-integrated clusters of two, four or eight of the QC Nano, QC Perform or QC Ultra building-block cores. The QC-M family thus scales from running small workloads in parallel (Nano cores) all the way up to high-compute applications (eight QC Ultra cores). At the top end, a cluster of eight QC Ultra cores provides Level 4 central ADAS applications with 864 TOPS for crunching multiple large-format camera input streams in parallel. QC-M clusters include inter-core communications circuitry as well as streaming weight-sharing functions for broadcasting common machine learning model weights to two or more cores in a cluster.
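
The cluster figures follow from simple multiplication of the quoted single-core peaks. The short Python sketch below reproduces that arithmetic. It assumes peak throughput scales linearly with core count, which is what the 864 TOPS figure implies (8 x 108); the names `CORE_PEAK_TOPS` and `cluster_peak_tops` are illustrative and not part of any Quadric tool.

```python
# Peak single-core ML throughput quoted in the announcement, in TOPS.
CORE_PEAK_TOPS = {"Nano": 7, "Perform": 28, "Ultra": 108}

# Cluster sizes offered in the QC-M family.
CLUSTER_SIZES = (2, 4, 8)


def cluster_peak_tops(core: str, n_cores: int) -> int:
    """Peak TOPS of a QC-M cluster, assuming linear scaling (an assumption)."""
    if n_cores not in CLUSTER_SIZES:
        raise ValueError(f"QC-M clusters come in sizes {CLUSTER_SIZES}")
    return CORE_PEAK_TOPS[core] * n_cores


if __name__ == "__main__":
    for core in CORE_PEAK_TOPS:
        for n in CLUSTER_SIZES:
            print(f"{n} x QC {core}: {cluster_peak_tops(core, n)} TOPS peak")
```

These are peak numbers only; sustained throughput on a real workload depends on memory bandwidth and how well the model partitions across cores.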

“The remarkable compute density of the QC Series GPNPU cores is a significant breakthrough for the automotive market,” said Quadric co-founder and CEO Veerbhan Kheterpal. “A component supplier in the automotive market building a 3nm chiplet could deliver over 400 TOPs of fully C++ programmable ML + DSP compute for Software Defined Vehicle platforms for a die cost of well under $10,” continued Kheterpal. “Compare that price-performance to existing solutions that repurpose $10,000 datacenter GPGPUs or performance-limited mobile phone chipsets redirected into the automotive market.”

The QC series processors include a range of configuration options that let the SoC developer match the GPNPU capability to the target application. The Chimera architecture blends high-performance multiply-accumulate (MAC) units with fully C++ programmable 32-bit fixed-point ALUs in each Processing Element (PE). The PE array scales from 64 to 1024 PEs to build the Nano, Perform and Ultra cores. Each configured GPNPU core can have 8, 16 or 32 INT8 MACs per PE. Designers targeting systems with large, weight-bound workloads such as Large Language Models (LLMs) will choose the 8-MAC configuration with wide AXI interfaces. Designers building systems for more MAC-intensive workloads such as high-resolution image processing will choose the 32-MACs-per-PE option. A 16-bit floating-point multiply-accumulate unit, running at half the throughput of the INT8 MACs, is a configurable option for each processor.
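
To see how PE count and MAC ratio translate into a headline TOPS number, the sketch below applies the common rule of thumb: peak operations per second = PEs x MACs per PE x 2 x clock. The factor of 2 (one multiply plus one add per MAC) and the clock values passed in are assumptions, not figures Quadric states for this calculation, and the function names are hypothetical.

```python
OPS_PER_MAC = 2  # assumption: one multiply and one accumulate per MAC per cycle


def peak_int8_tops(num_pes: int, macs_per_pe: int, clock_ghz: float) -> float:
    """Rule-of-thumb peak INT8 throughput in TOPS for a PE array."""
    if macs_per_pe not in (8, 16, 32):
        raise ValueError("the announcement lists 8, 16 or 32 INT8 MACs per PE")
    ops_per_cycle = num_pes * macs_per_pe * OPS_PER_MAC
    return ops_per_cycle * clock_ghz * 1e9 / 1e12


def peak_fp16_tops(num_pes: int, macs_per_pe: int, clock_ghz: float) -> float:
    """Optional FP16 MAC unit runs at half the INT8 rate, per the announcement."""
    return peak_int8_tops(num_pes, macs_per_pe, clock_ghz) / 2


if __name__ == "__main__":
    # 64-PE array, 32 MACs per PE, at an assumed 1.7 GHz clock.
    print(f"{peak_int8_tops(64, 32, 1.7):.2f} INT8 TOPS")
    # Same array with the 8-MAC option at an assumed 1.0 GHz clock.
    print(f"{peak_int8_tops(64, 8, 1.0):.2f} INT8 TOPS")
```

Under these assumptions a 64-PE array with 32 MACs per PE at 1.7 GHz comes out just under 7 TOPS, and the 8-MAC option at 1 GHz comes out near 1 TOPS, which is in line with the Nano range quoted in the announcement. Treat this as a sanity check rather than the vendor's own derivation.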

The cycle-accurate Chimera Instruction Set Simulator that accompanies the Quadric Chimera GPNPU enables design teams to fully simulate target workloads to make smart choices about MAC ratios, AXI widths, tightly coupled Level 2 RAM size, and other user-selectable hardware options. Compared to the previous-generation Chimera processor offering, the new configuration options for Chimera QC cores can deliver up to 2.7X higher TOPS/mm² compute density.

Many design teams today are wrestling with how best to implement the most energy-efficient machine learning compute engine to run today’s and tomorrow’s generative AI models. LLMs in particular have massive sets of coefficients (weights) that must be streamed into the chosen compute engine for each token generated, making those models I/O limited in many instances. Quadric’s Chimera QC series adds an option to use 4-bit weights produced by the most advanced training tools, halving the weight data volume compared to standard 8-bit integer weights. Coupled with extra-wide AXI interconnect interfaces of up to 1024 bits per cycle, the new QC Series cores directly address the needs of companies seeking to implement low-power, high-performance LLMs in volume consumer devices.
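
A back-of-the-envelope sketch shows why weight width matters for an I/O-limited LLM. If every weight is streamed once per generated token, the token rate cannot exceed interface bandwidth divided by total weight bytes. The 1 GHz interface clock and the 3-billion-parameter model below are hypothetical values chosen for illustration, and the model ignores caching, activations and protocol overhead.

```python
def axi_bandwidth_bytes_per_s(width_bits: int, clock_ghz: float) -> float:
    """Ideal bandwidth of one interface moving width_bits every cycle."""
    return width_bits / 8 * clock_ghz * 1e9


def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Total storage for the model weights at the given precision."""
    return n_params * bits_per_weight / 8


def max_tokens_per_s(n_params: float, bits_per_weight: int,
                     width_bits: int, clock_ghz: float) -> float:
    """Upper bound on token rate when all weights stream in once per token."""
    bandwidth = axi_bandwidth_bytes_per_s(width_bits, clock_ghz)
    return bandwidth / weight_bytes(n_params, bits_per_weight)


if __name__ == "__main__":
    N_PARAMS = 3e9   # hypothetical model size
    CLOCK_GHZ = 1.0  # hypothetical interface clock
    for bits in (8, 4):
        rate = max_tokens_per_s(N_PARAMS, bits, 1024, CLOCK_GHZ)
        print(f"{bits}-bit weights: at most {rate:.1f} tokens/s")
```

Whatever model size and clock are plugged in, moving from 8-bit to 4-bit weights doubles this bandwidth-bound ceiling, which is the effect the 4-bit option targets.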

The QC processor series and the multicore QC-M processor family are both offered in Safety Enhanced (SE) versions that add hardware enhancements for greater error resiliency. Each SE version core comes with FMEA reports and collaborative DIA report generation, all backed by the Chimera Software Development Kit toolchain, which is undergoing ISO 26262 tool confidence level certification.

The QC series of the Chimera family of GPNPUs offers scalable performance ranging from 1 TOPS to 864 TOPS. The series includes three individual cores and multicore clusters of two, four, or eight cores. The Chimera QC Nano provides machine learning capability ranging from 1 trillion operations per second (TOPS), with 64 giga operations per second (GOPS) of DSP capability, in mature process nodes (16nm or 12nm) up to 7 ML TOPS in advanced 3nm processes. The Chimera QC Perform blends mid-range performance with compact size, spanning from 4 TOPS to 28 TOPS, with 256 to over 400 GOPS of DSP horsepower. The Chimera QC Ultra delivers up to 108 TOPS in the most advanced 3nm nodes. Finally, the Chimera QC-M multicore solution combines two, four, or eight Nano, Perform, or Ultra cores, and can be configured to deliver up to 864 TOPS for the most demanding applications.

Chimera cores can be targeted to any silicon foundry and any process technology. The entire family of QC Series GPNPUs can achieve up to 1.7 GHz operation in 3nm processes using conventional standard cell flows and commonly available single-ported SRAM.
