NVIDIA Says Groq Acquisition Will Play a Role Similar to Mellanox, Extending the Architecture as an “Accelerator” For Low-Latency Decode

NVIDIA's plans for Groq's LPUs are a topic of debate in the industry, and when Jensen Huang was asked about them during the Q4 2026 earnings call, he dropped some rather interesting hints.

NVIDIA's Groq LPUs Will Solidify the Company's Position In Latency-Sensitive Workloads

NVIDIA's acquisition spree has been aggressive this year, but one of the biggest deals the company struck was with Groq: a non-exclusive licensing agreement worth up to $20 billion, Team Green's largest investment to date. The announcement slipped in on Christmas Eve, and NVIDIA never really followed up with concrete plans. Interestingly, when NVIDIA's CEO was asked on the earnings call what his company would do with Groq, he offered a context-setting statement that hints at the role LPUs will play in the future of NVIDIA's AI offerings.

With respect to how we think about Groq and the low latency decoder, I've got some great ideas that I'd like to share with you at GTC. And so what we'll do with Groq is you'll come to see GTC, but what we'll do is we'll extend our architecture with Groq as an accelerator in very much the way that we extended NVIDIA's architecture with Mellanox. - NVIDIA's CEO Jensen Huang

The idea behind the Groq acquisition is simple: NVIDIA wants to target latency-sensitive workloads, and inference has now taken center stage. Applications bound to agentic environments require ultra-fast responses, which is why latency has become a major bottleneck for compute providers. NVIDIA has dominated training with Hopper and Blackwell, but heading into Vera Rubin, inference is an area where the company has yet to solidify its lead, and Groq's LPUs will play a massive role in setting the bar.

NVIDIA's CEO has likened Groq's role to that of the Mellanox acquisition. For those unaware of what Mellanox did for the company, it solved the networking problem: Mellanox provided the groundwork for InfiniBand, which later enabled what NVIDIA calls "extreme co-design," so it is fair to say the acquisition supercharged NVIDIA's datacenter strategy. Groq will play a similar role, and since Jensen says Team Green will "extend their architecture" with Groq, we will likely see some form of rack-scale integration for the LPUs.

Image Credits: NVIDIA

Prefill and decode are the two main stages of inference, and in agentic AI, the latter takes on much greater importance: in a multi-agent workload, fast decode lets agents complete complex reasoning steps in mere seconds, which is essential as the world moves toward swarms of AI agents that depend on one another. With Rubin CPX, NVIDIA has essentially covered the prefill stage through its attention-acceleration engines and massive NVFP4 compute.

For decode, NVIDIA will leverage Groq, as Jensen indicated. LPUs rely on on-die SRAM to deliver tens of terabytes per second of internal bandwidth, an approach already adopted widely by the likes of Cerebras with the WSE-3 and Microsoft with Maia 300. As to where we could actually see LPUs integrated, there are two main theories. The first is that NVIDIA designs hybrid compute nodes within its rack-scale offerings, with multiple LPUs connected over a unified interconnect.

GF Securities believes (via Jukan) NVIDIA could unveil an "LPX rack" at this year's GTC, featuring 256 LPUs in a single rack. Building on that information, we believe NVIDIA will use Groq's native plesiochronous chip-to-chip protocol for LPU-to-LPU links, while LPU-to-GPU connections could rely on NVLink Fusion to handle the massive KV-cache offload from the GPUs during the prefill phase. The other option being explored is integrating LPUs as on-die units within Feynman GPUs via hybrid bonding, but for now, the rack-scale route looks far more likely.

Preliminary rendering of NVIDIA's possible hybrid compute tray with LPUs | Image Credits: Wccftech

The thesis is that Groq's LPUs will play a role similar to Mellanox's in networking, and that this hybrid architecture will give NVIDIA a head start in latency-sensitive workloads. On the earnings call, Jensen also disclosed that compute and revenue are now growing 1:1, driven by the increasingly aggressive evolution of AI's "application layer." We expect NVIDIA to formally unveil its LPU plans at this year's GTC.
