
    Nvidia Primed To Control Next Phase of AI Inference After Groq Deal, According to Investor Gavin Baker

By Henry Kanapi · December 28, 2025

    The CIO of Atreides Management believes the AI race is shifting away from training models and toward how fast, cheaply, and reliably those models can run in real products.

    In a new post on X, Gavin Baker says Nvidia’s $20 billion Groq deal is less about acquiring talent and more about locking up the economics of AI inference, the process of running a trained AI model to make predictions on new, unseen data.

    Baker says inference is breaking into two distinct phases: prefill, where a model processes a prompt or context, and decode, where it generates tokens in real time for users.
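The two phases can be illustrated with a toy sketch. This is not how any production serving stack is written; the function names and the list standing in for a KV cache are illustrative assumptions, meant only to show why prefill is a single compute-heavy pass while decode is a sequential, latency-sensitive loop.

```python
# Toy illustration of the two inference phases (not a real serving stack).

def prefill(prompt_tokens):
    """Process the whole prompt in one pass, building cached context.
    Compute-bound: cost grows with prompt length."""
    kv_cache = [f"state({t})" for t in prompt_tokens]  # stand-in for a KV cache
    return kv_cache

def decode(kv_cache, n_new_tokens):
    """Generate tokens one at a time, each step reading the full cache.
    Bandwidth-bound: per-token latency is what the user feels."""
    output = []
    for i in range(n_new_tokens):
        token = f"tok{i}"  # a real model would sample from logits here
        kv_cache.append(f"state({token})")
        output.append(token)
    return output

cache = prefill(["The", "AI", "race"])
print(decode(cache, 3))  # → ['tok0', 'tok1', 'tok2']
```

Because decode repeatedly re-reads the cached state, memory bandwidth rather than raw compute sets its speed, which is the property SRAM-heavy designs like Groq's target.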

Baker explains that Nvidia’s roadmap now covers both ends of that process. He says the upcoming Rubin CPX chips are optimized for prefill, using large amounts of memory to handle massive context windows, while Groq’s SRAM architecture is designed for real-time reasoning. According to Baker, that design sacrifices capacity but delivers ultra-low latency, making it ideal for applications where delays break the user experience, such as voice assistants, live translation, or agentic AI workflows.

    “The Groq-derived ‘Rubin SRAM’ is optimized for ultra-low latency agentic reasoning inference workloads as a result of SRAM’s extremely high memory bandwidth at the cost of lower memory capacity. In the latter case, either CPX or the normal Rubin will likely be used for prefill.”

He adds that SRAM-based systems have long been known to produce far higher tokens-per-second than GPUs or most ASICs. Until recently, however, it was unclear whether customers would pay more per token for that speed.

    “It is now abundantly clear from Cerebras and Groq’s recent results that users are willing to pay for speed.”

    According to Baker, the market’s response dramatically strengthens Nvidia’s position. With multiple Rubin variants and tightly integrated networking, the investor says it has now become increasingly difficult for competing custom chips to justify their existence.

    “Increases my confidence that all ASICs except TPU, AI5 and Trainium will eventually be canceled. Good luck competing with the three Rubin variants and multiple associated networking chips.”


