<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LPX on Korea Invest Insights</title><link>https://koreainvestinsights.com/tags/lpx/</link><description>Recent content in LPX on Korea Invest Insights</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 28 May 2026 12:36:02 +0900</lastBuildDate><atom:link href="https://koreainvestinsights.com/tags/lpx/feed.xml" rel="self" type="application/rss+xml"/><item><title>NVIDIA's Post-GTC 2026 Inference Stack: Why LPX and CMX Moved Ahead of CPX</title><link>https://koreainvestinsights.com/post/nvidia-vera-rubin-lpx-cmx-inference-stack-samsung-hbm-2026-05-28/</link><pubDate>Thu, 28 May 2026 15:30:00 +0900</pubDate><guid>https://koreainvestinsights.com/post/nvidia-vera-rubin-lpx-cmx-inference-stack-samsung-hbm-2026-05-28/</guid><description>
 &lt;blockquote&gt;
 &lt;p&gt;📚 NVIDIA·Vera Rubin follow-up series
&lt;a class="link" href="https://koreainvestinsights.com/post/nvidia-q1-fy27-korea-ai-infra-supply-chain-2026-05-21/" &gt;NVIDIA Q1 FY27 and Korea AI Infrastructure&lt;/a&gt; / &lt;a class="link" href="https://koreainvestinsights.com/post/vera-rubin-vr200-bom-memory-pcb-mlcc-korea-alpha-2026-05-21/" &gt;Vera Rubin VR200 BOM Cost Check&lt;/a&gt; / &lt;a class="link" href="https://koreainvestinsights.com/post/ai-ran-nvidia-earnings-skt-vs-supply-chain-2026-05-17/" &gt;AI-RAN and the Korea Supply Chain&lt;/a&gt; / &lt;a class="link" href="https://koreainvestinsights.com/post/marvell-q1-fy2027-korea-semiconductor-readthrough-2026-05-28/" &gt;Marvell Q1 FY2027 and Korean Semiconductors&lt;/a&gt;&lt;/p&gt;

 &lt;/blockquote&gt;

 &lt;blockquote&gt;
 &lt;p&gt;📚 Samsung Electronics · Korea semiconductor linked reads
&lt;a class="link" href="https://koreainvestinsights.com/post/samsung-electronics-tsmc-rerating-thesis-2026-05-16/" &gt;Samsung Electronics PER 15x Re-rating Thesis&lt;/a&gt; / &lt;a class="link" href="https://koreainvestinsights.com/post/samsung-foundry-customer-list-tesla-tenstorrent-2026-05-03/" &gt;Samsung Foundry Customer List&lt;/a&gt; / &lt;a class="link" href="https://koreainvestinsights.com/page/korea-semiconductor-hbm-kospi-hub/" &gt;AI HBM Hub&lt;/a&gt; / &lt;a class="link" href="https://koreainvestinsights.com/page/korea-semiconductor-equipment-ip-hub/" &gt;Semiconductor Value Chain Hub&lt;/a&gt;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Post-GTC 2026 Inference Stack: Why LPX and CMX moved ahead of CPX" class="gallery-image" data-flex-basis="320px" data-flex-grow="133" height="1086" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://koreainvestinsights.com/post/nvidia-vera-rubin-lpx-cmx-inference-stack-samsung-hbm-2026-05-28/post-gtc-cpx-lpx.png" srcset="https://koreainvestinsights.com/post/nvidia-vera-rubin-lpx-cmx-inference-stack-samsung-hbm-2026-05-28/post-gtc-cpx-lpx_hu_274c3df633f60644.png 800w, https://koreainvestinsights.com/post/nvidia-vera-rubin-lpx-cmx-inference-stack-samsung-hbm-2026-05-28/post-gtc-cpx-lpx.png 1448w" width="1448"&gt;&lt;/p&gt;
&lt;h2 id="beginner-tldr"&gt;Beginner TL;DR
&lt;/h2&gt;&lt;p&gt;CPX is the accelerator that &lt;strong&gt;reads a long prompt for the first time&lt;/strong&gt;. LPX is the accelerator that &lt;strong&gt;generates the answer quickly, one token at a time&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Post-GTC 2026, NVIDIA&amp;rsquo;s message has shifted: the bigger bottleneck is not the one-time prefill read, but the continuously repeated decode and KV cache management. That is why the strategic focus moved from CPX as a standalone chip to an inference system built around &lt;strong&gt;Rubin GPU + LPX + CMX + Dynamo&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt; LPX and CMX moved to the front because the monetizable bottleneck in AI inference has shifted from &amp;ldquo;reading long context once&amp;rdquo; to &lt;strong&gt;&amp;ldquo;generating every token quickly and reusing the cached context cheaply.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From a Korean semiconductor perspective, this should not be read as &amp;ldquo;weakening HBM demand.&amp;rdquo; It should be interpreted as an inference infrastructure value chain that spans &lt;strong&gt;HBM + high-speed networking + storage tiers + substrate/packaging + power/cooling&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="0-the-simplest-analogy-ai-inference-is-read-then-speak"&gt;0. The Simplest Analogy: AI Inference Is &amp;ldquo;Read, Then Speak&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;When you send a question to an AI, two stages happen internally.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Stage&lt;/th&gt;
 &lt;th&gt;Plain-Language Explanation&lt;/th&gt;
 &lt;th&gt;Technical Term&lt;/th&gt;
 &lt;th&gt;Bottleneck&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Stage 1&lt;/td&gt;
 &lt;td&gt;The AI reads the question and supporting material first&lt;/td&gt;
 &lt;td&gt;Prefill / Context phase&lt;/td&gt;
 &lt;td&gt;Ability to read long documents quickly&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Stage 2&lt;/td&gt;
 &lt;td&gt;The AI speaks the answer one word at a time&lt;/td&gt;
 &lt;td&gt;Decode / Generation phase&lt;/td&gt;
 &lt;td&gt;Ability to generate one token at a time, quickly and stably&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Memory&lt;/td&gt;
 &lt;td&gt;Notes taken while reading&lt;/td&gt;
 &lt;td&gt;KV cache&lt;/td&gt;
 &lt;td&gt;Where those notes are stored and how fast they can be retrieved&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;CPX was primarily aimed at &lt;strong&gt;Stage 1: reading long context for the first time&lt;/strong&gt;. The Rubin CPX that NVIDIA unveiled in 2025 was a GPU targeting very long context workloads of 1M+ tokens, with 30 PFLOPS of NVFP4 compute, 128 GB of GDDR7, and 3× attention acceleration. (&lt;a class="link" href="https://developer.nvidia.com/blog/nvidia-rubin-cpx-accelerates-inference-performance-and-efficiency-for-1m-token-context-workloads/" title="NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M&amp;#43; Token Context Workloads | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;LPX and CMX, by contrast, target &lt;strong&gt;Stage 2: answer generation&lt;/strong&gt; and &lt;strong&gt;Memory: KV cache management&lt;/strong&gt;, respectively. Per NVIDIA&amp;rsquo;s LPX documentation, the Rubin GPU handles prefill and decode attention, while LPX accelerates latency-sensitive operations such as FFN/MoE inside the decode loop. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;) CMX creates a dedicated KV cache tier between GPU HBM and conventional storage. (&lt;a class="link" href="https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/" title="Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;h2 id="01-glossary"&gt;0.1 Glossary
&lt;/h2&gt;&lt;h3 id="gtc-2026"&gt;GTC 2026
&lt;/h3&gt;&lt;p&gt;GTC is NVIDIA&amp;rsquo;s most important developer and customer event for announcing its AI infrastructure strategy. The central message of GTC 2026 was an expansion from &amp;ldquo;a GPU company for AI training&amp;rdquo; to &lt;strong&gt;&amp;ldquo;a company selling the full AI factory operating system and hardware stack.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At GTC 2026, NVIDIA unveiled Dynamo 1.0 and described it as the distributed operating system for the AI factory — software that orchestrates GPUs, memory, and storage at the cluster level. (&lt;a class="link" href="https://nvidianews.nvidia.com/news/dynamo-1-0" title="NVIDIA Enters Production With Dynamo, the Broadly Adopted Inference Operating System for AI Factories | NVIDIA Newsroom"
 target="_blank" rel="noopener"
 &gt;NVIDIA Newsroom&lt;/a&gt;)&lt;/p&gt;
&lt;h3 id="inference"&gt;Inference
&lt;/h3&gt;&lt;p&gt;Inference is the process by which an AI actually generates a response. AI infrastructure used to revolve around &amp;ldquo;GPUs for training a model.&amp;rdquo; But as real-world usage of products like ChatGPT, Claude, coding agents, and search agents has exploded, &lt;strong&gt;inference cost and latency now matter more than training.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This shift is significant for investors. Training is largely a capital expense (CAPEX) driven by a small number of large AI labs. Inference is an operating expense (OPEX) that accrues daily with every user request. That means cost-per-token, response latency, and power efficiency flow directly into service margins.&lt;/p&gt;
&lt;h3 id="token"&gt;Token
&lt;/h3&gt;&lt;p&gt;A token is the smallest unit an AI reads or writes. In English it may be a word or word fragment; in Korean it may be a character, a word, or part of a word. When users feed in long reports, codebases, or video transcripts, token counts surge.&lt;/p&gt;
&lt;p&gt;Revenue and cost in AI services ultimately converge on one question: &amp;ldquo;How many tokens can be processed, how cheaply, and how fast?&amp;rdquo;&lt;/p&gt;
&lt;h3 id="prefill--context-phase"&gt;Prefill / Context Phase
&lt;/h3&gt;&lt;p&gt;Prefill is the stage where the AI reads the question and any attached material for the first time. For example, if you submit a 200-page report and ask for a summary, the model first reads the entire document to build its internal state. That process is prefill.&lt;/p&gt;
&lt;p&gt;CPX was originally aimed at this domain. NVIDIA&amp;rsquo;s own CPX description draws the distinction: the context phase is compute-bound, while the generation phase is memory bandwidth-bound. (&lt;a class="link" href="https://developer.nvidia.com/blog/nvidia-rubin-cpx-accelerates-inference-performance-and-efficiency-for-1m-token-context-workloads/" title="NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M&amp;#43; Token Context Workloads | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;CPX = a dedicated reading-assist device that speeds up the initial read of long material.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="decode--generation-phase"&gt;Decode / Generation Phase
&lt;/h3&gt;&lt;p&gt;Decode is the stage where the AI generates its answer one token at a time. The AI does not produce a full response all at once. Even a short reply is constructed internally as a sequence of small pieces.&lt;/p&gt;
&lt;p&gt;The key point is that &lt;strong&gt;decode is repetitive work.&lt;/strong&gt; Long answers, coding agents, reasoning models, and tool-use agents continuously generate thousands to tens of thousands of tokens. Users experience this stage&amp;rsquo;s speed directly.&lt;/p&gt;
&lt;p&gt;NVIDIA&amp;rsquo;s LPX documentation states that in interactive inference, time-to-first-token, tokens/sec per user, and tail latency are the core metrics. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;LPX = an accelerator that helps the AI speak quickly and without interruption, one word at a time.&lt;/p&gt;

 &lt;/blockquote&gt;
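&lt;p&gt;To make the three metrics concrete, here is a minimal Python sketch that derives time-to-first-token, tokens/sec per user, and a tail figure from token arrival timestamps. The function names and the timestamps are hypothetical, and the tail figure is simplified to the 99th-percentile gap between consecutive tokens within one response.&lt;/p&gt;

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct from 1 to 100."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def latency_metrics(request_ms, token_ms):
    """Return (time-to-first-token, tokens/sec after the first token, p99 gap).

    Expects at least two token timestamps, in milliseconds.
    """
    ttft_ms = token_ms[0] - request_ms
    gaps_ms = [later - earlier for earlier, later in zip(token_ms, token_ms[1:])]
    decode_ms = token_ms[-1] - token_ms[0]
    tokens_per_sec = 1000 * len(gaps_ms) / decode_ms
    return ttft_ms, tokens_per_sec, percentile(gaps_ms, 99)

# Made-up timestamps (ms): request sent at 0, four tokens arrive afterwards.
ttft_ms, tokens_per_sec, p99_gap_ms = latency_metrics(0, [200, 220, 240, 300])
print(ttft_ms, tokens_per_sec, p99_gap_ms)
```

&lt;p&gt;In this toy trace the average rate looks healthy, yet one 60 ms gap dominates the tail. That kind of stutter is what a low-latency decode path is meant to remove.&lt;/p&gt;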
&lt;h3 id="kv-cache"&gt;KV Cache
&lt;/h3&gt;&lt;p&gt;The KV (key-value) cache is the intermediate memory where the model stores the attention keys and values for tokens it has already processed, so it does not need to recompute them.&lt;/p&gt;
&lt;p&gt;Imagine a 30-minute conversation with an AI. If the model had to re-read and recompute the entire prior conversation from scratch with every new response, costs would explode. Instead, the model stores what it has already processed as a KV cache.&lt;/p&gt;
&lt;p&gt;NVIDIA&amp;rsquo;s CMX documentation calls KV cache &amp;ldquo;inference context&amp;rdquo; and notes that in agentic systems it is reused like the model&amp;rsquo;s long-term memory. (&lt;a class="link" href="https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/" title="Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;KV cache = the working memo where the AI remembers &amp;ldquo;what we talked about earlier.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
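&lt;p&gt;A rough sizing sketch shows why this memo gets large. It uses the standard estimate for a transformer decoder: two vectors (key and value) per token, per layer, per KV head. The model shape below is hypothetical and chosen only for illustration; it does not describe any specific model.&lt;/p&gt;

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache for one sequence in a standard transformer decoder."""
    # Factor 2: one key vector and one value vector per token, layer, and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Hypothetical model shape (assumption, not taken from the article).
shape = dict(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)

per_token = kv_cache_bytes(tokens=1, **shape)
full_context = kv_cache_bytes(tokens=1_000_000, **shape)

print(per_token)            # bytes of cache added by each token
print(full_context / 1e9)   # GB of cache for one 1M-token context
```

&lt;p&gt;Under these assumed numbers each token adds about 0.33 MB, so a single 1M-token context needs roughly 328 GB of cache. The size grows linearly with context length and with the number of concurrent sessions.&lt;/p&gt;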
&lt;h3 id="hbm"&gt;HBM
&lt;/h3&gt;&lt;p&gt;HBM (High Bandwidth Memory) is stacked DRAM packaged right next to the GPU die, which gives it far more bandwidth than conventional DRAM. It is the most critical memory in AI training and inference: expensive, supply-constrained, and difficult to package.&lt;/p&gt;
&lt;p&gt;There is an important misconception to dispel. &lt;strong&gt;LPX and CMX should not be seen as replacing HBM.&lt;/strong&gt; LPX is an SRAM-based low-latency decode assist device; CMX is a storage tier for KV cache. HBM remains the core memory of the Rubin GPU.&lt;/p&gt;
&lt;p&gt;The precise framing is:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;LPX and CMX do not eliminate HBM demand. They offload certain bottlenecks that were crowding HBM, allowing HBM to focus on higher-value computation.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="sram"&gt;SRAM
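&lt;/h3&gt;&lt;p&gt;SRAM is very fast but small-capacity memory. It sits on the processor die itself and can be accessed far more quickly than the HBM attached to a GPU, but each bit takes far more silicon area than DRAM, so it is hard to provide in large capacities.&lt;/p&gt;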
&lt;p&gt;SRAM is the heart of LPX. Per NVIDIA&amp;rsquo;s official LPX materials, the Groq 3 LPX rack offers 256 chips, 128 GB total SRAM, 40 PB/s on-chip SRAM bandwidth, and 640 TB/s scale-up bandwidth. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;A quick sanity check:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Item&lt;/th&gt;
 &lt;th style="text-align: right"&gt;Formula&lt;/th&gt;
 &lt;th style="text-align: right"&gt;Value&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;LPX chip count&lt;/td&gt;
 &lt;td style="text-align: right"&gt;32 trays × 8 chips&lt;/td&gt;
 &lt;td style="text-align: right"&gt;256 chips&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;SRAM capacity&lt;/td&gt;
 &lt;td style="text-align: right"&gt;32 trays × 4 GB&lt;/td&gt;
 &lt;td style="text-align: right"&gt;128 GB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In other words, LPX is not a large-capacity memory device. It is closer to &lt;strong&gt;a device that uses small, very fast SRAM to reduce decode latency.&lt;/strong&gt;&lt;/p&gt;
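&lt;p&gt;The same sanity check can be written as a few lines of Python. The rack-level inputs are the figures quoted above; the per-chip values are derived here under the assumption that capacity and bandwidth are spread evenly across chips, using decimal units.&lt;/p&gt;

```python
# Rack-level figures quoted in the article for one Groq 3 LPX rack.
TRAYS = 32
CHIPS_PER_TRAY = 8
SRAM_PER_TRAY_GB = 4
SRAM_BANDWIDTH_PB_S = 40

chips = TRAYS * CHIPS_PER_TRAY
sram_total_gb = TRAYS * SRAM_PER_TRAY_GB

# Derived values (assumption: even split across chips,
# with 1 GB = 1000 MB and 1 PB = 1000 TB).
sram_per_chip_mb = sram_total_gb * 1000 / chips
bandwidth_per_chip_tb_s = SRAM_BANDWIDTH_PB_S * 1000 / chips

print(chips, sram_total_gb, sram_per_chip_mb, bandwidth_per_chip_tb_s)
```

&lt;p&gt;On these assumptions each chip holds about 500 MB of SRAM, which underlines the point: the capacity is small, and the value lies in how fast it can be reached.&lt;/p&gt;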
&lt;h3 id="cpx"&gt;CPX
&lt;/h3&gt;&lt;p&gt;CPX is best understood as Context Processing X — a GPU specialized for processing long context. Its original rationale was clear: when an AI first reads a long codebase, long video, long report, or long research document, prefill cost grows. CPX was designed to accelerate that &amp;ldquo;first read.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;However, post-GTC 2026 public messaging gave LPX and CMX more prominence than CPX. Tom&amp;rsquo;s Hardware noted that the Rubin CPX was absent from GTC 2026 keynote slides while Groq 3 LPU/LPX racks appeared, and read this as a signal that NVIDIA is more focused on the LPU side than on CPX. It stopped short of concluding that CPX was fully cancelled, noting that it could remain as an off-roadmap product for certain customers. (&lt;a class="link" href="https://www.tomshardware.com/pc-components/gpus/nvidia-removes-rubin-cpx-accelerators-from-its-roadmap-groq-3-lpus-take-center-stage-as-cpx-is-removed" title="Nvidia removes Rubin CPX accelerators from its roadmap — Groq 3 LPUs take center stage as CPX is removed | Tom&amp;#39;s Hardware"
 target="_blank" rel="noopener"
 &gt;Tom&amp;rsquo;s Hardware&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In summary:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;CPX = an accelerator specialized for reading long material the first time.
Its priority in current messaging appears to have declined, though a full cancellation is difficult to confirm.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="lpx--lpu"&gt;LPX / LPU
&lt;/h3&gt;&lt;p&gt;LPX is a low-latency inference rack that attaches the Groq LPU to NVIDIA&amp;rsquo;s Vera Rubin platform. LPU stands for Language Processing Unit — a processor specialized for language model inference.&lt;/p&gt;
&lt;p&gt;LPX has three core strengths:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Strength&lt;/th&gt;
 &lt;th&gt;Plain-Language Explanation&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Low latency&lt;/td&gt;
 &lt;td&gt;Responses arrive quickly without stutter&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Predictable execution&lt;/td&gt;
 &lt;td&gt;Variance in per-user response latency is reduced&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;SRAM-based high-speed processing&lt;/td&gt;
 &lt;td&gt;Small but extremely fast memory handles decode operations&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;NVIDIA describes LPX as designed to work in tandem with the Rubin GPU: the Rubin GPU handles prefill and decode attention, while LPX takes on latency-sensitive FFN/MoE operations during decode. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Rubin GPU = the heavy-duty general-purpose engine.
LPX = a high-speed auxiliary engine that cuts the latency of each generated token.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="cmx"&gt;CMX
&lt;/h3&gt;&lt;p&gt;CMX is a dedicated memory/storage tier for KV cache.&lt;/p&gt;
&lt;p&gt;In the agentic AI era, conversations, code, tool calls, search results, and task histories grow long — and KV cache grows with them. Keeping all KV cache in GPU HBM is too expensive. Offloading it to conventional SSDs or object storage is too slow. CMX sits in between.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Location&lt;/th&gt;
 &lt;th&gt;Plain-Language Analogy&lt;/th&gt;
 &lt;th&gt;Characteristics&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;GPU HBM&lt;/td&gt;
 &lt;td&gt;Notes on your desk&lt;/td&gt;
 &lt;td&gt;Fastest but expensive&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CMX&lt;/td&gt;
 &lt;td&gt;The cabinet right beside you&lt;/td&gt;
 &lt;td&gt;Quite fast and much larger&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Conventional storage&lt;/td&gt;
 &lt;td&gt;The warehouse&lt;/td&gt;
 &lt;td&gt;Large but slow&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;NVIDIA describes CMX as a G3.5 tier between GPU HBM and conventional storage, enabling KV cache to be shared and reused across pods to reduce the bottleneck for long-context and agentic inference. (&lt;a class="link" href="https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/" title="Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;CMX = a dedicated cache warehouse that stores the AI&amp;rsquo;s conversation memory cheaply and quickly.&lt;/p&gt;

 &lt;/blockquote&gt;
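&lt;p&gt;The desk, cabinet, and warehouse idea can be sketched as a toy tiered lookup in Python. This is only an illustration of tiering in general, not a description of how CMX is implemented: the class name, the slot count, and the oldest-first spill policy are all assumptions.&lt;/p&gt;

```python
class TieredKVStore:
    """Toy lookup across a small fast tier and a larger middle tier."""

    def __init__(self, hbm_slots):
        self.hbm_slots = hbm_slots
        self.hbm = {}  # small and fast; dicts keep insertion order
        self.cmx = {}  # larger middle tier (stands in for a CMX-like layer)

    def put(self, key, value):
        """Insert into the fast tier; spill the oldest entry when it overflows."""
        self.hbm.pop(key, None)
        self.hbm[key] = value
        if len(self.hbm) == self.hbm_slots + 1:
            oldest = next(iter(self.hbm))
            self.cmx[oldest] = self.hbm.pop(oldest)

    def get(self, key):
        """Return (value, tier). A middle-tier hit is promoted to the fast tier."""
        if key in self.hbm:
            return self.hbm[key], "hbm"
        if key in self.cmx:
            value = self.cmx.pop(key)
            self.put(key, value)
            return value, "cmx"
        # Not cached anywhere: the caller would have to recompute the context.
        return None, "miss"

store = TieredKVStore(hbm_slots=2)
for session in ["a", "b", "c"]:
    store.put(session, "kv-" + session)

print(store.get("c"))  # still in the fast tier
print(store.get("a"))  # spilled earlier, served from the middle tier
print(store.get("z"))  # never cached
```

&lt;p&gt;The point of the middle tier is the second case: a session that no longer fits on the desk is fetched from the cabinet instead of being recomputed from scratch.&lt;/p&gt;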
&lt;h3 id="dynamo"&gt;Dynamo
&lt;/h3&gt;&lt;p&gt;Dynamo is the traffic control system for NVIDIA&amp;rsquo;s inference infrastructure. No matter how good the GPU, LPX, CMX, KV cache, and storage are, efficiency collapses if the system does not know where to route a request, which cache to reuse, and which GPU already holds the relevant memory.&lt;/p&gt;
&lt;p&gt;Dynamo orchestrates all of that. NVIDIA states that Dynamo 1.0 splits inference work between GPUs and lower-cost storage, and can route requests to the GPU that already holds the relevant KV cache for agentic AI and long prompts. (&lt;a class="link" href="https://nvidianews.nvidia.com/news/dynamo-1-0" title="NVIDIA Enters Production With Dynamo, the Broadly Adopted Inference Operating System for AI Factories | NVIDIA Newsroom"
 target="_blank" rel="noopener"
 &gt;NVIDIA Newsroom&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Dynamo = the AI factory operating system that decides &amp;ldquo;send this request to that GPU — it already has the relevant memory.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
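&lt;p&gt;The routing idea can be illustrated with a short Python sketch: send each request to the worker whose cached tokens share the longest prefix with it. This is a conceptual toy and not the Dynamo API; the function names, worker names, and token IDs are hypothetical.&lt;/p&gt;

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, workers):
    """Pick the worker whose cached sequence shares the longest prefix.

    workers maps a worker name to the token sequence it has cached.
    Ties go to the first worker in insertion order.
    """
    return max(workers, key=lambda name: shared_prefix_len(request_tokens, workers[name]))

# Hypothetical cluster state; token IDs are made up for illustration.
workers = {
    "gpu-0": [1, 2, 3, 4],
    "gpu-1": [1, 2, 9],
    "gpu-2": [],
}
request = [1, 2, 3, 4, 5, 6]

chosen = route(request, workers)
reused = shared_prefix_len(request, workers[chosen])
print(chosen, reused, len(request) - reused)
```

&lt;p&gt;Here the request lands on the worker that already holds four of its six tokens, so only two tokens need fresh prefill. A production router would also have to weigh load and cache location, which this sketch ignores.&lt;/p&gt;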
&lt;h2 id="02-why-lpx-and-cmx-rather-than-cpx"&gt;0.2 Why LPX and CMX Rather Than CPX
&lt;/h2&gt;&lt;h3 id="cpx-is-good-at-reading-once-lpx-is-good-at-talking-continuously"&gt;CPX Is Good at &amp;ldquo;Reading Once&amp;rdquo;; LPX Is Good at &amp;ldquo;Talking Continuously&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;CPX targets prefill. It is useful when reading a long document for the first time.&lt;/p&gt;
&lt;p&gt;But the bottleneck users actually feel in live AI services is often in decode. If a response comes back slowly, if a coding agent takes a long time to move to the next file edit, or if a voice assistant cuts out, users notice immediately.&lt;/p&gt;
&lt;p&gt;NVIDIA&amp;rsquo;s LPX documentation itself states that workloads are shifting more toward decode due to longer reasoning outputs, prefix caching, and longer context. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The logic for putting LPX ahead of CPX is therefore simple:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;The speed of generating every token repeatedly is more directly tied to service quality and billing than the speed of reading something once.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="in-agentic-ai-kv-cache-becomes-a-core-asset"&gt;In Agentic AI, KV Cache Becomes a Core Asset
&lt;/h3&gt;&lt;p&gt;A conventional chatbot answers one question and stops. An agent is different. A coding agent, for example, cycles through reading code, generating a patch, running tests, reading errors, revising, retesting, and reporting results.&lt;/p&gt;
&lt;p&gt;Throughout that process, it must continuously remember prior context. KV cache is no longer a temporary scratchpad — it becomes &lt;strong&gt;the agent&amp;rsquo;s working memory.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;NVIDIA&amp;rsquo;s CMX documentation notes that as long-context and agentic workflows grow, KV cache capacity requirements grow proportionally, and the ability to reuse and store that cache is essential to both performance and efficiency. (&lt;a class="link" href="https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/" title="Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;That is why CMX matters:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;The bottleneck in the agentic AI era is not just &amp;ldquo;computation&amp;rdquo; — it is &amp;ldquo;where you store the memory and how fast you can retrieve it.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="lpx-directly-improves-user-perceived-performance"&gt;LPX Directly Improves User-Perceived Performance
&lt;/h3&gt;&lt;p&gt;In AI services, average throughput is not the only thing that matters. Users care about whether their answer arrives right now.&lt;/p&gt;
&lt;p&gt;LPX reduces latency and jitter through SRAM, compiler-orchestrated execution, and deterministic execution. NVIDIA notes that the LPU&amp;rsquo;s deterministic execution helps maintain stable time-to-first-token and per-token latency. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In plain language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;A GPU-only setup is like a large restaurant with high total cooking capacity but variable per-table wait times. LPX is a dedicated express lane for VIP orders.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="cmx-keeps-expensive-gpus-from-sitting-idle"&gt;CMX Keeps Expensive GPUs from Sitting Idle
&lt;/h3&gt;&lt;p&gt;GPUs are the most expensive component. If a GPU is waiting for data, money is leaking.&lt;/p&gt;
&lt;p&gt;CMX keeps KV cache near the GPU and prefetches it, reducing GPU idle time and redundant recomputation. NVIDIA states that CMX targets up to 5× improvement in tokens-per-second and 5× power efficiency compared to traditional storage. (&lt;a class="link" href="https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/" title="Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Translated into investor language:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;CMX is infrastructure that raises GPU CAPEX utilization and lowers cost-per-token.&lt;/p&gt;

 &lt;/blockquote&gt;
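&lt;p&gt;A back-of-the-envelope Python sketch shows how throughput feeds into cost-per-token. The hourly cost and the baseline throughput are hypothetical, and the sketch assumes that the hourly cost stays fixed while throughput rises by the 5× that NVIDIA cites as an upper target.&lt;/p&gt;

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Serving cost per one million generated tokens at steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd * 1_000_000 / tokens_per_hour

# Hypothetical inputs: neither number comes from NVIDIA or the article.
HOURLY_COST_USD = 36.0
BASELINE_TOKENS_PER_SECOND = 1000

baseline = cost_per_million_tokens(HOURLY_COST_USD, BASELINE_TOKENS_PER_SECOND)
improved = cost_per_million_tokens(HOURLY_COST_USD, 5 * BASELINE_TOKENS_PER_SECOND)

print(baseline, improved, baseline / improved)
```

&lt;p&gt;With cost held constant, cost-per-token falls in direct proportion to the throughput gain. In practice the added infrastructure has its own cost, so the net saving would be smaller than this idealized ratio.&lt;/p&gt;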
&lt;h3 id="for-nvidia-the-full-system-is-a-stronger-lock-in-than-a-single-chip"&gt;For NVIDIA, &amp;ldquo;The Full System&amp;rdquo; Is a Stronger Lock-In Than &amp;ldquo;A Single Chip&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;CPX is essentially an individual accelerator. LPX, CMX, and Dynamo together are how NVIDIA controls the entire AI factory.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Layer&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Rubin GPU&lt;/td&gt;
 &lt;td&gt;Large-scale compute: prefill, attention, and general-purpose training and inference&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LPX / LPU&lt;/td&gt;
 &lt;td&gt;Low-latency decode acceleration&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CMX&lt;/td&gt;
 &lt;td&gt;Dedicated KV cache context memory tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Spectrum-X / NVLink&lt;/td&gt;
 &lt;td&gt;Data movement&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BlueField-4 DPU&lt;/td&gt;
 &lt;td&gt;Storage/network I/O offload&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Dynamo&lt;/td&gt;
 &lt;td&gt;Request routing, KV cache movement, GPU/memory orchestration&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This structure is powerful because customers are not simply buying a GPU. They are &lt;strong&gt;optimizing their entire inference service operation the NVIDIA way.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="03-cpx-lpx-and-cmx-in-one-sentence-each"&gt;0.3 CPX, LPX, and CMX in One Sentence Each
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Term&lt;/th&gt;
 &lt;th&gt;One-Sentence Description&lt;/th&gt;
 &lt;th&gt;Analogy&lt;/th&gt;
 &lt;th&gt;Core Bottleneck&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;CPX&lt;/td&gt;
 &lt;td&gt;Accelerator for reading long context the first time&lt;/td&gt;
 &lt;td&gt;A speed-reading device for thick books&lt;/td&gt;
 &lt;td&gt;Prefill&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LPX&lt;/td&gt;
 &lt;td&gt;Accelerator for generating one token at a time, quickly&lt;/td&gt;
 &lt;td&gt;A speech engine that delivers words fast and consistently&lt;/td&gt;
 &lt;td&gt;Decode latency&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CMX&lt;/td&gt;
 &lt;td&gt;KV cache tier for storing and reusing the AI&amp;rsquo;s prior memory&lt;/td&gt;
 &lt;td&gt;A dedicated cabinet for conversation notes&lt;/td&gt;
 &lt;td&gt;KV cache capacity / movement&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Dynamo&lt;/td&gt;
 &lt;td&gt;Operating system that schedules and routes all inference work&lt;/td&gt;
 &lt;td&gt;Air traffic control tower&lt;/td&gt;
 &lt;td&gt;GPU utilization / routing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Rubin GPU&lt;/td&gt;
 &lt;td&gt;Large-scale general-purpose AI engine&lt;/td&gt;
 &lt;td&gt;Main engine&lt;/td&gt;
 &lt;td&gt;Training, prefill, attention, general inference&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="04-correcting-the-most-important-misconceptions"&gt;0.4 Correcting the Most Important Misconceptions
&lt;/h2&gt;&lt;h3 id="misconception-1-lpx-replaces-hbm"&gt;Misconception 1: &amp;ldquo;LPX Replaces HBM&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;Inaccurate. LPX does not replace HBM. It &lt;strong&gt;complements HBM-based GPUs in the latency-sensitive part of decode, where they are less efficient.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;HBM remains the core memory of the Rubin GPU. LPX handles small, fast tasks using SRAM. CMX partially extends KV cache storage beyond HBM.&lt;/p&gt;
&lt;p&gt;The accurate framing is:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;LPX and CMX do not eliminate HBM demand. They concentrate HBM on higher-value computation and distribute surrounding bottlenecks elsewhere.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="misconception-2-cpx-has-become-irrelevant"&gt;Misconception 2: &amp;ldquo;CPX Has Become Irrelevant&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;This is uncertain. The unresolved question is &lt;strong&gt;whether CPX was fully cancelled or merely deprioritized on the public roadmap.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The verification paths are NVIDIA&amp;rsquo;s latest official roadmap, GTC 2026 keynote slides, the next earnings call, or customer system announcements. Based on current public reporting, CPX was absent from GTC 2026 keynote slides and roadmap presentations while LPX was prominent — but the possibility of CPX remaining as an off-roadmap product for specific customers has not been ruled out. (&lt;a class="link" href="https://www.tomshardware.com/pc-components/gpus/nvidia-removes-rubin-cpx-accelerators-from-its-roadmap-groq-3-lpus-take-center-stage-as-cpx-is-removed" title="Nvidia removes Rubin CPX accelerators from its roadmap — Groq 3 LPUs take center stage as CPX is removed | Tom&amp;#39;s Hardware"
 target="_blank" rel="noopener"
 &gt;Tom&amp;rsquo;s Hardware&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The conservative framing is:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Rather than &amp;ldquo;CPX fully cancelled,&amp;rdquo; the safer statement is &amp;ldquo;the strategic front-line message has moved to LPX and CMX.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="misconception-3-a-better-gpu-is-all-you-need-for-inference"&gt;Misconception 3: &amp;ldquo;A Better GPU Is All You Need for Inference&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;Not anymore. Inference bottlenecks are a combined function of GPU compute, HBM, SRAM, KV cache, storage, networking, and routing software.&lt;/p&gt;
&lt;p&gt;That is precisely why NVIDIA&amp;rsquo;s emphasis post-GTC 2026 is not the standalone GPU but the &lt;strong&gt;AI factory stack.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="05-key-takeaway-for-non-technical-readers"&gt;0.5 Key Takeaway for Non-Technical Readers
&lt;/h2&gt;&lt;p&gt;The old AI infrastructure race was &amp;ldquo;who has the most powerful GPU?&amp;rdquo; Training was the focus: building large models required massive GPUs and HBM.&lt;/p&gt;
&lt;p&gt;Now the battlefield is shifting. As AI moves into live services, billions to trillions of tokens must be processed every day. Users leave if responses are slow. Companies see margins erode if cost-per-token is too high. AI agents must continuously remember long conversations and task histories.&lt;/p&gt;
&lt;p&gt;In this environment, a single fast GPU is no longer sufficient.&lt;/p&gt;
&lt;p&gt;First, the AI must read long material for the first time — a task for CPX or the Rubin GPU. Second, the AI must generate answers quickly, one token at a time — the domain of LPX. Third, the AI must store prior conversation and task state without recomputing it — the domain of CMX. Fourth, someone must decide which GPU receives which request and which KV cache gets reused — the domain of Dynamo.&lt;/p&gt;
&lt;p&gt;The essence of NVIDIA&amp;rsquo;s post-GTC 2026 strategy is therefore:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;A shift from &amp;ldquo;sell more GPUs&amp;rdquo; to &amp;ldquo;design and operate the entire AI inference factory.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;In this light, CPX may be a fine product, but it is too narrow to anchor a strategy. CPX excels at one specific phase: the first pass over a long context (prefill). LPX and CMX, by contrast, target the repetitive, expensive bottlenecks of real AI services: decode latency and KV cache reuse.&lt;/p&gt;
&lt;p&gt;For investors, this shift matters. The beneficiaries of AI inference infrastructure can no longer be explained by HBM alone. HBM remains critical, but the analysis must simultaneously cover high-speed networking, DPUs, SSDs/storage, CXL/memory tiers, substrates, packaging, power, and cooling. The AI inference stack is transitioning from a &lt;strong&gt;&amp;ldquo;GPU-centric single bottleneck&amp;rdquo;&lt;/strong&gt; to a &lt;strong&gt;&amp;ldquo;composite bottleneck across memory, networking, storage, and software.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The conclusion of this piece is simple: the directional thesis is correct, but the language must be precise. Post-GTC 2026, NVIDIA&amp;rsquo;s inference strategy is no longer about selling a bigger Vera Rubin GPU. It has moved to a heterogeneous AI factory combining Vera Rubin GPU/CPU, Groq 3 LPX/LPU, BlueField-4 STX/CMX, Spectrum-X/SPX, and Dynamo. CPX was originally a GPU targeting the long-context prefill (context) phase, but GTC 2026&amp;rsquo;s official front-line message brought the LPX, STX, and CMX combination forward. There is still not enough public language, however, to conclude that NVIDIA has officially cancelled CPX.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="key-summary"&gt;Key Summary
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[Fact]&lt;/strong&gt; At GTC 2026, NVIDIA presented the Vera Rubin platform as a rack-scale AI factory comprising a Vera Rubin NVL72 GPU rack, Vera CPU rack, Groq 3 LPX inference accelerator rack, BlueField-4 STX storage rack, and Spectrum-6 SPX Ethernet rack. (&lt;a class="link" href="https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform" title="NVIDIA Vera Rubin Opens Agentic AI Frontier | NVIDIA Newsroom"
 target="_blank" rel="noopener"
 &gt;NVIDIA Newsroom&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Fact]&lt;/strong&gt; CPX was introduced in 2025 as a GPU targeting 1M+ token long-context processing and context phase acceleration, with 128 GB GDDR7, 30 PFLOPS NVFP4, and attention acceleration as its core messages. (&lt;a class="link" href="https://developer.nvidia.com/blog/nvidia-rubin-cpx-accelerates-inference-performance-and-efficiency-for-1m-token-context-workloads/" title="NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M&amp;#43; Token Context Workloads | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Fact]&lt;/strong&gt; LPX is a different product with a different character. NVIDIA states that LPX handles latency-sensitive FFN/MoE execution and speculative decoding draft generation inside the decode loop, while the Rubin GPU continues to handle prefill, decode attention, and verification. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Inference]&lt;/strong&gt; LPX is therefore not HBM-bearish. Rubin GPU/HBM handles large memory and attention; LPU/SRAM complements the low-latency decode path. CMX/STX creates the KV cache storage tier on top of that.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Korea read-through&lt;/strong&gt; is widest for Samsung Electronics. HBM4 and SOCAMM2, Groq LPU foundry, and PCIe Gen6 eSSD/KV-cache are bundled within a single company. SK hynix has a cleaner HBM beta, but the incremental alpha in this piece is not &amp;ldquo;HBM alone&amp;rdquo;; it is the full memory hierarchy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="1-verdict-on-each-hypothesis"&gt;1. Verdict on Each Hypothesis
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Claim&lt;/th&gt;
 &lt;th style="text-align: right"&gt;Verdict&lt;/th&gt;
 &lt;th&gt;Comment&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Post-GTC 2026, NVIDIA&amp;rsquo;s front-line strategy shifted to a combination of Vera Rubin GPU + Groq 3 LPX/LPU + storage/networking&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Largely correct&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;The official Vera Rubin platform announcement bundled GPU, CPU, LPX, STX, and SPX into one platform. Closer to pod/rack orchestration than &amp;ldquo;GPU-only scaling.&amp;rdquo;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CPX was a product targeting the prefill/context bottleneck&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Correct&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;NVIDIA&amp;rsquo;s CPX documentation explicitly targeted long-context context phase and 1M+ token workloads.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;The monetizable bottleneck is decode rather than prefill&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Conditionally correct&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;For coding agents, voice assistants, and multi-turn agentic workflows, decode latency and tail latency are more closely tied to billing and perceived quality. However, for long-document ingestion, full codebase analysis, and batch summarization, prefill/context remains a high-value bottleneck.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;NVIDIA may have prioritized LPX over CPX&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Strong inference&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;LPX, STX, and SPX moved to the front in GTC 2026 messaging and CPX stepped back from the main stage. However, &amp;ldquo;official CPX cancellation&amp;rdquo; remains [Blocked].&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CPX&amp;rsquo;s role was absorbed into the Vera Rubin + LPX + CMX/STX combination&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Partially correct&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Parts of CPX&amp;rsquo;s context role are distributed across the Rubin GPU and the CMX/STX KV cache tier, while the low-latency decode that CPX could not address directly is now handled by LPX. It is not a 1:1 replacement.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Groq LPX/LPU complements the decode weakness of HBM GPUs rather than replacing HBM&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Correct&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;The Rubin GPU is an HBM-based large-memory/attention engine; the LPU is an SRAM-based low-latency token engine.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Samsung Electronics entered as a Groq LPU manufacturing partner&lt;/td&gt;
 &lt;td style="text-align: right"&gt;&lt;strong&gt;Largely correct&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Samsung Semiconductor stated that Jensen Huang mentioned Samsung&amp;rsquo;s Groq LPU manufacturing role at GTC 2026. Specific LP30 volumes, margins, and yields have not been disclosed. (&lt;a class="link" href="https://semiconductor.samsung.com/news-events/tech-blog/architecting-the-ai-era-samsung-electronics-and-nvidia-define-the-future-at-gtc-2026/" title="Architecting the AI Era: Samsung Electronics and NVIDIA Define the Future at GTC 2026 | Samsung Semiconductor"
 target="_blank" rel="noopener"
 &gt;Samsung Semiconductor&lt;/a&gt;)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="2-vera-rubin-is-not-a-single-gpu--it-is-a-pod-scale-ai-factory"&gt;2. Vera Rubin Is Not a Single GPU — It Is a POD-Scale AI Factory
&lt;/h2&gt;&lt;p&gt;NVIDIA&amp;rsquo;s official message is not &amp;ldquo;the Vera Rubin GPU is fast.&amp;rdquo; The more important development is that NVIDIA decomposed the AI factory into five rack-scale systems and reassembled them.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;System&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;th&gt;Investment Implication&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Vera Rubin NVL72 GPU rack&lt;/td&gt;
 &lt;td&gt;Pretraining, post-training, prefill, decode attention, verification&lt;/td&gt;
 &lt;td&gt;HBM4 and GPU compute remain the mainstream.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Vera CPU rack&lt;/td&gt;
 &lt;td&gt;CPU orchestration for agentic AI workloads, coherent memory, host-side scheduling&lt;/td&gt;
 &lt;td&gt;CPUs are revalued as an orchestration layer within the AI rack.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Groq 3 LPX inference accelerator rack&lt;/td&gt;
 &lt;td&gt;Low-latency decode FFN/MoE, draft generation, deterministic token path&lt;/td&gt;
 &lt;td&gt;An attempt to price the tail latency of premium interactive inference.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BlueField-4 STX / CMX storage rack&lt;/td&gt;
 &lt;td&gt;KV cache storage, context memory tier, cache reuse&lt;/td&gt;
 &lt;td&gt;A structure to move KV cache costs — which had been crowding GPU HBM — down to pod-level storage.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Spectrum-6 SPX / Spectrum-X fabric&lt;/td&gt;
 &lt;td&gt;Deterministic fabric among GPU, LPU, storage, and DPU&lt;/td&gt;
 &lt;td&gt;Rack utilization and data movement become the bottleneck, not just the chip.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
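&lt;p&gt;To make the &amp;ldquo;KV cache crowding GPU HBM&amp;rdquo; row concrete, the following is a rough sizing sketch. It uses the common transformer estimate of one K and one V tensor per layer per token. The model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values) and the function name &lt;code&gt;kv_cache_bytes&lt;/code&gt; are hypothetical choices for illustration, not figures from NVIDIA or from this article.&lt;/p&gt;

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Approximate KV cache size: K and V tensors for every layer and token."""
    # Factor 2 accounts for storing both keys and values.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch


# Hypothetical model shape and context length (assumptions for illustration only).
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=1_000_000, batch=1)
print(f"{size / 1e9:.1f} GB for one 1M-token session")  # prints 327.7 GB ...
```

&lt;p&gt;Under these assumed numbers a single long session already needs hundreds of gigabytes of cache, which illustrates why moving reusable KV cache to a pod-level storage tier can free HBM for computation.&lt;/p&gt;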
&lt;p&gt;NVIDIA&amp;rsquo;s moat in this structure is not GPU FLOPS alone. NVIDIA is attempting to capture the full token economics — prefill cost, decode latency, KV cache reuse, networking jitter, watts/token, rack utilization — in a single integrated offering. This shift should be read not as &amp;ldquo;CPX dropped out&amp;rdquo; but as &amp;ldquo;NVIDIA decomposed inference into finer-grained components and created a monetization point at each layer.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="3-role-decomposition-cpx-lpx-and-cmx"&gt;3. Role Decomposition: CPX, LPX, and CMX
&lt;/h2&gt;&lt;p&gt;Treating CPX and LPX as the same category of &amp;ldquo;inference chip&amp;rdquo; creates confusion. They target different bottlenecks.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Dimension&lt;/th&gt;
 &lt;th&gt;CPX&lt;/th&gt;
 &lt;th&gt;LPX/LPU&lt;/th&gt;
 &lt;th&gt;CMX/STX&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Basic character&lt;/td&gt;
 &lt;td&gt;GDDR7-based context GPU&lt;/td&gt;
 &lt;td&gt;SRAM-based low-latency decode accelerator&lt;/td&gt;
 &lt;td&gt;BlueField-4-based context memory / KV cache storage tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Core bottleneck&lt;/td&gt;
 &lt;td&gt;Long-context prefill, context phase, attention-heavy input processing&lt;/td&gt;
 &lt;td&gt;FFN/MoE and pointwise operations inside the decode loop, speculative decoding draft generation&lt;/td&gt;
 &lt;td&gt;KV cache storage, movement, and reuse as multi-turn and long-context workloads grow&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Key resources&lt;/td&gt;
 &lt;td&gt;128 GB GDDR7, 30 PFLOPS NVFP4&lt;/td&gt;
 &lt;td&gt;256 LPUs, 128 GB SRAM, 40 PB/s SRAM bandwidth (per LPX rack)&lt;/td&gt;
 &lt;td&gt;Flash/storage + DPU + Spectrum-X + DOCA/Dynamo&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Relationship to GPU/HBM&lt;/td&gt;
 &lt;td&gt;Context-dedicated GPU alongside the Rubin GPU&lt;/td&gt;
 &lt;td&gt;SRAM decode tier complementing Rubin GPU/HBM&lt;/td&gt;
 &lt;td&gt;External context memory tier complementing GPU HBM&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Investment interpretation&lt;/td&gt;
 &lt;td&gt;The solution to &amp;ldquo;context is too long&amp;rdquo;&lt;/td&gt;
 &lt;td&gt;The solution to &amp;ldquo;interactive token latency is where the money is&amp;rdquo;&lt;/td&gt;
 &lt;td&gt;The solution to &amp;ldquo;KV cache is eating into HBM&amp;rdquo;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;NVIDIA&amp;rsquo;s LPX technical blog draws the division fairly clearly. The Rubin GPU handles long-context prefill, decode attention, and high-concurrency inference. LPX handles latency-sensitive token generation, FFN/MoE expert execution, and the draft path of speculative decoding. LPX is therefore not a chip that kills HBM; it is an auxiliary engine that takes on the small-batch, low-latency decode path that HBM-based GPUs handle less efficiently. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/p&gt;
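&lt;p&gt;The rack-level LPX figures in the table above can be turned into rough per-chip numbers. This is a back-of-the-envelope sketch that assumes SRAM capacity and bandwidth are split evenly across the 256 LPUs and that decimal units apply (1 PB = 1000 TB); the variable names are illustrative only.&lt;/p&gt;

```python
# Rack-level figures quoted in the table above (per LPX rack).
LPUS_PER_RACK = 256
SRAM_GB_PER_RACK = 128
SRAM_BW_PBPS_PER_RACK = 40

# Assumption: even split across LPUs, decimal units (1 GB = 1000 MB, 1 PB = 1000 TB).
sram_mb_per_lpu = SRAM_GB_PER_RACK * 1000 / LPUS_PER_RACK
bw_tbps_per_lpu = SRAM_BW_PBPS_PER_RACK * 1000 / LPUS_PER_RACK

print(f"SRAM per LPU: {sram_mb_per_lpu:.0f} MB")                  # 500 MB
print(f"SRAM bandwidth per LPU: {bw_tbps_per_lpu:.2f} TB/s")      # 156.25 TB/s
```

&lt;p&gt;Roughly half a gigabyte of SRAM per chip is small next to GPU HBM capacity, which is consistent with the point that LPX complements the HBM tier rather than replacing it.&lt;/p&gt;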
&lt;h2 id="4-strengths-and-limits-of-decode-is-the-monetizable-bottleneck"&gt;4. Strengths and Limits of &amp;ldquo;Decode Is the Monetizable Bottleneck&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;This statement is more than half right. Decode comes closest to being the monetization bottleneck for the following workloads:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agentic coding assistants&lt;/li&gt;
&lt;li&gt;Multi-agent workflows&lt;/li&gt;
&lt;li&gt;Voice interaction&lt;/li&gt;
&lt;li&gt;Real-time translation&lt;/li&gt;
&lt;li&gt;Enterprise copilots with high tool-calling loop frequency&lt;/li&gt;
&lt;li&gt;Premium AI services demanding long reasoning outputs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Users do not perceive aggregate prefill throughput directly. What they feel immediately is time-to-first-token, tokens/sec/user, and tail latency. In agentic workflows, moreover, the delay of a single call compounds across dozens of model calls. This is why LPX moved to the front.&lt;/p&gt;
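&lt;p&gt;The compounding effect can be sketched with simple arithmetic. The helper &lt;code&gt;workflow_latency_s&lt;/code&gt; and every number below are hypothetical; the sketch assumes the model calls run strictly one after another and ignores tool-execution time.&lt;/p&gt;

```python
def workflow_latency_s(n_calls, ttft_s, output_tokens, tokens_per_s):
    """Total wall-clock time for sequential model calls in an agent loop."""
    # Each call pays time-to-first-token plus the time to decode its output.
    per_call = ttft_s + output_tokens / tokens_per_s
    return n_calls * per_call


# Hypothetical numbers, chosen only to illustrate the compounding effect.
slow = workflow_latency_s(n_calls=30, ttft_s=0.5, output_tokens=400, tokens_per_s=50)
fast = workflow_latency_s(n_calls=30, ttft_s=0.5, output_tokens=400, tokens_per_s=400)
print(f"slow decode: {slow:.0f} s, fast decode: {fast:.0f} s")  # 255 s vs 45 s
```

&lt;p&gt;In this assumed setup the same 30-call task takes over four minutes at 50 tokens/sec/user and under one minute at 400, even though time-to-first-token is identical in both cases.&lt;/p&gt;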
&lt;p&gt;However, it is an overstatement to say decode is always more valuable than prefill. In long-context RAG, full codebase analysis, large-document processing, video understanding, and batch summarization, the cost of ingesting the input context and building the KV cache remains significant. CPX existed because that bottleneck was real. The more precise post-GTC 2026 framing is:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;For high-value interactive and agentic inference, decode latency has emerged as the monetization bottleneck, and NVIDIA is addressing it with LPX/LPU. However, in long-context AI, prefill and KV cache movement remain core bottlenecks — addressed by the Rubin GPU and the CMX/STX storage/networking tier.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h2 id="5-investment-read-through"&gt;5. Investment Read-Through
&lt;/h2&gt;&lt;h3 id="nvidia-from-gpu-company-to-token-factory-os"&gt;NVIDIA: From GPU Company to Token Factory OS
&lt;/h3&gt;&lt;p&gt;NVIDIA&amp;rsquo;s long-run logic is less about &amp;ldquo;a faster GPU&amp;rdquo; and more about &amp;ldquo;more inference attachment.&amp;rdquo; When an LPX, BlueField-4, Spectrum-6, CMX, and Dynamo attach to every Vera Rubin rack alongside the GPU, NVIDIA collects a toll at more layers of the AI factory.&lt;/p&gt;
&lt;p&gt;The bull case is clear. What customers need is not a chip benchmark — it is token throughput, low tail latency, watts/token, and utilization. NVIDIA is positioning itself as the company that sells all four as a single rack/POD bundle.&lt;/p&gt;
&lt;p&gt;There is also a counter-argument. Google TPU, hyperscaler custom ASICs, and specialist inference chips such as Cerebras are all trying to reduce the NVIDIA tax. And if LPX and CMX fail to deliver NVIDIA&amp;rsquo;s claimed TCO improvement in production workloads, the attachment narrative weakens. NVDA&amp;rsquo;s next checkpoint is therefore not just GPU revenue but the LPX, CMX, and Spectrum attach rate per Rubin rack.&lt;/p&gt;
&lt;h3 id="samsung-electronics-from-memory-cycle-play-to-inference-memory-hierarchy-supplier"&gt;Samsung Electronics: From Memory Cycle Play to Inference Memory Hierarchy Supplier
&lt;/h3&gt;&lt;p&gt;Samsung Electronics has the widest Korea-listed exposure to this shift, for four reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;HBM4/HBM4E&lt;/strong&gt;: The large-memory tier of the Vera Rubin GPU.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SOCAMM2&lt;/strong&gt;: Vera CPU and AI server memory architecture.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Groq LPU foundry&lt;/strong&gt;: AI logic manufacturing option for the SRAM decode tier inside LPX.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PCIe Gen6 eSSD / KV cache storage&lt;/strong&gt;: The context memory tier that CMX/STX opens.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Samsung Semiconductor disclosed that at GTC 2026 it showcased HBM4E, HBM5 architecture, SOCAMM2, and PM1763 PCIe 6.0 SSD, and that Jensen Huang mentioned Samsung&amp;rsquo;s Groq LPU manufacturing role. (&lt;a class="link" href="https://semiconductor.samsung.com/news-events/tech-blog/architecting-the-ai-era-samsung-electronics-and-nvidia-define-the-future-at-gtc-2026/" title="Architecting the AI Era: Samsung Electronics and NVIDIA Define the Future at GTC 2026 | Samsung Semiconductor"
 target="_blank" rel="noopener"
 &gt;Samsung Semiconductor&lt;/a&gt;) Samsung Electronics&amp;rsquo; 1Q26 earnings materials also referenced HBM4 and SOCAMM2 mass-production sales for the Vera Rubin platform and PCIe Gen6 SSD development. (&lt;a class="link" href="https://news.samsung.com/global/samsung-electronics-announces-first-quarter-2026-results" title="Samsung Electronics Announces First Quarter 2026 Results | Samsung Global Newsroom"
 target="_blank" rel="noopener"
 &gt;Samsung Newsroom&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The Samsung Electronics thesis is therefore too narrow if framed only as &amp;ldquo;HBM laggard catching up.&amp;rdquo; The broader statement is:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Samsung Electronics may be reclassified as an inference memory hierarchy supplier with simultaneous exposure across HBM4, SOCAMM2, LPU foundry, and KV cache SSD within the NVIDIA inference stack.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;This thesis demands evidence, however. LPU yield and margin, HBM4E customer acceptance, SOCAMM2 shipment volume, and actual KV cache attachment for PCIe Gen6 eSSD all need to be confirmed.&lt;/p&gt;
&lt;h3 id="sk하이닉스--micron-hbm-winners-but-incremental-alpha-is-narrow-here"&gt;SK hynix and Micron: HBM Winners, but Incremental Alpha Is Narrow Here
&lt;/h3&gt;&lt;p&gt;LPX is not HBM-bearish. It keeps Rubin GPU/HBM at the center of the premium inference stack while delegating only low-latency decode to SRAM LPUs. The HBM thesis for SK hynix and Micron is therefore not impaired.&lt;/p&gt;
&lt;p&gt;The new alpha generated in this piece, however, is not &amp;ldquo;HBM is good.&amp;rdquo; That is already consensus. The new alpha is the addition of an SRAM decode tier, a KV cache storage tier, and a rack networking tier above and below HBM. SK hynix has the cleaner pure HBM beta, but Samsung Electronics has the wider architectural read-through.&lt;/p&gt;
&lt;h3 id="삼성전기-not-the-primary-subject-but-power-integrity-read-through-holds"&gt;Samsung Electro-Mechanics: Not the Primary Subject, but Power Integrity Read-Through Holds
&lt;/h3&gt;&lt;p&gt;The LPX/CMX architecture is not just a story about adding more GPUs. As the number of chip types inside a rack increases and low-latency paths run in parallel with high-bandwidth memory paths, power integrity, high-speed substrates, SiCap/MLCC, and FC-BGA remain important.&lt;/p&gt;
&lt;p&gt;Samsung Electro-Mechanics is not the protagonist of this piece. But just as the Marvell earnings confirmed the custom XPU, optical, and scale-up networking thesis, NVIDIA&amp;rsquo;s heterogeneous AI factory signals that the &amp;ldquo;small bottlenecks next to the GPU&amp;rdquo; in the Korean components supply chain may continue to command premium pricing.&lt;/p&gt;
&lt;h2 id="6-checklist"&gt;6. Checklist
&lt;/h2&gt;&lt;p&gt;Maintaining this thesis requires verifying the following items in order:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Checkpoint&lt;/th&gt;
 &lt;th&gt;Significance&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Groq 3 LPX H2 2026 availability&lt;/td&gt;
 &lt;td&gt;Confirms whether LPX moves from slide to actual deployment&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Samsung LPU mass production&lt;/td&gt;
 &lt;td&gt;Whether Samsung Foundry secures an AI inference logic reference win&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;HBM4E sample/customer acceptance&lt;/td&gt;
 &lt;td&gt;Samsung Memory penetration rate into the next Vera Rubin platform&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;SOCAMM2 shipment continuity&lt;/td&gt;
 &lt;td&gt;Whether the CPU/agentic AI memory architecture converts to revenue&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;PCIe Gen6 eSSD and CMX/STX adoption&lt;/td&gt;
 &lt;td&gt;Whether the KV cache storage tier translates to real sales&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CPX follow-on roadmap&lt;/td&gt;
 &lt;td&gt;Whether CPX is on hold, a niche product, or set to reappear&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Dynamo/AFD production benchmark&lt;/td&gt;
 &lt;td&gt;Whether heterogeneous decode actually lowers TCO&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="final-judgment"&gt;Final Judgment
&lt;/h2&gt;&lt;p&gt;The thesis is directionally correct. Post-GTC 2026, NVIDIA is decomposing inference from GPU-only scaling into &lt;strong&gt;HBM GPU + SRAM LPU + KV cache storage + high-speed networking + orchestration software.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The safest language is:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Post-GTC 2026, NVIDIA&amp;rsquo;s inference strategy has expanded from homogeneous scaling centered on the Vera Rubin GPU to a heterogeneous AI factory combining Vera Rubin NVL72 + Groq 3 LPX/LPU + BlueField-4 STX/CMX + Spectrum-X/SPX. The existing Rubin CPX was a GDDR7-based context GPU targeting the long-context prefill/context bottleneck, but GTC 2026&amp;rsquo;s official platform messaging placed the LPX and KV cache storage/networking tier at the front. However, whether CPX has been officially cancelled cannot be concluded from public materials alone.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;The more important investment statement is this:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;LPX is not an HBM replacement. The HBM GPU continues to handle large models, large context, attention, and verification. LPX complements the small-batch, low-latency decode path where the GPU is less efficient. This change is therefore not HBM-bearish — it is a signal that the AI inference memory hierarchy has become more complex.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h2 id="evidence-classification-appendix"&gt;Evidence Classification Appendix
&lt;/h2&gt;&lt;h3 id="fact"&gt;[Fact]
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;The NVIDIA Vera Rubin platform comprises Vera Rubin NVL72, Vera CPU, Groq 3 LPX, BlueField-4 STX, and Spectrum-6 SPX racks. (&lt;a class="link" href="https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform" title="NVIDIA Vera Rubin Opens Agentic AI Frontier | NVIDIA Newsroom"
 target="_blank" rel="noopener"
 &gt;NVIDIA Newsroom&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;CPX was announced in 2025 as a GPU for 1M+ token context workloads. (&lt;a class="link" href="https://developer.nvidia.com/blog/nvidia-rubin-cpx-accelerates-inference-performance-and-efficiency-for-1m-token-context-workloads/" title="NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M&amp;#43; Token Context Workloads | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;LPX was presented as a rack-scale inference accelerator with 256 LPUs, 128 GB SRAM, 40 PB/s on-chip SRAM bandwidth, and 640 TB/s scale-up bandwidth. (&lt;a class="link" href="https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/" title="Inside NVIDIA Groq 3 LPX | NVIDIA Technical Blog"
 target="_blank" rel="noopener"
 &gt;NVIDIA Developer&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The Groq-NVIDIA deal is a non-exclusive inference technology licensing agreement; Groq continues to operate independently. (&lt;a class="link" href="https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale" title="Groq and NVIDIA Enter Non-Exclusive Inference Technology Licensing Agreement | Groq"
 target="_blank" rel="noopener"
 &gt;Groq&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Samsung showcased HBM4E, SOCAMM2, and PM1763 PCIe 6.0 SSD at GTC 2026 and was mentioned as a Groq LPU manufacturing partner. (&lt;a class="link" href="https://semiconductor.samsung.com/news-events/tech-blog/architecting-the-ai-era-samsung-electronics-and-nvidia-define-the-future-at-gtc-2026/" title="Architecting the AI Era: Samsung Electronics and NVIDIA Define the Future at GTC 2026 | Samsung Semiconductor"
 target="_blank" rel="noopener"
 &gt;Samsung Semiconductor&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="inference-1"&gt;[Inference]
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;The interpretation that LPX, CMX/STX, and SPX were prioritized over CPX in GTC 2026 front-line messaging is well-supported.&lt;/li&gt;
&lt;li&gt;LPX is more likely to complement the utilization and premium inference economics of Rubin GPU/HBM than to replace HBM demand.&lt;/li&gt;
&lt;li&gt;Samsung Electronics&amp;rsquo; investment thesis lies in the combination of HBM4 + SOCAMM2 + LPU foundry + eSSD/KV cache rather than in HBM4 alone.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="speculation"&gt;[Speculation]
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;The claim that CPX has been fully cancelled or formally absorbed into LPX+CMX has not been confirmed.&lt;/li&gt;
&lt;li&gt;Groq LP30/LPU-specific volumes, ASP, wafer allocation, and gross margins cannot be verified from public materials alone.&lt;/li&gt;
&lt;li&gt;Whether LPX and CMX will reproduce NVIDIA&amp;rsquo;s vendor-claimed perf/W, revenue uplift, and TPS improvements in production workloads remains unverified.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="blocked"&gt;[Blocked]
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Official cancellation status of CPX.&lt;/li&gt;
&lt;li&gt;Groq 3 LPX rack ASP and per-customer order quantities.&lt;/li&gt;
&lt;li&gt;Samsung Foundry LPU yield, wafer price, and margin contribution.&lt;/li&gt;
&lt;li&gt;Per-customer KV cache storage attachment for CMX/STX and actual TCO improvement.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This piece should be used for research and commentary purposes only and does not constitute investment advice. Product roadmaps, yields, customer adoption, pricing, and revenue recognition are subject to change even after public disclosures and company announcements.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Disclaimer: For research and information purposes only. Not investment advice. Names cited are for analytical illustration; readers should perform their own due diligence and consult licensed advisors before any investment decision.&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>