JEDEC, the organization responsible for setting industry-standard memory specifications, is nearing completion of SPHBM4. The new memory standard aims to deliver full HBM4-level bandwidth while achieving higher capacity and reducing integration costs through compatibility with traditional organic substrates, all over a "narrower" 512-bit interface. If successfully implemented, the technology will fill many gaps in the market HBM serves; however, as we explain below, it is unlikely to kill off GDDR memory.
While high-bandwidth memory (HBM) with 1024-bit or 2048-bit interfaces is unmatched in performance and energy efficiency, such wide interfaces consume a significant amount of valuable silicon area within high-end processors. This limits the number of HBM stacks that can be integrated on a single chip, thus constraining the memory capacity supported by AI accelerators, which in turn affects the performance of individual accelerators and the overall capability of large-scale clusters built upon them.
HBM in "standard package"
Standard Package High Bandwidth Memory (SPHBM4) was developed to address this issue. It narrows the memory interface from HBM4's 2048 bits to 512 bits while maintaining the same total bandwidth through 4:1 serialization. JEDEC has not explicitly stated whether "4:1 serialization" means quadrupling HBM4's 8 GT/s per-pin data rate (to 32 GT/s) or introducing a new encoding scheme running at a higher clock speed; either way, the goal is clear: retain HBM4-level aggregate bandwidth within a 512-bit interface.
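If the first reading is correct, the arithmetic is straightforward. Below is a minimal back-of-the-envelope sketch under that assumption; the 32 GT/s figure is inferred from the 4:1 ratio, not confirmed by JEDEC.

```python
# Illustrative math for the 4:1 serialization claim, assuming it means a
# 4x higher per-pin data rate (one of the two readings discussed above).

def aggregate_bandwidth_gbps(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Aggregate bandwidth in GB/s: (width / 8 bits per byte) * transfers/s."""
    return bus_width_bits / 8 * data_rate_gtps

hbm4   = aggregate_bandwidth_gbps(2048, 8.0)   # 2048-bit interface at 8 GT/s
sphbm4 = aggregate_bandwidth_gbps(512, 32.0)   # 512-bit interface at a presumed 32 GT/s

print(f"HBM4:   {hbm4:.0f} GB/s per stack")    # 2048 GB/s (~2 TB/s)
print(f"SPHBM4: {sphbm4:.0f} GB/s per stack")  # 2048 GB/s (~2 TB/s)
```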
Inside the package, SPHBM4 will use an industry-standard base die, likely manufactured by a foundry on a logic process, so its cost will not necessarily be lower. Routing the wide-interface DRAM dies onto the narrow-interface base die is a real density challenge, and bridging the slower DRAM-side interconnects to the faster base-die-side links raises clocking challenges as well.
Meanwhile, it will continue to use standard HBM4 DRAM dies, which simplifies controller development on the logic side and keeps per-stack capacity consistent with HBM4 and HBM4E, the latter offering up to 64 GB per stack.
In theory, this means that for the same interface budget on the host chip, SPHBM4 can attach four times the memory capacity of HBM4. In practical designs, however, AI chip developers must balance memory capacity against compute throughput and functional versatility, and the cost of silicon area keeps rising as process nodes advance.
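As a hypothetical illustration of the capacity side of that trade-off: the 8192-bit interface budget below is an invented figure, and SPHBM4 stacks are assumed to match HBM4E's 64 GB per stack.

```python
# How a fixed PHY signal budget translates into attachable capacity,
# assuming SPHBM4 stacks reach HBM4E-class 64 GB each. The 8192-bit
# budget is an arbitrary example, not a real chip.

PHY_BUDGET_BITS = 8192   # total data-interface width a hypothetical SoC can afford
GB_PER_STACK    = 64     # HBM4E-class stack capacity (assumed for SPHBM4 too)

hbm4_stacks   = PHY_BUDGET_BITS // 2048   # 4 stacks at 2048 bits each
sphbm4_stacks = PHY_BUDGET_BITS // 512    # 16 stacks at 512 bits each

print(f"HBM4:   {hbm4_stacks} stacks -> {hbm4_stacks * GB_PER_STACK} GB")      # 256 GB
print(f"SPHBM4: {sphbm4_stacks} stacks -> {sphbm4_stacks * GB_PER_STACK} GB")  # 1024 GB
```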
Will it become the "killer" of GDDR7?
Experienced readers might ask: given all this, why not use SPHBM4 in gaming GPUs? It could deliver higher bandwidth than GDDR7 (or a potential PAM4-encoded GDDR7X) at a relatively modest cost premium.
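For a rough sense of the numbers, the sketch below compares a conventional high-end GDDR7 configuration with a single SPHBM4 stack. The 384-bit bus and 32 GT/s GDDR7 rate are representative figures rather than a specific product, and the SPHBM4 numbers reuse the 4x-serialization assumption from earlier.

```python
# Rough bandwidth comparison: a GDDR7 setup vs. one SPHBM4 stack.
# Figures are illustrative, not taken from any announced product.

def bandwidth_gbps(width_bits: int, rate_gtps: float) -> float:
    return width_bits / 8 * rate_gtps

gddr7  = bandwidth_gbps(384, 32.0)   # e.g. twelve 32-bit GDDR7 devices at 32 GT/s
sphbm4 = bandwidth_gbps(512, 32.0)   # one stack, assuming 4x-serialized HBM4 rates

print(f"384-bit GDDR7 @ 32 GT/s: {gddr7:.0f} GB/s")   # 1536 GB/s
print(f"One SPHBM4 stack:        {sphbm4:.0f} GB/s")  # 2048 GB/s
```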
The problem is that while SPHBM4 offers HBM4-level bandwidth, its design fundamentally prioritizes performance and capacity over power consumption and cost.
Although SPHBM4 is cheaper than HBM4 or HBM4E, it still requires stacked HBM DRAM dies, which are physically larger and therefore more expensive than commodity DRAM ICs, as well as a base (interface) die, through-silicon vias (TSVs), known-good-die testing, and advanced in-package assembly. These elements dominate the cost and scale far less readily than GDDR7, which benefits from a large consumer and gaming market, simpler packaging, and mature PCB assembly processes.
In other words, replacing multiple GDDR7 chips with a single advanced SPHBM4 stack may not reduce costs at all; on the contrary, it may increase them.
The real key lies in the implementation details
While a 512-bit memory bus remains a complex interface, JEDEC states that SPHBM4 supports 2.5D integration on traditional organic substrates without the need for expensive silicon interposers. This will significantly reduce integration costs and is expected to expand design flexibility. Furthermore, with its industry-standard 512-bit interface, SPHBM4 promises lower costs compared to C-HBM4E solutions that rely on UCIe or proprietary interfaces (thanks to economies of scale from standardization).
Compared to silicon-interposer-based solutions, organic-substrate wiring allows longer electrical paths between the SoC and the memory stacks. This could ease layout constraints in oversized packages and allow more memory capacity to sit near the processor than is currently possible. Even so, routing a 3,072-bit memory interface (six 512-bit stacks), plus command, clock, and power lines, through a traditional substrate remains hard to imagine, and how it will be achieved remains to be seen.
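To put that routing challenge in numbers, here is a hypothetical signal count for a six-stack configuration (matching the 3,072-bit figure above); the per-stack command/address/clock overhead is a guess for illustration only.

```python
# Hypothetical signal-count estimate for routing several SPHBM4 stacks
# on an organic substrate. Six stacks and the 96-pin per-stack overhead
# for command/address/clock are illustrative assumptions.

STACKS          = 6
DATA_BITS       = 512   # per SPHBM4 stack
CA_CLK_OVERHEAD = 96    # assumed command/address/clock pins per stack (guess)

data_signals  = STACKS * DATA_BITS                  # 3072 data lines total
total_signals = STACKS * (DATA_BITS + CA_CLK_OVERHEAD)

print(f"Data lines:      {data_signals}")       # 3072
print(f"Data + CA/clock: {total_signals}")      # 3648, before power and ground
```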
Source: Tom's Hardware