Much of the focus in semiconductors is on chip performance, so it can be a mystery to people outside the industry why “better” chips sometimes lose out to “weaker” ones. As just one example, Intel still sells a ton of server CPUs despite underperforming the latest AMD and Arm offerings.
Much of this comes down to the structure of the data center market, which is more complex than many imagine. Understanding it matters for any company looking to enter this market, whether with CPUs or the latest AI accelerators.
The first problem is that the data center chip market is highly concentrated in the top ten customers – the “Super Seven” of Amazon, Google, Facebook, Microsoft, Baidu, Alibaba and Tencent, to which we would add Oracle, JD.com and Apple. These companies consume over 50% of the industry’s server-grade CPUs and 70-80% of other data center chips.
Beyond these giants, the shift of enterprise IT to the cloud has left behind a highly fragmented set of smaller customers: financial firms, research labs, some oil and gas companies, and some smaller Internet companies.
For large, established semiconductor companies, this math is inescapable: they have to target the largest customers, because anything below the top ten is too small to move the needle. Many startups in this space want to start with smaller customers who can provide enough revenue to sustain operations and VC interest, but eventually they too need to break into the big leagues.
Those big customers are well aware of their market position, and they are writing big checks, so they put their suppliers through extensive qualification. This starts years before the chips are actually produced, as chip designers seek input from customers on chip specifications. How much memory, and what type, will the customer be using? How many I/O channels are needed? Etc. Next comes an emulation of the chip design, usually running on an FPGA board. Once the design is complete, it is sent to the foundry for fabrication.
Then the real work begins.
Hyperscalers have rigorous testing processes, each with its own set of confusing acronyms. It usually starts with a handful of chips in the lab. Next come a few dozen, enough to build a usable server rack. All of this just proves that the chip performs as promised during the design phase.
The next step is to build a full-fledged system of thousands of chips. At this stage, customers typically run their actual production software and monitor performance closely. This step is especially painful for chip designers because they don’t have access to the customer’s software, so they can’t test performance in advance.
Around this time, the customer also builds a complex total cost of ownership (TCO) model. These models look at the overall performance and cost of the system: not just the chip, but the other elements of the server, such as memory, power consumption, and cooling needs.
A difficult reality of this market is that while the main processor is the most important part of any server, it typically accounts for only about 20% of that server’s cost. These TCO models ultimately drive customers’ purchasing decisions.
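To make that 20% point concrete, here is a minimal sketch of the kind of arithmetic inside a TCO model. Every number, the `server_tco` helper, and the cooling-overhead simplification are hypothetical illustrations, not any hyperscaler’s actual model.

```python
# A toy server TCO comparison with entirely hypothetical numbers. The point
# is structural: the CPU is only one line item, so its sticker price moves
# total cost of ownership less than intuition suggests, while power draw and
# performance differences compound over the server's service life.

def server_tco(cpu_cost, other_hw_cost, watts, years=4,
               kwh_price=0.10, cooling_overhead=0.5):
    """Rough per-server TCO: hardware plus energy, with cooling modeled
    as a fractional overhead on IT power (a common simplification)."""
    hours = years * 365 * 24
    energy_cost = watts / 1000 * hours * kwh_price * (1 + cooling_overhead)
    return cpu_cost + other_hw_cost + energy_cost

# Two hypothetical servers: B's CPU costs 30% more, but the system draws less power.
tco_a = server_tco(cpu_cost=2000, other_hw_cost=8000, watts=500)
tco_b = server_tco(cpu_cost=2600, other_hw_cost=8000, watts=420)

# What buyers actually compare: cost per unit of delivered performance,
# measured on their own workloads.
perf_a, perf_b = 1.0, 1.15  # assume B is 15% faster on the customer's software
print(f"TCO A: ${tco_a:,.0f} -> $/perf {tco_a / perf_a:,.0f}")
print(f"TCO B: ${tco_b:,.0f} -> $/perf {tco_b / perf_b:,.0f}")
```

In this toy example, the server with the 30% more expensive CPU still wins on cost per unit of performance, which is why a chip’s price alone rarely decides the deal.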
While all this is going on, chip companies have to scramble. When a chip first comes back from the foundry, it may have errors that require adjustments to the manufacturing process to improve yields, so in the early days there are never enough chips to go around. Every customer wants to try them, forcing chip designers to prioritize and ration supply. This step carries considerable risk when there are only a few customers: rationing leaves each one feeling they don’t have the supplier’s full support.
Even as volumes increase, new problems arise. Customers don’t want to buy chips; they want to buy complete systems. So chip companies need the support of the ODM (original design manufacturer) ecosystem.
These companies have to produce their own designs for the board and the entire rack, and those also need to be evaluated. That is a big part of Intel’s staying power: every ODM is willing to do these designs for Intel because they might win other (PC) business from Intel. Everyone else has to make do with smaller ODMs, or with the “B” design teams of the large ODMs.
From the first pen to paper to the first large-scale purchase order, the whole process can take three to four years. Less painful than the automotive design cycle, but more challenging in many ways.
Earlier this week, we discussed the news that Ampere is selling a development kit version of its latest chip. While Ampere is still small relative to Intel, they’ve been doing this long enough to have some experience with all of the above steps.
These developer kits are a smart way to broaden their market. Ampere is small enough that smaller customers still matter to them, but not large enough to provide full sales support to those customers. The developer kits widen the sales pipeline by letting curious engineers work through the first two steps of the evaluation process on their own.
None of this is easy, and all of these complexities come on top of the challenge of actually designing the chip in the first place.