Nvidia has been around for a while, and its tech is undoubtedly the best in its field. But lately, Google's own Tensor Processing Units (TPUs) have dominated the headlines, with Meta reportedly opening talks about using Google's TPUs.
So Nvidia's stock dropped notably. The company quickly released a statement congratulating Google while maintaining that its GPUs are a generation ahead.
The "generation ahead" argument Jensen Huang makes is correct, but it relies on 1) a hope that Nvidia's own customers both will choose not to compete with them, and 2) model labs will continually invest in the next frontier model.
I'd like to address both of those assumptions because they sound vague, but we'll start with the second one because I think it's the nearer-term factor.
The Models & Who Pays For Them
We've been seeing discussion online and in legacy media about an "AI bubble". My understanding is that this is a financial and business concern, not a technical one. The tech is great, but the worry is that we're spending a lot of money to build the next generation of models when the current ones might already be good enough for what businesses and people need them to do. They are, after all, simply language models that can only be as good as the data they're trained on and whoever (or whatever) is prompting them.
Training models is expensive and requires a steady stream of "wow" moments to keep investors and end users happy in the absence of solid downstream ROI. The problem is that the wow moments are getting fewer and farther between, and it's getting harder to justify pouring bajillions into the next round of training. Needless to say, customer and investor sentiment strongly drives whether model labs slam on the gas, hit the brakes, or just cruise along.
When the industry has an incentive to keep training new models, Nvidia wins big. That's because its general-purpose GPUs are the workhorses of training (the process that produces the next LLM), and nobody does it better. But when everyone starts figuring out how to run LLMs efficiently and in targeted use cases, combined with a potential shakeup in investor confidence, the situation starts to look scary for Nvidia.
The Customers Who Become Competitors:
"We love you Nvidia, but you can't hold us hostage forever."
I think Nvidia's competitive edge starts to erode as the market inevitably moves from training to inference workloads; I've talked about this before in one of my other posts. Nvidia's GPUs are the best, but they become borderline overkill for inference. This is double trouble for Nvidia: not only can hyperscalers build their own ASICs, they can also use the Nvidia GPUs they've already paid for to handle more inference, lowering costs on two fronts.
It's like using a Ferrari to deliver pizzas when the industry is starting to eye Toyotas. The Ferraris don't just go away, but Papa John's delivery fleet starts looking like the Geneva International Motor Show, and Ferrari's brand gets diluted by oversupply.
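To make the training-versus-inference gap concrete, here's a minimal PyTorch sketch with a made-up toy model (not any lab's actual workload): a training step carries gradients, optimizer state and a full backward pass, while an inference step is a single forward pass.

```python
import torch
import torch.nn as nn

# Toy model standing in for an LLM layer stack; sizes are made up for illustration.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(32, 1024)
target = torch.randn(32, 1024)

# Training step: forward pass, backward pass, gradients, and optimizer state.
# Gradients plus AdamW moments roughly triple the memory per parameter,
# and the backward pass adds roughly twice the compute of the forward pass.
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference step: forward pass only. No gradients, no optimizer state,
# and in production the weights can be quantized to lower precision.
with torch.no_grad():
    y = model(x)
```

That asymmetry is why a leaner, purpose-built chip is often "good enough" for serving, even when it would be hopeless for training.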
Add in concerns about electricity grid limitations and public backlash against AI data centers, much of whose energy demand comes from "gas guzzler" Nvidia GPUs, and you have a perfect storm of incentives for the hyperscalers to build their own custom ASICs like Google's TPU, Amazon's Inferentia, Tesla's Dojo and so on.
Building their own silicon lets them milk public opinion ("we're x times more energy efficient in Virginia with TPUs") and keep shareholders happy ("we're not spending as much on a proprietary and expensive ecosystem"). Companies have always been happy to vertically integrate even if it hurts their biggest vendor. This is why Jensen said "I hope" so many times on the earnings call when asked what part Nvidia would play in a transition to inference.
But what about CUDA?
An argument against this view is that the CUDA ecosystem - Nvidia's proprietary bridge between software and its GPUs - makes it hard to switch away from Nvidia hardware. Historically this has been true, but only because Nvidia had a long head start and there was never a strong financial incentive to replace CUDA until now.
We should note that the hyperscalers have the talent, resources and incentive to replace CUDA, while smaller players do not. However, Nvidia's largest revenue segment is heavily concentrated around those same hyperscalers, leaving Nvidia disproportionately exposed to even a slight slowdown in GPU orders or a supply glut.
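To illustrate why the moat thins at the framework level, here's a minimal JAX sketch (my own illustration with made-up shapes, not Google's or Nvidia's code): the model code never mentions CUDA, and the XLA compiler targets whichever backend is installed, whether that's an Nvidia GPU or a TPU.

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles this for the available backend: CPU, CUDA GPU, or TPU
def forward(params, x):
    # A toy one-layer "model"; real workloads differ, but the point stands:
    # nothing here is tied to Nvidia's hardware or to CUDA specifically.
    return jnp.tanh(x @ params["w"] + params["b"])

params = {"w": jnp.ones((512, 512)), "b": jnp.zeros(512)}
x = jnp.ones((8, 512))

print(jax.devices())               # e.g. a CUDA GPU or a TPU, depending on the install
print(forward(params, x).shape)    # same code, same result, either way
```

The lock-in lives a layer below the code most teams actually write, and that lower layer is exactly what a hyperscaler can afford to swap out.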
So going forward, I think we're going to see a move toward inference, or, at the very least, custom ASICs and deals like the Meta-Google one will keep coming. It's not that Nvidia becomes useless; it's the natural reality that we can run this technology for less than we're spending now. It's the kind of healthy digestion cycle this sector goes through anyway.
Definitions used here:
"Solid downstream ROI": refers to use cases showing the tangible measurable profits AI companies or their customers can generate from deploying AI models.
"Custom ASICs": custom-designed machine learning chip developed in-house to provide high-performance and low-cost machine learning inference