
Chart of the Week: How AI Becomes Universal

How will lower inference costs help make AI universal?

This week’s chart isn’t just an illustration of technical progress.

It reveals the economics of the future of artificial intelligence.

Because for AI to become something everyone uses, it won’t just matter how good the models are or how much money companies are raising to build them.

It will come down to the cost of running them.

Until now, AI has mostly relied on large cloud providers and centralized compute. That makes sense when inference — the act of using an AI model — is expensive. Because every query to a large language model carries a real cost, and that cost shapes everything from how products are designed to how they’re priced.

But today’s chart shows that something very different is on the horizon.

Inference to Zero?

As you can see from this chart, inference costs aren’t just declining…

Source: Epoch AI

They’re collapsing.

According to Epoch AI’s estimates, a single consumer-grade GPU priced around $2,500 can now run models that match the performance of frontier systems from roughly six to 12 months earlier.

To be clear, we’re talking about the kind of hardware anyone can buy and put in a desktop or high-end laptop.
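To put some rough numbers on that, here’s a quick back-of-the-envelope sketch. Every figure in it besides the roughly $2,500 card is an assumption I’ve plugged in purely for illustration — your own throughput, utilization and cloud pricing will vary.

```python
# Back-of-the-envelope math on owning the hardware vs. renting inference.
# Every number below except the ~$2,500 card is an illustrative assumption;
# plug in your own throughput, utilization and pricing.

GPU_PRICE_USD = 2_500            # consumer-grade card from the chart commentary
USEFUL_LIFE_YEARS = 3            # assumed depreciation window
POWER_COST_PER_YEAR_USD = 150    # assumed electricity bill for the card
TOKENS_PER_SECOND = 50           # assumed local throughput for a mid-size model
UTILIZATION = 0.25               # assume the card is generating 25% of the time

CLOUD_PRICE_PER_M_TOKENS_USD = 2.00   # assumed metered API rate, per million tokens

seconds_per_year = 365 * 24 * 3600
tokens_per_year = TOKENS_PER_SECOND * UTILIZATION * seconds_per_year
local_cost_per_year = GPU_PRICE_USD / USEFUL_LIFE_YEARS + POWER_COST_PER_YEAR_USD
local_cost_per_m_tokens = local_cost_per_year / (tokens_per_year / 1e6)

# The crossover depends entirely on how hard the card is working:
# the more you use it, the cheaper each token gets.
print(f"Tokens generated per year: {tokens_per_year:,.0f}")
print(f"Local cost per million tokens: ${local_cost_per_m_tokens:.2f}")
print(f"Cloud cost per million tokens: ${CLOUD_PRICE_PER_M_TOKENS_USD:.2f}")
```

Under these made-up numbers the two come out roughly even at low utilization. Run the card harder and the per-token cost keeps falling — which is exactly the dynamic today’s chart is pointing to.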

If frontier-level AI can run on consumer hardware within a year, and open models follow within a few months of that, then inference stops being scarce for most applications.

And once inference stops being scarce, software changes.

Products won’t need to be designed around token budgets anymore. AI features won’t have to be limited to certain users. And intelligence will become something that software runs locally, not something it has to ask permission for from a distant server.

You could see early signs of this shift at CES this month.

Jensen Huang spent far less time talking about cloud workloads than he did about systems that operate continuously in the physical world, like robots, autonomous machines and even factories. Those systems can’t wait on remote servers or pay for every decision they make. They need intelligence running locally, all the time.

Lenovo showed the same idea applied to personal computing. The company’s focus is distributing intelligence across devices so AI can work continuously without relying on constant cloud access.

Lenovo’s new Qira platform isn’t just another chatbot. It’s designed to act as a cross-device “ambient intelligence” layer, learning user behavior and acting without constant user input.

That kind of always-available AI only works once inference is cheap enough to run continuously on the device itself.

And it doesn’t work at all if inference stays expensive.

Fortunately, today’s chart tells us that inference is getting cheaper faster than most people realize.

Yet many valuations and tech strategies still assume AI will stay in the cloud and that every use will remain metered and expensive.

That assumption favors the companies that own the biggest data centers.

And it might remain true for a small number of massive systems, like large-scale search or enterprise analytics. But for most applications, the ability to run powerful models locally — on your own hardware and just months after release — will radically democratize access to powerful AI.

It means companies can use AI without paying cloud fees and developers can work with private data without sending it to a third party.
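For the developers in the audience, here’s a minimal sketch of what that looks like in practice, using the open-source Hugging Face transformers library. The model name is just a placeholder for any open-weights model small enough to fit on your card.

```python
# A minimal sketch of local inference with an open-weights model.
# Nothing here touches a metered API; prompts and data stay on the machine.

from transformers import pipeline

# The model name is illustrative -- substitute any open-weights model
# that fits in your GPU's memory.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    device_map="auto",   # use the local GPU if one is available
)

# Private data never leaves the machine: the model runs entirely locally.
prompt = "Summarize these internal meeting notes in two sentences: ..."
result = generator(prompt, max_new_tokens=100)
print(result[0]["generated_text"])
```

No per-token bill, no data sent to a third party — just the hardware you already own.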

Like the early internet, this lowers the barrier to entry. It will give smaller teams the chance to compete by building AI directly into their products instead of renting it from someone else.

Today’s chart captures this evolution.

Here’s My Take

Once inference costs fall close to zero, AI will become a built-in part of software the same way memory and storage became standard in computing decades ago.

In the early days of computing, every byte of memory and every second of processing was expensive. As those costs fell, first with personal computers and later with the cloud, entirely new kinds of software became possible.

The same thing is happening with AI today.

As inference gets cheaper, powerful AI will move out of data centers and into everyday products. And developers will no longer need special access or massive budgets to use it. They’ll just create software with intelligence built in.

Of course, this challenges a long-standing assumption about how AI makes money.

When it no longer costs much to run intelligence, it doesn’t make sense to charge people every time they use it. That means the value shifts from selling access to AI toward building better software with it.

Today’s chart shows that we could reach that turning point soon.

And I couldn’t be more excited about it. Because that’s how AI becomes truly universal.

Regards,


Ian King
Chief Strategist, Banyan Hill Publishing

Editor’s Note: We’d love to hear from you!

If you want to share your thoughts or suggestions about the Daily Disruptor, or if there are any specific topics you’d like us to cover, just send an email to dailydisruptor@banyanhill.com.

Don’t worry, we won’t reveal your full name in the event we publish a response. So feel free to comment away!
