What are we scaling?
I’m confused why some people have short timelines and at the same time are bullish on the current scale-up of reinforcement learning atop LLMs. If we’re actually close to a human-like learner, this whole approach of training on verifiable outcomes is doomed.
Currently the labs are trying to bake a bunch of skills into these models through “mid-training” - there’s an entire supply chain of companies building RL environments which teach the model how to navigate a web browser or use Excel to write financial models.
Either these models will soon learn on the job in a self-directed way - making all this pre-baking pointless - or they won’t - which means AGI is not imminent. Humans don’t have to go through a special training phase where they need to rehearse every single piece of software they might ever need to use.
Beren Millidge made interesting points about this in a recent blog post:
When we see frontier models improving at various benchmarks we should think not just of increased scale and clever ML research ideas but billions of dollars spent paying PhDs, MDs, and other experts to write questions and provide example answers and reasoning targeting these precise capabilities.
You can see this tension most vividly in robotics. In some fundamental sense, robotics is an algorithms problem, not a hardware or data problem — with very little training, humans can learn how to teleoperate current hardware to do useful work. So if we had a human-like learner, robotics would (in large part) be solved. But the fact that we don’t have such a learner makes it necessary to go out into a thousand different homes to learn how to pick up dishes or fold laundry.
One counterargument I’ve heard from the takeoff-within-5-years crew is that we have to do this kludgy RL in service of building a superhuman AI researcher, and then the million copies of automated Ilya can go figure out how to solve robust and efficient learning from experience.
This gives the vibes of that old joke, “We’re losing money on every sale, but we’ll make it up in volume.” Somehow this automated researcher is going to figure out the algorithm for AGI - a problem humans have been banging their heads against for the better part of a century - while not having the basic learning capabilities that children have? I find this super implausible.
Besides, even if you think the RLVR scale-up will soon help us automate AI research, the labs’ actions suggest otherwise. You don’t need to pre-bake the consultant’s skills at crafting PowerPoint slides in order to automate Ilya. So the labs’ actions hint at a worldview where these models will continue to fare poorly at generalizing and on-the-job learning, thus making it necessary to build in the skills that they hope will be economically valuable.
Another counterargument you could make is that even if the model could learn these skills on the job, it is just so much more efficient to build them up once during training rather than again and again for each user or company. And look, it makes a lot of sense to just bake in fluency with common tools like browsers and terminals. Indeed, one of the key advantages that AGIs will have is this greater capacity to share knowledge across copies. But people are underrating how much company- and context-specific skill is required to do most jobs. And there just isn’t currently a robust, efficient way for AIs to pick up those skills.
Human labor is valuable precisely because it’s not schleppy to train
I was at a dinner with an AI researcher and a biologist. The biologist said she had long timelines. We asked what she thought AI would struggle with. She said her work has recently involved looking at slides and deciding whether a dot is actually a macrophage or just looks like one. The AI researcher said, “Image classification is a textbook deep learning problem—we could easily train for that.”
I thought this was a very interesting exchange, because it revealed a key crux between me and the people who expect transformative economic impacts in the next few years. Human workers are valuable precisely because we don’t need to build schleppy training loops for every small part of their job. It’s not net-productive to build a custom training pipeline to identify what macrophages look like given the way this particular lab prepares slides, then another for the next lab-specific micro-task, and so on. What you actually need is an AI that can learn from semantic feedback or from self-directed experience, and then generalize, the way a human does.
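To make the schlep concrete, here’s roughly the kind of one-off pipeline that answer implies (a toy sketch: the data, labels, and tiny model below are all made up for illustration, not anyone’s production code):

```python
# A hypothetical sketch of the bespoke "schleppy training loop" the AI researcher
# is proposing: a one-off binary classifier (macrophage vs. lookalike) tuned to
# one lab's slide-preparation style. The synthetic data stands in for cropped
# patches from that lab's slides.

import torch
import torch.nn as nn

# Stand-in for "cropped dots from this lab's slides": 64x64 grayscale patches.
patches = torch.randn(512, 1, 64, 64)
labels = torch.randint(0, 2, (512,))          # 1 = macrophage, 0 = lookalike

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for i in range(0, len(patches), 64):
        x, y = patches[i:i + 64], labels[i:i + 64]
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
# ...and then do all of this again for the next lab-specific micro-task.
```

And none of it transfers: the next lab’s slide prep, the next stain, the next micro-task each get their own version of this loop, with their own labeled data to collect and their own model to babysit.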
Every day, you have to do a hundred things that require judgment, situational awareness, and skills & context learned on the job. These tasks differ not just across different people, but from one day to the next even for the same person. It is not possible to automate even a single job by just baking in some predefined set of skills, let alone all the jobs.
In fact, I think people are really underestimating how big a deal actual AGI will be because they’re just imagining more of this current regime. They’re not thinking about billions of human-like intelligences on a server which can copy and merge all their learnings. And to be clear, I expect this (aka actual AGI) in the next decade or two. That’s fucking crazy!
Economic diffusion lag is cope for missing capabilities
Sometimes people will say that the reason AIs aren’t more widely deployed across firms and already providing lots of value (outside of coding) is that technology takes a long time to diffuse. I think this is cope: a way to gloss over the fact that these models just lack the capabilities necessary for broad economic value.
Steven Byrnes has an excellent post on this and many other points:
New technologies take a long time to integrate into the economy? Well ask yourself: how do highly-skilled, experienced, and entrepreneurial immigrant humans manage to integrate into the economy immediately? Once you’ve answered that question, note that AGI will be able to do those things too.
If these models were actually like humans on a server, they’d diffuse incredibly quickly. In fact, they’d be so much easier to integrate and onboard than a normal human employee (they could read your entire Slack and Drive in minutes and immediately distill all the skills your other AI employees have). Plus, hiring is very much like a lemons market, where it’s hard to tell who the good people are, and hiring someone bad is quite costly. This is a dynamic you wouldn’t have to worry about when you just wanna spin up another instance of a vetted AGI model.
For these reasons, I expect it’s going to be much, much easier to diffuse AI labor into firms than it is to hire a person. And companies hire lots of people all the time. If the capabilities were actually at AGI level, people would be willing to spend trillions of dollars a year buying tokens (knowledge workers cumulatively earn tens of trillions of dollars in wages a year). The reason lab revenues are four orders of magnitude off right now is that the models are nowhere near as capable as human knowledge workers.
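Taking the numbers above literally (illustrative arithmetic only, not a revenue estimate):

$$\frac{\sim 10^{13}\ \text{USD/yr in knowledge-worker wages}}{10^{4}} \approx 10^{9}\ \text{USD/yr},$$

i.e. on the order of single-digit billions of dollars a year in token revenue, against the trillions that genuinely human-level knowledge work would command.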
Goal post shifting is justified
AI bulls will often criticize AI bears for repeatedly moving the goal posts. This is often fair. AI has made a ton of progress in the last decade, and it’s easy to forget that.
But some amount of goal post shifting is justified. If you showed me Gemini 3 in 2020, I would have been certain that it could automate half of knowledge work. We keep solving what we thought were the sufficient bottlenecks to AGI (general understanding, few-shot learning, reasoning), and yet we still don’t have AGI (defined as, say, being able to completely automate 95% of knowledge work jobs). What is the rational response?
It’s totally reasonable to look at this and say, “Oh actually there’s more to intelligence and labor than I previously realized. And while we’re really close to (and in many ways have surpassed) what I would have defined as AGI in the past, the fact that model companies are not making trillions in revenue clearly reveals that my previous definition of AGI was too narrow.”
I expect this to keep happening into the future. I expect that by 2030 the labs will have made significant progress on my hobby horse of continual learning, and the models will start earning hundreds of billions in revenue, but they won’t have automated all knowledge work, and I’ll be like, “We’ve made a lot of progress, but we’re not at AGI yet. We also need X, Y, and Z to get to trillions in revenue.”
Models keep getting more impressive at the rate the short timelines people predict, but more useful at the rate the long timelines people predict.
RL scaling is laundering the prestige of pretraining scaling
With pretraining, we had this extremely clean and general trend of improvement in loss across multiple orders of magnitude of compute (albeit on a power law, which is as weak as exponential growth is strong). People are trying to launder the prestige of pretraining scaling, which was almost as predictable as a physical law of the universe, to justify bullish projections about RLVR, for which we have no well-fit, publicly known trend. When intrepid researchers do try to piece together the implications from scarce public datapoints, they get quite bearish results. For example, Toby Ord has a great post where he cleverly connects the dots between different o-series benchmark charts, which suggested “we need something like a 1,000,000x scale-up of total RL compute to give a boost similar to a GPT level”.
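To see just how weak a power law is, here’s the arithmetic with a purely illustrative exponent (not a fit to any lab’s actual curve): if reducible loss scales as $L(C) = a\,C^{-\alpha}$, then halving it requires

$$a\,(C')^{-\alpha} = \tfrac{1}{2}\,a\,C^{-\alpha} \;\;\Rightarrow\;\; C' = 2^{1/\alpha}\,C,$$

and with $\alpha \approx 0.05$, roughly the size of exponents reported in the pretraining scaling-law literature, that single halving costs a factor of $2^{20} \approx 10^6$ in compute. Gains that feel incremental to users demand multiplicative compute.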
Broadly deployed intelligence explosion
People have spent a lot of time talking about a software-only singularity (where AI models write the code for a smarter successor system), a software + hardware singularity (where AIs also improve their successor’s computing hardware), or variations thereof.
All these scenarios neglect what I think will be the main driver of further improvements atop AGI: continual learning. Again, think about how humans become more capable at anything. It’s mostly from experience in the relevant domain.
In conversation, Beren Millidge made the interesting suggestion that the future might look like continual learning agents going out, doing jobs and generating value, and then bringing all their learnings back to the hive mind model, which does some kind of batch distillation on all these agents. The agents themselves could be quite specialized - containing what Karpathy called “the cognitive core” plus the knowledge and skills relevant to the job they’re being deployed to do.
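As a toy sketch of what that loop might look like (everything below - the class names, the skill scores, the merge rule - is invented for illustration, a cartoon of the idea rather than anyone’s actual architecture):

```python
# A toy, pure-Python cartoon of the loop: specialized agents fork from a shared
# "cognitive core", accumulate on-the-job learnings, and a central step
# batch-distills those learnings back into the core.

from dataclasses import dataclass, field
from typing import Dict, List
import copy

@dataclass
class CoreModel:
    """Stand-in for the shared base model: a bag of skill -> proficiency scores."""
    skills: Dict[str, float] = field(default_factory=dict)

    def spawn_agent(self, job: str, extra_skills: Dict[str, float]) -> "Agent":
        # Each deployed agent is the core plus job-specific knowledge.
        return Agent(job=job, skills={**copy.deepcopy(self.skills), **extra_skills})

    def batch_distill(self, agents: List["Agent"]) -> None:
        # Merge what every agent picked up on the job back into the core.
        # Here "distillation" is just keeping the best observed proficiency;
        # in reality it would be a training run on the agents' trajectories.
        for agent in agents:
            for skill, level in agent.skills.items():
                self.skills[skill] = max(self.skills.get(skill, 0.0), level)

@dataclass
class Agent:
    job: str
    skills: Dict[str, float]

    def work(self) -> None:
        # On-the-job experience improves the skills this job actually exercises.
        for skill in list(self.skills):
            if self.job in skill:
                self.skills[skill] = min(1.0, self.skills[skill] + 0.1)

if __name__ == "__main__":
    core = CoreModel(skills={"browser use": 0.6, "terminal use": 0.6})
    agents = [
        core.spawn_agent("accounting", {"accounting: month-end close": 0.2}),
        core.spawn_agent("biology", {"biology: slide triage": 0.2}),
    ]
    for _ in range(5):          # a stretch of time on the job
        for a in agents:
            a.work()
    core.batch_distill(agents)  # learnings flow back to the hive mind
    print(core.skills)
```

The interesting open question is what batch_distill really is: here it’s a trivial merge of scores, but in practice it would be some kind of training run over the trajectories the deployed agents bring home.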
“Solving” continual learning won’t be a singular one-and-done achievement. Instead, it will feel like solving in-context learning. GPT-3 demonstrated that in-context learning could be very powerful (its ICL capabilities were so remarkable that the title of the GPT-3 paper is ‘Language Models are Few-Shot Learners’). But of course, we didn’t “solve” in-context learning when GPT-3 came out - and indeed there’s plenty of progress still to be made, from comprehension to context length. I expect a similar progression with continual learning. Labs will probably release something next year which they call continual learning, and which will in fact count as progress towards continual learning. But human-level continual learning may take another 5 to 10 years of further progress.
This is why I don’t expect some kind of runaway gains accruing to the first model that cracks continual learning, thus getting more and more widely deployed and capable. If fully solved continual learning dropped out of nowhere, then sure, it’s “game, set, match,” as Satya put it. But that’s not what’s going to happen. Instead, some lab is going to figure out how to get some initial traction on the problem. Playing around with this feature will make it clear how it was implemented, and the other labs will soon replicate this breakthrough and improve it slightly.
Besides, I just have some prior that competition will stay fierce, informed by the observation that all these previous supposed flywheels (user engagement on chat, synthetic data, etc.) have done very little to diminish the greater and greater competition between model companies. Every month (or less), the big three will rotate around the podium, with other competitors not that far behind. There is some force (potentially talent poaching, rumor mills, or reverse engineering) which has so far neutralized any runaway advantages a single lab might have had.