Discussion about this post

Will Michaels

Related to a point made in your podcast with Ilya: it seems like one of the things that allows humans to learn quickly is that the space of misunderstandings humans have is heavily constrained and largely predictable. For example, when learning calculus, most pitfalls and confusions are very common and can thus be called out when teaching someone. The mistakes that AIs make are unpredictable (the same AI makes different mistakes at different points) but also unintuitive (we don't have a good model for when an AI will be reliable and when it won't). This makes it incredibly difficult to create a learning environment in which all the possible mistakes are not only identified but also penalized correctly.

This of course relates to your broader point about continual learning. If we could create a model architecture that constrains the AI to fail in predictable ways, that would be a large step towards continual learning.

Daniel Kokotajlo

Great post!

Some musings:

(1) In AI 2027, continual learning gradually gets solved. Until early 2027 it's just incremental improvements on the current paradigm -- e.g. figuring out ways to update the models more regularly, like every month or every week rather than every few months. Then partway through 2027, thanks to the acceleration effects of R&D automation, they get to something more principled, paradigm-shifting, and human-like. I still expect something like this to happen, though I think it'll take longer. You say above "how could these dumb, non-continual-learning LLM agents figure out how to do continual learning?" I think the answer is simple: they just have to accelerate the usual process of AI R&D significantly. If you feel like continual learning is 10-20 years away at the current pace of algorithmic progress, but you also feel like Claude Opus 7.7 will be able to basically automate all coding labor, and be pretty good at analyzing experiment results and suggesting ablations etc., then it's reasonable to conclude that a few years from now, the 5-15 remaining years will be compressed into 1-3 remaining years. For example.
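A back-of-the-envelope version of that compression arithmetic, where the 5x R&D speedup is an illustrative assumption rather than a figure from the comment:

```python
# Sketch of the "compressed remaining years" arithmetic above.
# The 5x R&D speedup is an illustrative assumption, not a figure from the comment.

def calendar_years_remaining(research_years_left: float, rd_speedup: float) -> float:
    """Calendar time needed to finish `research_years_left` of algorithmic
    progress if automated AI R&D multiplies the pace of progress by `rd_speedup`."""
    return research_years_left / rd_speedup

for years_left in (5, 15):
    print(f"{years_left} research-years left -> "
          f"{calendar_years_remaining(years_left, rd_speedup=5.0):.0f} calendar years")
```

With those assumed numbers, 5-15 research-years of remaining work collapses to roughly 1-3 calendar years, which is the shape of the claim being made.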

(2) The current paradigm does seem to need far more RLVR training data than a human does to get good at something. Indeed. However, (a) maybe in-context learning can basically be a form of continual learning, once it gets good enough? Like, maybe with enough diverse RL environments, you achieve for agency what pretraining achieved for common-sense world understanding. You get general-purpose agents that can be dropped into a new situation and figure it out as they go, taking notes to self in their scratchpad/CoT memory bank filesystem.
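A minimal sketch of what that scratchpad/memory-bank idea in (a) could look like, purely illustrative -- `call_llm` and the file layout are hypothetical stand-ins, not any real API:

```python
# Sketch of in-context learning as a stand-in for continual learning: the agent
# persists lessons to a memory file and re-reads them at the start of every task,
# so "learning" happens in context rather than via weight updates.
# `call_llm` is a hypothetical placeholder for whatever model API is in use.
from pathlib import Path

MEMORY_BANK = Path("memory_bank.md")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical model call")

def run_task(task: str) -> str:
    notes = MEMORY_BANK.read_text() if MEMORY_BANK.exists() else ""
    answer = call_llm(f"Notes to self from past tasks:\n{notes}\n\nCurrent task:\n{task}")
    # Distill a one-line lesson and append it to the memory bank so it is
    # available in-context the next time a similar task shows up.
    lesson = call_llm(f"Task: {task}\nAnswer: {answer}\n"
                      "Write a one-line lesson learned for future tasks.")
    with MEMORY_BANK.open("a") as f:
        f.write(lesson.strip() + "\n")
    return answer
```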

Also (b) think of the collective, the corporation-within-a-corporation, rather than the individual LLM agent. In the future this collective could be autonomously managing a giant pipeline of data collection, problem identification, RLVR environment generation, etc. that functions as a sort of continual learning mechanism for the collective. E.g. the collective could autonomously decide that it's important to learn how to do XYZ for some reason (perhaps from analyzing trajectories, talking to customers, and learning about the ways in which limited XYZ skills are hampering it), and then it could spin up the equivalent of thousands of engineers' worth of labor to build the relevant environments, train on them, update the models, etc. The collective would still need e.g. 1000x more data than a human to get good at something, but because it has tens of thousands of copies out collecting data & because it intelligently manages the data collection process, it overall learns new skills and jobs *faster* than humans. (At least, for those skills and jobs that can be learned this way. It couldn't learn the skill of winning a war this way, for example, because it can't deploy 1000 copies into 1000 different wars.)
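Illustrative arithmetic for that "needs far more data but still learns faster in wall-clock time" point; all numbers below are assumptions for illustration, not figures from the comment:

```python
# Illustrative arithmetic for "needs ~1000x more data, yet learns faster in
# wall-clock time because of massive parallelism". All numbers are assumed.

human_episodes_to_learn_skill = 100   # practice episodes a single human needs (assumed)
ai_sample_multiplier = 1000           # "1000x more data than a human"
deployed_copies = 10_000              # "tens of thousands of copies" collecting data

ai_episodes_needed = human_episodes_to_learn_skill * ai_sample_multiplier
episodes_per_copy = ai_episodes_needed / deployed_copies

print(f"Collective needs {ai_episodes_needed:,} episodes in total, "
      f"but only {episodes_per_copy:.0f} per deployed copy -- "
      f"less serial practice than the {human_episodes_to_learn_skill} a human needs.")
```

The point of the toy numbers: even a 1000x sample-inefficiency penalty can be outrun in calendar time once data collection is spread across enough copies and managed centrally.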

