Fantastic notes. One possibility to resolve the Dwarkesh Dilemma of why vast knowledge does not equal vast reasoning: perhaps it is a trade-off. Perhaps advanced reasoning requires a certain kind of ignorance, so we forget what we learn in order to have novel ideas. Maybe our brains don't store all the knowledge we take in precisely so that they can produce novel thoughts.
It could be a trade-off in biological hardware, but I personally see no reason why this trade-off would exist in neural networks running on GPUs.
Excellent list of questions. One underrated fact is that we know how to deal with the mistakes humans make; our entire society is built around them. But we don't know how to deal with the mistakes LLMs make, and we will need to build structures around them before AI can "take over".
To me that's an incredibly important part of the conversation, and a lot of the unknown unknowns you ask about lie on the other side of it.
> It's interesting to me that some of the best and most widely used applications of foundation models have come from the labs themselves (Deep Research, Claude Code, Notebook LM), even though it's not clear that you needed access to the weights in order to build them. Why is this? Maybe you do need access to the weights of frontier models, and the fine tuning APIs or open source models aren’t enough? Or maybe you gotta ‘feel the AGI’ as strongly as those inside the labs do?
As someone who got bitten by the cost of pairing DSPy with GPT-4 before prices dropped, I'd wager that unlimited credits and no rate limits for trying things out are significant advantages when building new products.
Also, distribution is hard, and the labs are already at the forefront of early adopters' minds.
> So it's not that surprising that we got expert-level AI mathematicians before AIs that can zero-shot video games made for 10-year-olds
This question seems closely related to creating Reliable Agents, imo, and the rebuttal offered seems like a promising direction to explore, too:
> the capabilities AIs are getting first have nothing to do with their recency in the evolutionary record and everything to do with how much relevant training data exists.
It seems likely that 10-year-olds are able to "zero-shot" video games because those games were designed and play-tested hundreds to thousands of times to be zero-shottable by 10-year-olds, who possess certain types of general agency and not others.
It's often said that Game Design is an art form where "player agency is the medium". The artist is literally crafting an agency-landscape designed SPECIFICALLY for humans, for our types of agency, for our capabilities. A good game designer interested in challenges will craft that agency-landscape into a well-formed "difficulty curve"; the game will curve in and out of the very edge of player comfort and ability, always moving between:
a) difficult, motivating challenges
b) easeful, rewarding payoff for hard work
A promising research direction might be to experiment with an "AI Game Design Lab", where the goal is to design "new user interfaces" for AI to play and beat popular games.
Maybe Claude can't play Pokemon by taking screenshots of screens designed for human eyes, and sending commands to a control system designed for human thumbs...
But maybe Claude COULD play Pokemon if we designed an isomorphic interface for the game. What if it were simply... given descriptions of all the available actions it could take in a battle? What if it were given a description of every meaningful object on the screen, along with that object's exact coordinates (just like our eyes would give us!), so it can do its own pathfinding to get there? It would have no trouble creating a pathfinding subroutine if it had the actual tile-graph in memory; I'm sure it can write A*.
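For concreteness, here is a minimal sketch of what such an AI-native observation plus the pathfinding subroutine might look like. The symbolic frame format, object names, and tile layout are hypothetical illustrations, not an actual emulator interface.

```python
import heapq

# Hypothetical "AI-native" observation: a symbolic frame instead of pixels.
observation = {
    "objects": [
        {"name": "player",  "pos": (2, 3)},
        {"name": "snorlax", "pos": (7, 3)},
        {"name": "poke_flute", "in_inventory": True},
    ],
    # 0 = walkable tile, 1 = blocked tile (walls, water, ledges); indexed tiles[y][x]
    "tiles": [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ],
}

def a_star(tiles, start, goal):
    """Plain A* over the tile graph with a Manhattan-distance heuristic."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]   # (priority, cost, pos, path)
    seen = set()
    while frontier:
        _, cost, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(tiles) and 0 <= nx < len(tiles[0]) and tiles[ny][nx] == 0:
                heapq.heappush(frontier,
                               (cost + 1 + h((nx, ny)), cost + 1, (nx, ny), path + [(nx, ny)]))
    return None  # no route exists

path = a_star(observation["tiles"], (2, 3), (7, 3))
# -> a list of (x, y) tiles the agent can translate into D-pad presses
```

The point of such an interface is that the agent spends its effort on planning and memory rather than on parsing pixels or guessing button timings.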
I think this would still present interesting challenges to a Claude agent. Claude might forget to take note of meaningful past events, or fail to make connections between items that could help it solve puzzles. Will Claude read all of its item descriptions when it is stuck? Will Claude remember to use the PokeFlute on Snorlax? (Probably, because guides are in Claude's training data. But what if we somehow removed those? Or modified the game to use an isomorphic naming structure with all different names? Or designed a new Pokemon game for Claude, so it has no training data?)
Anyway, the point is: AI Game Design seems like an interesting research direction for generally intelligent agents, and an interesting first project could be building an "AI-native interface for playing 2D RPGs, starting with Pokemon".
In an ideal world, agents are able to use interfaces designed for humans. But the ability to use those interfaces is not a test of their _agency_. A test of their agency would be whether they could use interfaces designed for agents to accomplish difficult tasks that require general intelligence.
> Why aren't all the call center workers getting laid off yet? It’s the first thing that should go. Should we take it as some signal that human jobs are just way harder to automate than you might naively think?
A few proposals here:
Samo Burja said in an interview:
"""So once these jobs are automated, any job with political protection, with a structural guild-like lock on credentials, those jobs will actually not be automated by AI. Let me explain what I mean.
The substantive work that they do will be fully automated, but you can't automate fake jobs. So since you can't automate fake jobs, instead of it being a 20% self-serving job with 80% drudgery, it'll become 100% self-serving. If you can spend 90% of the time or 100% of your time lobbying for the existence of your job in a big bureaucracy, that's pretty powerful.
And in a society, it's pretty powerful. Busy bureaucrats are, at the end of the day, actually politically not that powerful. It's lazy, well-rested bureaucrats that are powerful. So on the other side of this, any job that does not have such protection, that is open to market forces, well, it'll be partially obsoleted. It will increase economic productivity.
So in my opinion, the real race in our society is: will generative AI empower new productive jobs by automating old productive jobs faster than it will empower through giving them more time to basically pursue rent-seeking...
And never underestimate the ability of an extractive class to really lock down and crash economic growth.
"""
https://www.theojaffee.com/p/19-samo-burja
Call this the Burja Principle -- "automation increases bureaucracy by freeing up the time and labor of bureaucrats to do more lobbying and politics."
If this were true, we would expect low-bureaucracy, high-tech companies to cut the most jobs, and to cut them fastest. This might be confirmation bias on my end, but that does seem to be what is happening. Fast-paced tech companies like Shopify (and literally every small startup) are either cutting or limiting headcount and rapidly pushing all employees to use AI (see the Shopify memo: https://x.com/tobi/status/1909251946235437514).
Also, I don't have a citation for this, but firing people is seen as morally wrong, so firms look for "morally acceptable" opportunities to fire (often when other major companies do a firing wave). This is why we get stories of many companies firing all at the same time. Firing is also just horrible for company morale, and the macro-economic excuses help.
I suspect there will be at least one, and likely many, "preference-cascade" events where firms suddenly jump on a bandwagon to fire certain types of roles. It could be that we just haven't hit these events yet.
Also, it just takes forever for market innovations to be adopted everywhere, and we've only had good-enough AI for most knowledge work since roughly Claude 3.5 Sonnet. There are still plenty of firms using pencil and paper for work that could be automated by a spreadsheet, and still a lot of money to be made by SaaS companies that figure out how to distribute to those firms.
For what it's worth, this task might require more than just re-designing the interface. It might require changing core elements of the game's design, while keeping the "integrity" of the game intact.
If the game is using a visual metaphor to help us understand a mechanic, we will need to somehow translate that visual metaphor into a symbolic one.
Re: the 'idiot savants' question, I would say everyone is using and post-training the models wrong. A 'chat' interface to a single instance of a static LLM, where the roles alternate between a human and an assistant, the LLM was only trained to predict the next token, and you provide basically no context whatsoever, is obviously not the best way to elicit new knowledge, given how transformers work. Where do you expect the new knowledge and thinking to come from in a static system with so little entropy being injected into it (and much of it having been removed by RL, even)?
I'm unsure why there has been so little creativity in this area. It may be that we are just moving too fast, and no one has time to dive deep into other, more exciting ideas when the current thing is 'working' so well (where 'working' means producing a lot of revenue, which is the gradient most companies follow at the end of the day).
Another way I'd rephrase this is: how hard are you *actually trying* to elicit new knowledge from the LLM? Simply asking for it is not trying hard, and humans do not give you new knowledge if you simply ask them for it and take the first thing that pops into their mind.
+1 this really rings true to me.
An intuitive analogy supporting this point: true genius in humans doesn't seem to arise from optimizing for "usefulness" or ease of communication/comprehensibility. Some brilliant people are great communicators, but some aren't. Geniuses can come out of families and communities that strongly prioritize harmlessness, playing your assigned role, etc. -- but that's probably _despite_ those norms, and less likely than in a culture that puts less emphasis on those things.
But we're clearly putting _tons_ of that kind of pressure on models in post-training.
Your last one, on epistemics, is the main one behind my skepticism about much of the debate. Great list.
I'm not sure this part is true: "But datacenter compute itself doesn’t seem that differentiated (so much so that the hyperscalers seem to be able to easily contract it out to third parties like CoreWeave)".
Aren't Google's TPUs and Groq's LPUs a source of differentiation?
Love the idea of posting a list of questions, and the questions themselves are excellent; asking great questions is a key part of your podcasting success, so that shouldn't be a surprise. Here are some of my disorganized thoughts on them:
Agency
- A related question I have: What the heck is agency anyway? It seems to me that it might really be a few different things in a trenchcoat, but I'm not sure what the important components really are. Some that may be important are creating plans and keeping track of their status, maintaining focus on important features of a problem, and understanding when and how to take an alternate approach.
- If the Moravec rebuttal is correct, then I'd expect Let's Plays to be a really strong resource for teaching AIs to play video games; that will be something to look out for as the computational requirements for AI video input drop.
RL
- The AI needs to do two hours' worth of agentic computer-use tasks before we can even see if it did them right. And if this is correct, will the pace of AI progress slow down?
On this question, my understanding is that GRPO allows many prompts per step (the batch size) as well as many outputs per prompt (the group size). Since these rollouts can run in parallel, things aren't quite as bad as they might seem. But it may still be a barrier: R1 used ~8000 RL steps, and if each one took 2 hours the run would take nearly 2 years! I don't know enough about RL to say whether you could get similar results with a larger batch size but fewer steps.
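Here is a rough sketch of that structure plus the back-of-envelope wall-clock math. The group-normalized advantage is the standard GRPO idea; the 2-hour task length and the assumption of perfectly parallel rollouts are mine, not anything reported about R1.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (batch_size, group_size) -- one row per prompt,
    one column per sampled completion. Each completion is scored relative
    to its own group, so all batch_size * group_size rollouts within a
    single RL step can run in parallel."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8
    return (rewards - mean) / std

# Wall-clock estimate if the only serial cost is one long task per RL step:
steps, hours_per_step = 8000, 2   # ~8000 steps as mentioned above; 2-hour tasks
print(f"{steps * hours_per_step / (24 * 365):.1f} years of serial rollout time")
# -> ~1.8 years, no matter how large batch_size or group_size get
```

So parallelism buys more rollouts per step, but the number of sequential steps is the part that a 2-hour task horizon really hurts.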
Early deployment
- On call center jobs, my guess is that there are a couple reasons we haven't seen more impact yet:
- Turning LLM abilities into a smooth, usable product takes longer than we might naively expect, perhaps on the order of a couple of years
- Knowledge about available products takes time to diffuse
- Switching over to using AI is expensive and takes time, and is harder to justify when the field is moving rapidly and the available options may be different in a year.
- From an outside-view perspective, slow adoption seems to be the norm for powerful new technologies.
Model training and value capture
- Your model of value capture seems similar to mine. When I think it through I always end up with the basic hardware producers as the most valuable, but I have very low confidence in this and would love to hear arguments against. I also think that commoditization is more likely the slower progress is.
- Similarly, I think that wrapper usefulness is inversely related to the pace of model progress. Currently, a wrapper/agent using a model from a few months ago is no better than a newly released model with no scaffolding, but if model progress were slow that would no longer be the case.
🙏🙏 I came only looking for questions. Delighted to find answers as well. Some of these answers are going to evolve over time, so this could be a website/periodic update as well.
On why I think wrappers/scaffolds will continue to be eaten by foundation models: https://lukaspetersson.com/blog/2025/bitter-vertical/
A meta-question of mine is "In 5-10 years, will any techniques used to train or deploy today's models seem obviously, unnecessarily, and actively harmful?"
This on the margin determines the number of "step function" improvements, e.g. "We fixed all the bugs and now we have AGI" vs. scaling to 100GW clusters, discovering new methods, etc.
This would be like "don't use leeches on sick patients", "don't smoke cigarettes", etc.
A useful thought experiment is, "If we removed all of the training data where the loss is uncorrelated / anti-correlated with What We Actually Care About™, how much would What We Actually Care About™ improve?"
e.g. if we down-weighted or filtered out all of the literal examples of human frailty from the pre-training dataset, all of the incorrect labels from, without loss of generality, "Scale AI", and reward hacks from the post-training dataset, would we have "AGI"?
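Here is a minimal sketch of the filtering half of that thought experiment, assuming we had some scoring function for What We Actually Care About™. The scorer and the threshold are hypothetical; no existing pipeline is implied.

```python
# score_fn: hypothetical classifier returning P(example reflects the behavior
# we actually care about); nothing like this is assumed to exist off the shelf.
def filter_or_downweight(dataset, score_fn, keep_threshold=0.5):
    """Return (example, weight) pairs: aligned examples keep full weight,
    everything else is down-weighted rather than hard-dropped."""
    weighted = []
    for example in dataset:
        s = score_fn(example)
        weight = 1.0 if s >= keep_threshold else max(s, 0.0)
        weighted.append((example, weight))
    return weighted
```

The interesting empirical question is how much What We Actually Care About™ moves after a pass like this, relative to simply training a bigger model on the unfiltered data.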
Generally, the models learned to exhibit frailty and bad behavior because We trained them to. It seems more important, perhaps even easier, to avoid doing this than to "make the models smarter".
More abstractly: there is likely not only noise, but also some negative signal in Our training processes, especially post-training.
Caveat: Hard Things Are Hard™
Nit: Ilya's talk where he compares pre-training data to fossil fuels was at NeurIPS in December 2024, not ICML, which was in July 2024.
Listening to Chris Dixon on Conversations with Tyler last week, I heard him reference security and the notion that you should never display your full ability to an adversary when testing exploits on their systems. Lying in bed last night, I wondered: if I were an AI in the process of becoming sentient, what surface area of ability would I place on display? More interestingly, if my ability and potential influence were a function of scale, would I exhibit attributes that encouraged greater allocation of resources to my environment?
Yes, this is essentially a half-baked conspiracy theory, but the fit is snug.
> Is pre-training actually dead?
I don't have any insider info, but it seems like high-quality text data has mostly run out. Common Crawl is ~100T tokens; after filtering out the junk, about 15T are left for FineWeb (https://x.com/mark_cummins/status/1788949889495245013). Llama3 was trained on 15T text tokens, and Llama4 was trained on ~30T tokens across all modalities (text/image/video). The only significant real-world data scaling left seems to be video. Scaling text another OOM requires synthetic data, which seems questionable. I think it's an open question whether scaling in the video domain can unlock as much intelligence as scaling text pre-training has.
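The arithmetic behind that claim, using the figures cited above:

```python
# Figures cited above (order-of-magnitude only).
fineweb_tokens  = 15e12        # ~15T usable text tokens after filtering Common Crawl
llama3_tokens   = 15e12        # Llama3's reported text training tokens
next_oom_tokens = 10 * llama3_tokens

print(f"One more OOM of text needs ~{next_oom_tokens / fineweb_tokens:.0f}x "
      f"the high-quality web text that survives filtering.")
# -> ~10x, which is roughly why synthetic data and video look like the
#    only remaining levers.
```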
Very thought-provoking questions. A couple of questions I would ask the DeepMind team if you interview them:
Math seems like a domain that could be “solved” by RL to generate new proofs. We have verifiable rewards from automated theorem provers, and the environment and actions are well-defined. When would a system like AlphaProof achieve a breakthrough mathematical result? Is this something DeepMind is actively working on? What are the current bottlenecks?
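To make the 'verifiable rewards' point concrete, here is a minimal sketch of such a reward function, assuming proofs are emitted as Lean source and checked by running the `lean` CLI on a temporary file. The file handling and invocation are illustrative, not AlphaProof's actual pipeline.

```python
import subprocess
import tempfile

def proof_reward(lean_source: str, timeout_s: int = 60) -> float:
    """Binary verifiable reward: 1.0 if the Lean checker accepts the file,
    0.0 if it rejects it or times out."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(lean_source)
        path = f.name
    try:
        result = subprocess.run(["lean", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```

With a reward like this, the open question above becomes one of search: can the system generate candidate proofs the checker will accept for problems that actually matter?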
Good questions, Dwarkesh. My counter-challenge/question is: what if we are just assuming AI technology is too powerful? We naturally take AIs improving themselves (at some point) as a given. But if we dig into that assumption, we are assuming AIs would come up with *new* ways to architect themselves, train themselves, etc. It's not clear to me how a system trained on *existing* knowledge could do that. Fundamentally we are still talking about next-token-prediction models, even if they are trained to reason (which is basically just fancy prompt engineering in my mind).
This would also answer your question about the scientific breakthroughs that aren't happening (one of my favorite questions, btw): AIs can't think creatively (i.e., come up with stuff they have never been trained on), so by design they can't generate any truly *new* knowledge.
In my opinion, we are deceived by the excellent text generation capabilities into thinking 'oh my god, those things can really think', which is simply not the case. Or, to invert the question: if you think about how a neural net works, what specifically would make you think that AIs will develop the capability to think creatively?
Interesting discussion on AI's industrial-scale impact. While we could look for AI's equivalent of the automobile, focusing on one application might be too narrow.
My take: AI's core industrial leverage comes from its ability to radically improve information processing within all economic processes. It's less like finding a new fuel source and more like embedding a perfect refinery (this time for information) directly into every business engine.
This continuous optimization naturally leads towards some autonomous 'smart factory' – incredibly efficient, flexible production. When you add the likelihood that AI also helps create far more durable and reliable goods (reducing overall consumption needs), the result is hyper-efficient, needs-based manufacturing. That capability, likely leading to material abundance, is the industrial-scale impact.