34 Comments
Ran

Sutton is absolutely brilliant but he doesn't explain himself well in this conversation. How is an RL-based solution like AlphaZero fundamentally different from using an LLM with chain of thought and tools? It's true that currently the LLMs we use don't update their weights as they interact with the world, but there's no fundamental reason that won't happen in the future. This is like saying AlphaGo is not interesting because it starts with human knowledge. Eventually there will be an AlphaZero moment for LLMs where they start with trivial capabilities and then train themselves on things they encounter. Regarding goals - whenever I learn something new, I can say that a goal of mine is to be able to explain the topic and answer all sorts of questions about it. It's a self-defined goal, but still a valid one, and one that can be tested.

Danny Jeck

I agree. If I tried to explain what I thought he meant, it would come back to the RL-first approach of sense-act-reward. I was very confused though that the discussion completely skipped over the fact that LLMs ARE trained via RL. And the reward signal is “say what the person wants” during RLHF. Is the pretraining the problem? I don’t know.
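For readers unfamiliar with the mechanism this comment refers to: a minimal sketch (my illustration, not from the comment) of the pairwise Bradley-Terry loss that standard RLHF uses to train a reward model. A reward model assigns a scalar score to each candidate answer, and the loss pushes the score of the human-preferred answer above the rejected one. The scores below are made up for illustration.

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Small when the human-preferred answer already scores higher,
    large when the reward model ranks the pair the wrong way around.
    """
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Toy reward-model scores for a (preferred, rejected) answer pair.
print(preference_loss(2.0, -1.0))  # preferred ranked higher -> low loss
print(preference_loss(-1.0, 2.0))  # preferred ranked lower -> high loss
```

Whether you count this preference signal as "real" RL in Sutton's sense - a reward grounded in consequences in the world, rather than in a human rater's approval - is exactly the question the comment raises.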

Feer
Nov 13 (edited)

“Sutton is absolutely brilliant but he doesn't explain himself well in this conversation.”

This is probably in part due to the interviewer being a clueless grifter who does not understand the subject and is incapable of asking proper questions. The whole interview was just embarrassing to watch.

Rob Lucas

"If you look at how psychologists think about learning, there’s nothing like imitation."

Bandura? I guess Sutton is talking about a different level of explanation, but what a ridiculous sentence to utter.

Wes

I was flabbergasted by that sentence, too. In practice, imitation is the primary mode of learning for children, especially in the cognitive domain.

Sushanth Reddy

I think Richard Sutton’s bias toward elegant solutions with simple objective/goal is understandable given his mathematical background (theorem proving, RL foundations).

He finds the current paradigm, "pretrain + post-train tasks + train in agentic environments", inelegant, perhaps, and too messy for achieving human-level intelligence.

It doesn’t really matter who is right, because we have really good researchers working from both of these perspectives.

Conn. Yankee

This was an unusually good interview, in large part because interviewer and interviewee often disagreed.

Interesting that one focus of the comments is on Sutton’s claim that infants learn by experimentation rather than imitation. I have a twelve-week-old at home (my fourth), and as an empirical matter, I would say the really young ones do what Sutton says: they flail around until something works. Without question, they later start to imitate, but the beginnings look pretty haphazard. (Perhaps it is not haphazard at all. Perhaps infants follow an evolutionarily hardwired course of exercising and gaining control over bodily systems. It is true that all of mine have first gotten control over the head and neck, then have started on rolling, then have progressed to crawling and walking …. Not much variation. Is that because the progression is wired in, or because only a few things work, and work in a certain order, so random flailing is bound to hit upon those same few things eventually?)

lostcat

>Without question, they later start to imitate

I feel babies would walk right out of the womb if their muscles were developed enough at birth, without needing to see any human walk, because they are curious and want to explore.

Jeffrey Li

Although I don't agree with most of what Sutton has said, I'd clarify for him that when he said "imitation is not the primary form of learning", he was talking about the original source of a piece of knowledge or skill. Take the seal-hunting example given by Dwarkesh: someone must have obtained the skill first before others could learn it from them. Imitation of course makes learning much, much faster. But it doesn't produce new knowledge.

Leo Li

Great interview. Sutton's core point on the importance of foundational learning may sound contrarian, but the more you sit with it, the harder it is to disagree.

I recently wrote a piece approaching the same question from developmental science, "How Humans Learn: Rethinking How AI Systems Are Trained", and the two perspectives align more than you'd expect.

https://themeditations.substack.com/p/how-humans-learn

Would love to hear your thoughts.

nagesh

clone, instruct, explore and research.. very different things..

cloning based learning --> someone already learned something, you just clone that learning.. why are babies cloning? what motivates them to clone? I am not sure of the rewards here.. maybe a "good job" from a parent is a reward..

instruction based learning --> someone already learned something, they pass that on to you and you may learn.. a reward mechanism exists here, you may not learn if you don't perceive reward.. lack of sufficient rewards or understanding of rewards --> decreased learning despite instruction.. fear also is a form of reward (i.e lack of punishment)..

explore & learn --> do something new, learning from feedback.. (eg., the classic child touches something hot + feedback not good + learns to not touch again).. the reward here is lack of pain

research & learn --> I am not sure if this is same as explore or fundamentally different.. I feel this is same as explore, just the reward perception is different..

probably Richard treats only the third/fourth as true learning of the world.. and Dwarkesh is including the first and second also..

- a random guy

Martin

Excellent podcast. I see it a bit like riding a bike. You can read all the books you want and watch 1000s of people riding their bikes, but that will still not teach you how to actually ride a bike. That will only come by actually doing it, including the falling over.

Sharmake Farah

IMO, one of the biggest capability secrets around the human brain in particular is not just about continual learning and long-term memory, but how evolution managed to train RNN architectures with internal state (aka neuralese recurrence + memory), like our own brains, without the usual problems that come with RNN training that made the AI field switch to transformers, like vanishing/exploding gradients.

If the AI field could solve the problem of how to actually train RNNs, there's a real chance that RNNs get revived again for AI capabilities.

Nick Savage

Great interview, as always. Two comments:

1. I am not so sure I buy the argument that LLMs have no goals. Maybe we just don’t understand what they are. Pursuing pleasure through sex might look meaningless to an ASI in the same way that next-token prediction looks meaningless to us.

2. Empirically, it's clear to me that supervised learning happens. If I sneak a cookie before supper, I guarantee that my son will learn through imitation that that is acceptable.

smuu

Re 1: if an LLM is never prompted, will it do anything?

Florin Andrei

This was painful to listen to.

Dwarkesh, you failed to comprehend a fairly basic point this whole time, and then you argued with a world-class expert, as if you were entitled to your own opinions.

You're a dude with a microphone, you don't get to have "opinions" in a field where you've done nothing.

"Preparing" for the episode with an LLM will not fill in the lack of knowledge when it comes to the fundamentals. And it will definitely not fix a wildly inflated self-image. There's a great discrepancy between what you feel you understand, and what you actually do understand.

Sunil Malhotra

Why do you both say there are no universal values? Have you come across the concept of Dharma and other values espoused in the Vedantic discourse?

Nick McGivney

I enjoyed this particular interview enormously, precisely because of the awkwardness and friction. I'm out of my depth with the subject matter, fascinated by it at the same time, and did not come with any fixed opinion. For me, although perhaps not for those arguing the nuances of LLMs or RL, the most telling thing that RS said was 'I’m in some sense a contrarian or someone thinking differently than the field is. I’m personally just content being out of sync with my field for a long period of time, perhaps decades, because occasionally I have been proved right in the past.' No matter how excited we get about the next thing, how it fits with all that's gone before will still be important. Thank you Dwarkesh, I thought you teased it out heroically. :)

Jimimased

Best conversation in history

Joe Marta

Some observations from a product guy who is by no means a researcher.

I'm a bit perplexed the topic of RLHF in modern chat systems and models didn't come up here. That was one of OpenAI's big innovations in the early days of this Gen AI wave. It seems directly relevant to the topic.

A great interview altogether though. Good ML/AI interviews make you ask really deep philosophical questions. For example, one gap in the "RL everything" approach is that you must necessarily be embodied and have sensations in order to replicate human intelligence. You can't RL creative writing or any aesthetic concern directly as a reward function. It has to be proxied through some measurement of human preference, because the machine can't be given an aesthetic sense without having eyes, emotions, a lived experience to relate it all to -- all the things that make art, art. And to provoke this chain of thought, in my eyes, means the interview was thoughtfully done. Kudos!

Joe Marta

Double post :)

While it's good for clicks, I'm sure, there is this unvoiced assumption that superhuman intelligence is the end-all-be-all, and the goal for all AI research. I don't think it is or should be. The vacuum cleaner didn't achieve floors which are perfectly spotless at all times, but it's still a lot more effective than a broom.

LLMs can just be natural-language-instructable black boxes that take in text and output other text that's mostly right, most of the time, and still be incredibly useful and valuable products. In fact, just this alone will provide people with entire careers' worth of things to build and do. "AGI" has no accepted definition, so jumping to it every time inevitably turns the entire conversation into sci-fi.