Reiner Pope – The math behind how LLMs are…

Apr 29

It's shocking how much you can deduce about what the labs are doing from a handful of equations and a blackboard


The batch size economics alone justify watching the full two hours. Most people debating AI costs are arguing about model size when the real variable is how many requests you pack into each forward pass. Cursor charging 6x for 2.5x the speed isn't a premium. It's the cost of reserving capacity in a smaller batch so your tokens don't wait in line behind everyone else's.
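A minimal sketch of that trade-off, assuming a toy decode-step model: weights are fetched once per step and shared by the whole batch, per-request KV fetch and compute grow linearly with batch size, and memory traffic overlaps with compute. Every constant and function name here is a hypothetical illustration value, not a figure from the episode.

```python
# Toy decode-step model. All constants are made-up illustration values.
WEIGHT_FETCH_MS = 20.0       # read the model weights once per step
KV_FETCH_MS_PER_REQ = 0.05   # read one request's KV cache
COMPUTE_MS_PER_REQ = 0.1     # matmul time for one request's token


def step_time_ms(batch: int) -> float:
    """Time for one forward pass that emits one token per request."""
    memory = WEIGHT_FETCH_MS + KV_FETCH_MS_PER_REQ * batch
    compute = COMPUTE_MS_PER_REQ * batch
    # Assumption: memory and compute overlap, so the slower one sets the pace.
    return max(memory, compute)


def cost_per_token_ms(batch: int) -> float:
    """Accelerator milliseconds spent per generated token."""
    return step_time_ms(batch) / batch


if __name__ == "__main__":
    for batch in (8, 64, 512):
        print(batch, step_time_ms(batch), cost_per_token_ms(batch))
```

In this toy model a small batch gets each token out sooner (shorter step) but burns far more accelerator time per token, which is the shape of the "pay more to go faster" pricing described above.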

The Chinchilla-optimal section at 01:18 is the one that reframes the entire scaling debate. Models being 100x over-trained beyond Chinchilla because of RL means the labs have quietly abandoned the scaling law that everyone outside the labs is still citing. The economics shifted from "train bigger" to "train longer on the same size" and the public conversation hasn't caught up.

Dwarkesh building flashcards for a podcast interview is either extreme preparation or the most endearing study habit in tech media. Probably both.

Neil Tilling

Jun 2

the weight bench eh? I can hear the pattern here.

lower bound indeed..

Logan Thorneloe

May 21

Great work! This is an incredible video that I recommend to every engineer who wants to work in AI (or already does).

Thiago Pédico Saragiotto

May 19

If educators (and parents) are going to redesign assessment and rigor, somebody in the loop needs the courage to understand compute and optimization a little more deeply—not to become an ML engineer, but to recognize what’s brittle vs fundamental about “AI fluency.” That’s the vibe behind my piece on education as institutional adaptation, not gadget adoption: https://thiagopedicosaragiotto.substack.com/p/from-the-scroll-to-the-algorithm

Cody Rushing

May 11

> But if you’re decreasing this by 2x and then having this go up by 8x every time you double sparsity.

(saying this because I got tripped up by this for a bit) I believe Dwarkesh misspoke here when he said the phrase 'every time you double sparsity', because here we are 8xing sparsity

Dorian

May 4

This is a solid synthesis.

But synthesis isn’t leverage.

Understanding something doesn’t change outcomes

unless it translates into:

– positioning

– decision rules

– or constraints you can act on

Most content stops at insight.

The edge starts when insight becomes something you can execute against.

idiotretardfool

Apr 30

blackboarding skill is probably not very common and not strongly predicted by the general intelligence of your interviewee tbh

around ~10 mins in, reiner wants to draw a pink summed line of the kv and weight fetch, and is forced to draw it lower to ensure it intersects with the blue compute line. but this is bad because it made the pink line non-parallel with the yellow kv fetch line, which is incorrect and distracting.

this wouldn't have happened if he had the heavy experience with blackboarding required to avoid foresight mistakes like this, but he doesn't, and many people will not
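A minimal sketch of the geometry in question, assuming linear axes and a toy model where weight fetch time is constant in batch size while KV fetch and compute time are linear in it. The constants and function names are hypothetical, not values from the video. Under those assumptions the summed memory line has exactly the KV line's slope, just shifted up by the weight fetch time, and it still crosses the compute line.

```python
# Toy model of the three lines on the board. Constants are made-up values.
WEIGHT_FETCH_MS = 20.0       # flat: does not depend on batch size
KV_FETCH_MS_PER_REQ = 0.05   # slope of the KV fetch line
COMPUTE_MS_PER_REQ = 0.1     # slope of the compute line


def kv_fetch_ms(batch: float) -> float:
    return KV_FETCH_MS_PER_REQ * batch


def memory_ms(batch: float) -> float:
    """Summed line: weight fetch plus KV fetch."""
    return WEIGHT_FETCH_MS + kv_fetch_ms(batch)


def compute_ms(batch: float) -> float:
    return COMPUTE_MS_PER_REQ * batch


def slope(f, b0: float = 100.0, b1: float = 200.0) -> float:
    """Finite-difference slope of a line between two batch sizes."""
    return (f(b1) - f(b0)) / (b1 - b0)


def crossover_batch() -> float:
    """Batch size where the summed memory line meets the compute line.

    Only exists when the compute slope exceeds the KV fetch slope.
    """
    return WEIGHT_FETCH_MS / (COMPUTE_MS_PER_REQ - KV_FETCH_MS_PER_REQ)


if __name__ == "__main__":
    print(slope(kv_fetch_ms), slope(memory_ms), crossover_batch())
```

The intersection only requires the compute line to be steeper than the KV line, so the summed line can stay parallel to the KV line and still meet the compute line at a larger batch size.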


idiotretardfool

Apr 30 (edited)

another obvious mistake:

> The numbers I remember from some announcements of Gemini last year were in the hundreds of millions of tokens per second worldwide.

this conflates prefill tokens with decode tokens. the number of tokens actually generated is much, much lower

i don't believe reiner would be this stupid in a conventional interview. it feels very likely his intelligence is getting misallocated to the work of the blackboard format

the sense that reiner wanted to step ahead and discuss the more interesting bits, instead of continuing to slowly sketch out the shared basics on a board, felt palpable to me in the first 30m


Dwarkesh Podcast
