9 Comments
Scenarica's avatar

The batch size economics alone justify watching the full two hours. Most people debating AI costs are arguing about model size when the real variable is how many requests you pack into each forward pass. Cursor charging 6x for 2.5x the speed isn't a premium. It's the cost of reserving capacity in a smaller batch so your tokens don't wait in line behind everyone else's.

The Chinchilla-optimal section at 01:18 is the one that reframes the entire scaling debate. Models being 100x over-trained beyond Chinchilla because of RL means the labs have quietly abandoned the scaling law that everyone outside the labs is still citing. The economics shifted from "train bigger" to "train longer on the same size" and the public conversation hasn't caught up.
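To put a rough number on "100x over-trained", here is a minimal sketch. It assumes the common rule of thumb of about 20 training tokens per parameter for Chinchilla-optimal training; that ratio and both function names are illustrative assumptions, not figures taken from the video.

```python
# Rough arithmetic only. TOKENS_PER_PARAM is the widely quoted ~20:1
# Chinchilla rule of thumb, assumed here for illustration.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params):
    """Approximate compute-optimal token count for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def over_training_factor(tokens_trained, n_params):
    """How many times past the Chinchilla-optimal token count a training run went."""
    return tokens_trained / chinchilla_optimal_tokens(n_params)


# Hypothetical example: a 70B-parameter model.
n_params = 70_000_000_000
print(chinchilla_optimal_tokens(n_params))                      # 1.4 trillion tokens
print(over_training_factor(140_000_000_000_000, n_params))      # 140T tokens -> 100.0
```

Under that assumption, a 100x over-trained model sees on the order of 2,000 tokens per parameter instead of roughly 20.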

Dwarkesh building flashcards for a podcast interview is either extreme preparation or the most endearing study habit in tech media. Probably both.

Neil Tilling's avatar

the weight bench eh? I can hear the pattern here.

lower bound indeed..

Logan Thorneloe's avatar

Great work! This is an incredible video, and I recommend that every engineer who wants to work in AI (or already works in it) watch it.

Thiago Pédico Saragiotto's avatar

If educators (and parents) are going to redesign assessment and rigor, somebody in the loop needs the courage to understand compute and optimization a little more deeply—not to become an ML engineer, but to recognize what’s brittle vs fundamental about “AI fluency.” That’s the vibe behind my piece on education as institutional adaptation, not gadget adoption: https://thiagopedicosaragiotto.substack.com/p/from-the-scroll-to-the-algorithm

Cody Rushing's avatar

> But if you’re decreasing this by 2x and then having this go up by 8x every time you double sparsity.

(saying this because I got tripped up by this for a bit) I believe Dwarkesh misspoke here when he said the phrase 'every time you double sparsity', because here we are 8xing sparsity

Dorian's avatar

This is a solid synthesis.

But synthesis isn’t leverage.

Understanding something doesn’t change outcomes

unless it translates into:

– positioning

– decision rules

– or constraints you can act on

Most content stops at insight.

The edge starts when insight becomes something you can execute against.

idiotretardfool's avatar

blackboarding skill is probably not very common and not strongly predicted by the general intelligence of your interviewee tbh

around ~10 mins in, reiner wants to draw a pink summed line of the kv and weight fetch, and is forced to draw it lower to ensure it intersects with the blue compute line. but this is bad because it made the pink line non-parallel with the yellow kv fetch line, which is incorrect and distracting.

this wouldn't have happened if he had the heavy experience with blackboarding required to avoid foresight mistakes like this, but he doesn't, and many people will not
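The geometric point about the pink line can be checked with a toy model. This is a sketch under stated assumptions: the x-axis is taken to be batch size, weight fetch time is taken to be constant in batch size, and KV fetch and compute are taken to grow linearly with it. All constants and function names are hypothetical and chosen only to keep the arithmetic exact.

```python
# Toy latency-vs-batch-size model. Every constant below is hypothetical
# and is not taken from the video.
WEIGHT_FETCH_MS = 8.0        # assumed: weights are read once per step, independent of batch size
KV_FETCH_MS_PER_SEQ = 0.5    # assumed: each sequence adds its own KV cache to fetch
COMPUTE_MS_PER_SEQ = 1.5     # assumed: compute grows linearly with batch size


def kv_fetch_ms(batch_size):
    """KV fetch line (yellow in the comment)."""
    return KV_FETCH_MS_PER_SEQ * batch_size


def memory_ms(batch_size):
    """Summed line, weight fetch plus KV fetch (pink in the comment)."""
    return WEIGHT_FETCH_MS + kv_fetch_ms(batch_size)


def compute_ms(batch_size):
    """Compute line (blue in the comment)."""
    return COMPUTE_MS_PER_SEQ * batch_size


def slope(line, b0=1, b1=2):
    """Slope of a line between two batch sizes."""
    return (line(b1) - line(b0)) / (b1 - b0)


def crossover_batch_size():
    """Batch size where the summed memory line meets the compute line."""
    return WEIGHT_FETCH_MS / (COMPUTE_MS_PER_SEQ - KV_FETCH_MS_PER_SEQ)


print(slope(memory_ms) == slope(kv_fetch_ms))   # True: the two lines are parallel
print(crossover_batch_size())                   # 8.0
```

Adding a constant weight-fetch term shifts the KV line upward without changing its slope, so in this model the summed line stays parallel to the KV line and still crosses the steeper compute line.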

idiotretardfool's avatar

another obvious mistake:

> The numbers I remember from some announcements of Gemini last year were in the hundreds of millions of tokens per second worldwide.

this conflates prefill tokens with decode tokens. the number of tokens generated is much, much lower

i don't believe reiner would be this stupid in a conventional interview. it feels very likely his intelligence is getting misallocated to the work of the blackboard format

the sense that reiner wanted to step ahead and discuss the more interesting bits, instead of continuing to slowly sketch out the shared basics on a board, felt palpable to me in the first 30m