Discussion about this post

David Khoo:

"Presumably, animal drawings in LaTeX are not part of GPT-4’s training corpus."

Why presume? If you simply search for "drawing animals with latex", you find a huge pre-existing literature on how to make animal drawings with raw LaTeX code or libraries like TikZ. LaTeX art is a well-established thing, and I fooled around with it as an undergrad long ago.
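For a sense of what that literature looks like, here is a minimal sketch of the kind of TikZ art being described — a duck built from an ellipse, a circle, and a triangle. This is not any published LaTeX-art example; the shapes and coordinates are invented for illustration.

\documentclass[tikz,border=5pt]{standalone}
\begin{document}
\begin{tikzpicture}
  % body: an ellipse
  \draw[fill=yellow!80!orange] (0,0) ellipse (1.2 and 0.7);
  % head: a circle offset up and to the right
  \draw[fill=yellow!80!orange] (1.0,0.9) circle (0.45);
  % beak: a small filled triangle
  \draw[fill=orange] (1.4,0.95) -- (1.9,0.85) -- (1.4,0.75) -- cycle;
  % eye: a dot
  \fill (1.15,1.0) circle (0.05);
  % water line: a gentle Bezier curve under the body
  \draw[blue!60] (-2,-0.55) .. controls (-1,-0.35) and (0,-0.75) .. (2.2,-0.55);
\end{tikzpicture}
\end{document}

A model that has seen many such files has seen plenty of mappings from animal names to LaTeX drawing primitives.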

Never underestimate how much leakage there may be between training and test sets, and how much memorization can be happening! "Text from the internet" contains much weirder stuff than you think.

But the deeper point is that it's impossible to properly evaluate LLM performance without knowing the training set, which the big companies all refuse to reveal. My favorite paper on this is "Pretraining on the Test Set Is All You Need" (https://arxiv.org/abs/2309.08632), where the author shows that a tiny LLM can beat all the big LLMs on the benchmarks if it is trained on the test set. It's a brilliant parody, but it has a point: how do we know the big LLMs aren't also doing this accidentally? I wouldn't update my beliefs much about how well LLMs can generalize until the big companies reveal much more about how their models were built.

Mark Levine:

Just a spectacular piece of writing. Thank you, Dwarkesh, for being a continuing source of enlightenment.
