Discussion about this post

Alec Pritzos

The decades-long verification loop is the load-bearing argument here. The cleaner version is that RLVR fails at science not because science is slow but because the reward function is what gets argued over in real research. Copernicus losing to Ptolemy on accuracy in 1543 is the same shape as a modern theory-selection problem: better theories often make worse short-run predictions, and any verifier trained on existing data would have penalized the right answer. RL needs a fixed objective; science is the discipline of editing the objective.

Paul Meccano

Nice to feeeeel your way, huh?
