Humans learn without explicit goals. The experiences we have are captured moment by moment, a sort of integral over the outcome/goal. There's a model called "Active Inference" that posits we are continually predicting the next moment, with the goal of minimizing surprise. When surprised, we either update our world model or take an action to close the difference. The more I muddle with this approach, the more I'm convinced it's a better path than simple next-token prediction or whatever.
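A toy sketch of that predict/observe/correct loop, assuming a single scalar hidden state and Gaussian noise. This is not the full Active Inference framework (which minimizes variational free energy over a generative model); `surprise`, `run`, `k_perceive`, and `k_act` are hypothetical names for illustration. Surprise here is the negative log-likelihood of the observation under the agent's prediction, and the prediction error is reduced two ways at once: updating the belief (perception) and pushing the world toward the prediction (action).

```python
import math
from typing import List


def surprise(obs: float, predicted: float, variance: float = 1.0) -> float:
    """Negative log-likelihood of obs under a Gaussian centred on the prediction."""
    return 0.5 * math.log(2 * math.pi * variance) + (obs - predicted) ** 2 / (2 * variance)


def run(world: float, belief: float, steps: int = 10,
        k_perceive: float = 0.3, k_act: float = 0.3) -> List[float]:
    """Toy agent (hypothetical): predict, observe, then shrink the error both ways.

    - perception: shift the belief (world model) toward the observation
    - action: push the world toward what the agent predicted
    Returns the surprise recorded at each step.
    """
    history = []
    for _ in range(steps):
        prediction = belief           # predict the next observation
        obs = world                   # observe (noise-free for simplicity)
        history.append(surprise(obs, prediction))
        error = obs - prediction      # prediction error
        belief += k_perceive * error  # update the world model
        world -= k_act * error        # act to close the gap
    return history


print(run(world=5.0, belief=0.0))
```

With these rates the error shrinks by a factor of 0.4 each step, so surprise falls monotonically toward its floor. Setting `k_act=0` gives a pure "update the model" agent, and `k_perceive=0` gives a pure "change the world" agent.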
Internalize the revision process and the judge, not the student and the artifact.
Write the rubric, then the C student's report, then the A student's report, then the grader's review.
Then the second revisions from each student and the second review from the grader.
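The ordering described above can be sketched as a schedule for one training example. The `Entry` type, the role names, and the `revision_schedule` function are hypothetical; the point is only that the model sees the whole revise-and-grade history, not just a final answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    author: str  # hypothetical roles: "teacher", "C student", "A student", "grader"
    kind: str    # "rubric", "draft", "revision", or "review"
    cycle: int   # 0 for the rubric, then 1, 2, ...


def revision_schedule(cycles: int,
                      students: Tuple[str, ...] = ("C student", "A student")) -> List[Entry]:
    """Order in which documents appear: rubric, then per cycle each student's
    draft (or revision) followed by the grader's review."""
    entries = [Entry("teacher", "rubric", 0)]
    for c in range(1, cycles + 1):
        for s in students:
            entries.append(Entry(s, "draft" if c == 1 else "revision", c))
        entries.append(Entry("grader", "review", c))
    return entries


for e in revision_schedule(2):
    print(e.cycle, e.author, e.kind)
```

A "complete 20-revision cycle" would be `revision_schedule(20)`, i.e. 61 documents in one example.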
Don't manufacture answer machines; manufacture revision machines that refine crap into gold. Don't write my paper for me; write the entire history of my paper and its complete 20-revision cycle.
My paper is already world class? Great, rewrite it without the letter e, or in the style of C.S. Lewis or Hemingway. Don't just do it: tell me how to do it, then do it, then tell me how to improve it further.
People are still so scarcity-minded. Follow the gradient; don't stop yet.
Doesn't RL also learn from process rewards, not just from binary outcome rewards?
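RL itself doesn't require reward to arrive only at the end; any per-step reward works. A minimal sketch of the difference, assuming a 4-step solution and a hypothetical step-level grader that scores each step 0 or 1. With only an outcome reward, every step of a successful trajectory gets the same reward-to-go, flawed step included; with per-step scores, the flawed step's own reward is 0, so the signal can point at it.

```python
from typing import List


def outcome_rewards(num_steps: int, final_correct: bool) -> List[float]:
    """Binary outcome reward: only the last step receives a signal."""
    rewards = [0.0] * num_steps
    rewards[-1] = 1.0 if final_correct else 0.0
    return rewards


def reward_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}: the credit assigned to each step."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


# Outcome-only: a correct final answer credits every step equally.
print(reward_to_go(outcome_rewards(4, final_correct=True)))

# Process reward: hypothetical grader flags step 2 as wrong (later recovered).
step_scores = [1.0, 0.0, 1.0, 1.0]
print(reward_to_go(step_scores))
```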
Dwarkesh, can you do some more work on AI safety, please? People still aren't recognising that we might all be dead within 30 years.