Discussion about this post

Mark · Nov 17 (edited)

Humans learn without explicit goals. Our experiences are captured moment by moment, something like the integral of the outcome or goal. There's a model called "Active Inference" that posits we are continually predicting the next moment, with the overarching aim of minimizing surprise. When surprised, we either update our world model or take an action that closes the gap. The more I muddle with this approach, the more I'm convinced it's a better path than simple next-token prediction.
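
To make that loop concrete, here is a minimal sketch (not from the comment itself; the agent, constants, and update rules are all illustrative assumptions) of a surprise-driven agent that either revises its belief or acts on the world to reduce prediction error:

```python
import numpy as np

# Toy "active inference"-style loop: predict the next observation, measure
# surprise (prediction error), then reduce it two ways at once:
#   perception - update the internal belief toward the observation
#   action     - nudge the world toward the preferred (predicted) state
# All names, gains, and the 1-D state are made up for illustration.

rng = np.random.default_rng(0)

belief = 0.0          # agent's estimate of a 1-D hidden state
preferred = 1.0       # the state the agent "expects" (its implicit goal)
world = -0.5          # true hidden state of the environment
learning_rate = 0.2   # how fast the belief tracks observations
action_gain = 0.3     # how strongly actions push the world toward preferred

for step in range(20):
    observation = world + rng.normal(scale=0.05)  # noisy sensory sample
    surprise = observation - belief               # prediction error

    belief += learning_rate * surprise            # perception: update model
    world += action_gain * (preferred - observation)  # action: change world

    print(f"step {step:2d}  obs={observation:+.2f}  "
          f"belief={belief:+.2f}  surprise={surprise:+.2f}")
```

Over the loop the belief tracks the observations while the actions pull the world toward the preferred state, so surprise shrinks from both directions.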

Srikanth Vidapanakal

RL also learns from process rewards, not just from binary outcome rewards, doesn't it?
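
For context on that distinction, a minimal sketch (the trajectory, step scores, and lookup "model" are hypothetical stand-ins) contrasting a single binary outcome reward with per-step process rewards:

```python
# Outcome reward: one scalar for the whole episode, based only on the result.
# Process rewards: a score for every intermediate step, e.g. from a learned
# process reward model; a toy lookup table stands in for such a model here.

trajectory = ["parse problem", "set up equation", "solve equation", "state answer"]
answer_is_correct = True  # hypothetical final check for this trajectory

outcome_reward = 1.0 if answer_is_correct else 0.0

step_scores = {
    "parse problem": 0.9,
    "set up equation": 0.8,
    "solve equation": 0.6,
    "state answer": 1.0,
}
process_rewards = [step_scores[step] for step in trajectory]

print("outcome reward:", outcome_reward)
print("process rewards per step:", process_rewards)
```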
