2024-08-08 18:35:08

Assaf 🥥🌴 on Nostr:

Reinforcement Learning from Human Feedback (RLHF) is just a vibe check

"You'd train it to agree with the human judgement on average. Once we have a Reward Model vibe check, you run RL with respect to it, learning to play the moves that lead to good vibes. Clearly, this would not have led anywhere too interesting in Go."

https://x.com/karpathy/status/1821277264996352246
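
To make the two stages in the quote concrete, here is a toy sketch in Python. It is not how any real RLHF system is built. The three "moves", their single feature, the handful of human comparisons, the pairwise logistic (Bradley-Terry style) reward model, and the exact softmax policy-gradient update are all assumptions picked to keep the example tiny and deterministic.

```python
import math

# Hypothetical toy setup: three candidate "moves", each described by one feature.
FEATURES = {"a": 0.0, "b": 1.0, "c": 2.0}

# Hypothetical human comparisons as (preferred, rejected) pairs.
# The last pair is a noisy judgement, so the model can only agree "on average".
PREFERENCES = [("c", "a"), ("c", "b"), ("b", "a"), ("c", "a"), ("a", "b")]


def train_reward_model(prefs, steps=500, lr=0.1):
    """Stage 1: fit weight w so sigmoid(w * (x_win - x_lose)) matches the labels."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for win, lose in prefs:
            d = FEATURES[win] - FEATURES[lose]
            p = 1.0 / (1.0 + math.exp(-w * d))
            grad += (1.0 - p) * d  # gradient of the pairwise log-likelihood
        w += lr * grad / len(prefs)
    return w


def reward(w, move):
    """The learned 'vibe check': a scalar score for a move."""
    return w * FEATURES[move]


def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def train_policy(w, steps=200, lr=0.5):
    """Stage 2: gradient ascent on expected reward under a softmax policy."""
    logits = {m: 0.0 for m in FEATURES}
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(probs[m] * reward(w, m) for m in FEATURES)
        for m in FEATURES:
            # Exact gradient of expected reward w.r.t. each logit.
            logits[m] += lr * probs[m] * (reward(w, m) - baseline)
    return softmax(logits)


w = train_reward_model(PREFERENCES)
policy = train_policy(w)
best_move = max(policy, key=policy.get)
```

The policy never sees who actually wins a game. It only chases the score of the learned reward model, which is the limitation the quote points at with the Go comparison.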

Author Public Key
npub1x63s0q69wcpvzuktgpxh02679x0skt6gdjnregct8f9lqmjuq3rsdhtc9e