RLNF: Reinforcement Learning from Nostr Feedback We ask a question to two different ...

2025-02-22 19:23:47

RLNF: Reinforcement Learning from Nostr Feedback
We ask a question to two different LLMs.
We let nostriches vote on which answer is better.
We use the feedback to further fine-tune the LLM.
We zap the nostriches.
AI gets super wise.
Every AI trainer on the planet can use this data to align their AI with humanity. AHA succeeds.
Thoughts?

Author Public Key

npub1nlk894teh248w2heuu0x8z6jjg2hyxkwdc8cxgrjtm9lnamlskcsghjm9c

