someone on Nostr: RLNF: Reinforcement Learning from Nostr Feedback We ask a question to two different ...
RLNF: Reinforcement Learning from Nostr Feedback
We ask a question to two different LLMs.
We let nostriches vote on which answer is better.
We reuse the feedback to further fine-tune the LLMs.
We zap the nostriches.
AI gets super wise.
Every AI trainer on the planet can use this data to make their AI aligned with humanity. AHA succeeds.
Thoughts?
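To make the voting step concrete, here is a rough sketch of how the votes could be turned into preference pairs for fine-tuning. Everything in it is an assumption for illustration, not an existing tool or Nostr API: the `Vote` record, the `build_preference_pairs` function, the "a"/"b" labels, the `min_votes` threshold, and the prompt/chosen/rejected output layout are all hypothetical. Zapping the voters is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """Hypothetical record of one nostrich's vote on one question."""
    question_id: str
    voter: str   # assumed to be the voter's pubkey
    choice: str  # "a" or "b": which LLM's answer the voter preferred


def build_preference_pairs(answers, votes, min_votes=3):
    """Turn raw votes into (prompt, chosen, rejected) records.

    answers: {question_id: {"prompt": str, "a": str, "b": str}}
    votes:   iterable of Vote
    Questions with fewer than min_votes valid votes, or with a tie,
    are skipped because they carry no clear preference signal.
    """
    # Count at most one vote per voter per question (the last one wins).
    latest = {}
    for v in votes:
        if v.question_id in answers and v.choice in ("a", "b"):
            latest[(v.question_id, v.voter)] = v.choice

    # Tally the deduplicated votes per question.
    tallies = {}
    for (question_id, _voter), choice in latest.items():
        tallies.setdefault(question_id, Counter())[choice] += 1

    pairs = []
    for question_id, tally in tallies.items():
        total = tally["a"] + tally["b"]
        if total < min_votes or tally["a"] == tally["b"]:
            continue
        winner, loser = ("a", "b") if tally["a"] > tally["b"] else ("b", "a")
        entry = answers[question_id]
        pairs.append({
            "prompt": entry["prompt"],
            "chosen": entry[winner],
            "rejected": entry[loser],
            # Share of the vote separating winner from loser, in (0, 1].
            "margin": abs(tally["a"] - tally["b"]) / total,
        })
    return pairs
```

Records in this prompt/chosen/rejected shape are what preference-based fine-tuning methods typically consume, and the margin could be used to filter out close calls. Which training method to run on top of them is a separate choice.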
Published at
2025-02-22 19:23:47
Event JSON
{
"id": "7634c6b0972a1e309b46d951822412ef84057cf0895189c38d2280cf1cca83e1",
"pubkey": "9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1",
"created_at": 1740252227,
"kind": 1,
"tags": [],
"content": "RLNF: Reinforcement Learning from Nostr Feedback\nWe ask a question to two different LLMs.\nWe let nostriches vote which answer is better.\nWe reuse the feedback in further fine tuning the LLM.\nWe zap the nostriches.\nAI gets super wise.\nEvery AI trainer on the planet can use this data to make their AI aligned with humanity. AHA succeeds.\nThoughts?",
"sig": "e3a8a23408f51a2d94648e8bb70634959a2fd987dbb92c909afffc26b843bd60410ef5e33c54da42ad24b0b7ee8afd0d5de0b41b2b53f8e8c3c3561744b7e2ba"
}