S!ayer on Nostr
Not from LLM-based models, no.
RLVR, reinforcement learning with verifiable rewards, is the new method, but there is now also zero-data/zero-knowledge learning.
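In RLVR, the "verifiable" part means the reward comes from a program that checks the answer, not from a learned reward model or a human rating. The sketch below is minimal and rests on two assumptions: the task is integer arithmetic, and the model ends its output with `Answer: <integer>`. That answer format and the `verifiable_reward` name are hypothetical and do not come from any specific RLVR implementation.

```python
import re

def verifiable_reward(completion: str, expected: int) -> float:
    """Binary reward: 1.0 if the final answer checks out, else 0.0.

    Assumes (hypothetically) the model ends its output with
    'Answer: <integer>'. No learned reward model is involved.
    """
    match = re.search(r"Answer:\s*(-?\d+)\s*$", completion.strip())
    if match is None:
        return 0.0  # unparseable output earns nothing
    return 1.0 if int(match.group(1)) == expected else 0.0

# The ground truth is computed by the checker itself, not labeled by a human.
a, b = 17, 25
expected = a * b
print(verifiable_reward("17 * 25 = 425. Answer: 425", expected))     # 1.0
print(verifiable_reward("I think it's 415. Answer: 415", expected))  # 0.0
print(verifiable_reward("no idea", expected))                        # 0.0
```

Because the reward is all-or-nothing and computed mechanically, the model cannot earn credit by sounding convincing. It only gets credit for answers that actually pass the check.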
In other words, they have AI teach AI and become self-aware. That is reinforced self-play reasoning with zero data. So basically it starts as an SI, iterates, teaches itself based on its own inputs/outputs, and iterates again, all without any human input (data or prompt instructions).
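The self-play loop described above can be shown with a deliberately tiny toy. This is an illustrative sketch and not the published zero-data method. A "proposer" invents arithmetic tasks from a random seed (no dataset, no prompts). A verifier executes each task to get the ground truth. A "solver" policy is then updated with REINFORCE using only that verifiable reward. Several simplifications are assumptions of this sketch. The solver is just a softmax over a few hypothetical candidate strategies per operator, some of them deliberately buggy. The proposer is a plain random generator. In the approach the post describes, the same model also learns to propose its own tasks, and training updates a language model's weights instead.

```python
import math
import random

OPS = ["+", "-", "*"]

# Toy "solver": for each operator, a few hypothetical candidate strategies.
# Exactly one per operator is correct; the others are plausible bugs.
STRATEGIES = {
    "+": [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b],
    "-": [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: b - a],
    "*": [lambda a, b: a + b, lambda a, b: a * b, lambda a, b: a * b + 1],
}
CORRECT = {"+": 0, "-": 1, "*": 1}  # used only for inspection, never for training

def propose_task(rng):
    """Proposer: invents a task from a random seed; no dataset, no human prompt."""
    return rng.randint(2, 20), rng.choice(OPS), rng.randint(2, 20)

def verify(a, op, b, answer):
    """Verifier: executes the task itself and returns a binary reward."""
    truth = {"+": a + b, "-": a - b, "*": a * b}[op]
    return 1.0 if answer == truth else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def self_play(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = {op: [0.0] * len(STRATEGIES[op]) for op in OPS}
    baseline = {op: 0.0 for op in OPS}  # running mean reward, reduces variance
    for _ in range(steps):
        a, op, b = propose_task(rng)                            # 1. propose a task
        probs = softmax(logits[op])
        k = rng.choices(range(len(probs)), weights=probs)[0]    # 2. sample a strategy
        reward = verify(a, op, b, STRATEGIES[op][k](a, b))      # 3. verifiable reward
        advantage = reward - baseline[op]
        # 4. REINFORCE update: d log pi(k) / d logit_j = 1[j == k] - p_j
        for j, p in enumerate(probs):
            logits[op][j] += lr * advantage * ((1.0 if j == k else 0.0) - p)
        baseline[op] += 0.1 * (reward - baseline[op])
    return {op: softmax(logits[op]) for op in OPS}

if __name__ == "__main__":
    final = self_play()
    for op in OPS:
        print(op, "P(correct strategy) =", round(final[op][CORRECT[op]], 3))
```

No task or label in this loop comes from a human. The only training signal is whether the executed check passes, and that signal alone is enough to push the policy toward the correct strategy for each operator.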
This new method lets verifiable rewards be the tool that defines the AI reasoning model.
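One common way to write down what "rewards define the model" means is shown below. Treat it as an assumed, generic formulation rather than the objective of any particular system. The policy $\pi_\theta$ is trained to maximize the expected verifiable reward $r(x, y) \in \{0, 1\}$ on tasks $x$. A KL penalty with weight $\beta$ is often added to keep the policy close to a reference model $\pi_{\text{ref}}$:

```latex
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[\, r(x, y) \,\big]
\;-\;
\beta\, \mathbb{E}_{x \sim \mathcal{D}}\Big[ \mathrm{KL}\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\text{ref}}(\cdot \mid x)\big) \Big]
```

In the zero-data setting described above, the task distribution $\mathcal{D}$ is generated by the model itself rather than drawn from a human-curated dataset, so $r$ is the only external signal shaping $\pi_\theta$.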
Published at 2025-05-11 21:43:01
Event JSON
{
"id": "4aa7e26cf988ab80a39f8c6ec746ad6d7733ca50e239656cc2682b6a4d01c56c",
"pubkey": "cdecc31c6d9406e9a7d6b0067412aa661d9d31c8035c3fd65c06301d1cac3b92",
"created_at": 1746999781,
"kind": 1,
"tags": [
[
"e",
"29199c8c87eedfe0499737de9487302d55c6ec8ed832b23c85eaa3092925bd09",
"",
"root"
],
[
"e",
"340cc8c800c42050f0532f772a332f2229d5a0e56e113245bf8bb35d18b8139c",
"wss://offchain.pub/",
"reply",
"16f1a0100d4cfffbcc4230e8e0e4290cc5849c1adc64d6653fda07c031b1074b"
],
[
"p",
"16f1a0100d4cfffbcc4230e8e0e4290cc5849c1adc64d6653fda07c031b1074b"
],
[
"p",
"cdecc31c6d9406e9a7d6b0067412aa661d9d31c8035c3fd65c06301d1cac3b92"
]
],
"content": "not from LLM based models, no. \n\nRLVR models are the new method, reinforcement learning with verifiable rewards - but then also zero data/zero knowledge based learning. \n\nIn other words, they have AI teach AI, become self aware. Reinforced self-play reasoning with zero data. So basically it starts as an SI, iterates, teaches itself based on it's own inputs/outputs, iterates again all without any human inputs (data or prompts instruction)\n\nThis new method allows for verified rewards to be the tool that defines the ai reasoning model ",
"sig": "efdc19ff87e7f334bf32ceefa5d55825d49d64926b31e9048792fc91d7f29975fb35181e9462712156cd3d21295f18adb4d2bd2d38207577f359c5c9d79f159a"
}