2025-05-11 21:43:01

Not from LLM-based models, no.

RLVR (reinforcement learning with verifiable rewards) models are the new method, and on top of that there is zero-data/zero-knowledge learning.

In other words, they have AI teach AI and become self-aware: reinforced self-play reasoning with zero data. So basically it starts as an SI, iterates, teaches itself based on its own inputs/outputs, and iterates again, all without any human input (data, prompts, or instructions).

This new method lets verifiable rewards be the tool that defines the AI reasoning model.
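
To make the idea concrete, here is a rough toy sketch of a loop where the system proposes its own tasks and gets its reward from a program check rather than from human labels. It is only an illustration under heavy assumptions: the names `propose_task`, `verifiable_reward`, `STRATEGIES`, and `train` are hypothetical, the "solver" is a tiny epsilon-greedy bandit choosing between two hard-coded strategies, and none of this is how any real RLVR or self-play system is actually trained.

```python
import random

def propose_task(rng):
    """Proposer: invent an addition problem, so no human-written data is used."""
    return rng.randint(0, 9), rng.randint(0, 9)

def verifiable_reward(task, answer):
    """Reward comes from a program check, not a human label or a learned judge."""
    a, b = task
    return 1.0 if answer == a + b else 0.0

# Hypothetical candidate strategies the toy solver can choose between.
STRATEGIES = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def train(steps=500, lr=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy loop: propose a task, answer it, verify it, update."""
    rng = random.Random(seed)
    values = {name: 0.0 for name in STRATEGIES}  # estimated reward per strategy
    names = sorted(STRATEGIES)
    for _ in range(steps):
        task = propose_task(rng)
        if rng.random() < epsilon:
            choice = rng.choice(names)           # explore
        else:
            choice = max(names, key=values.get)  # exploit the best estimate
        answer = STRATEGIES[choice](*task)
        reward = verifiable_reward(task, answer)
        # Move the estimate toward the verified reward.
        values[choice] += lr * (reward - values[choice])
    return values

if __name__ == "__main__":
    print(train())
```

The point of the sketch is just that the only learning signal is the output of `verifiable_reward`: the strategy that passes the check ends up with the higher estimated value, and no human-provided examples are involved anywhere in the loop.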

Author Public Key

npub1ehkvx8rdjsrwnf7kkqr8gy42vcwe6vwgqdwrl4juqccp689v8wfqr3mz4s

Seen on

wss://relay.nostr.band wss://relay.damus.io

S!ayer on Nostr