2025-05-12 20:30:20
ynniv on Nostr:

Yesterday it was 48% with 57k RLHF pairs. Today it's 50% with zero. This is notable because the previous paper tried using less data and found it was necessary:

"Training data from math train 7.5k to Open-Reasoner-Zero 57k, we observe a consistent increase in both training reward and response length for training and evaluation set, indicating that data scale plays a crucial role in training performance."

This leads me to conclude that with zero pairs, the previous record was close to 0%. Maybe that isn't strictly true, but I expect it to be more predictive than a 2% change would suggest.
Author Public Key
npub12akj8hpakgzk6gygf9rzlm343nulpue3pgkx8jmvyeayh86cfrus4x6fdh