2025-05-12 20:30:20
ynniv on Nostr:

Yesterday it was 48% with 57k RLHF pairs. Today it's 50% with zero. This is notable because the previous paper tried using less data and found it was necessary:

"Training data from math train 7.5k to Open-Reasoner-Zero 57k, we observe a consistent increase in both training reward and response length for training and evaluation set, indicating that data scale plays a crucial role in training performance."

This leads me to conclude that with zero pairs, the previous record was close to 0%. Maybe that isn't strictly true, but I expect it to be more predictive than a 2% change would suggest.
Author Public Key
npub12akj8hpakgzk6gygf9rzlm343nulpue3pgkx8jmvyeayh86cfrus4x6fdh