2024-08-15 17:11:02

Eric Florenzano on Nostr:

They're all doing it. Rejection sampling is pretty much the first thing you do before moving forward with RLHF-style contrastive learning like PPO/DPO/KTO/RLOO etc. You do it when you want to get better at some task(s): you need more training data that demonstrates the behavior, so you prompt the model to elicit it, sample a bunch of outputs, and reject the ones that miss. Training on what survives moves the distribution in the direction you want.
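For anyone who wants to see the shape of it, here is a minimal sketch of that loop, not any lab's actual pipeline. Everything named here is made up for illustration: `rejection_sample`, `toy_generate`, and `toy_score` are hypothetical, the "model" is a noisy adder, and the "reward" is an exact-match check. In practice `generate` would call your model and `score` would be a reward model, a verifier, or unit tests.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    n_samples: int = 8,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Sample n_samples completions per prompt and keep the best one,
    but only if it clears the threshold. Everything else is rejected."""
    rng = random.Random(seed)
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = [generate(prompt, rng) for _ in range(n_samples)]
        scored = [(score(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            kept.append((prompt, best))
    return kept


# Hypothetical stand-in for a model: adds two numbers, sometimes off by one.
def toy_generate(prompt: str, rng: random.Random) -> str:
    a, b = map(int, prompt.split("+"))
    return str(a + b + rng.choice([-1, 0, 0, 1]))


# Hypothetical stand-in for a reward model or verifier: exact-match check.
def toy_score(prompt: str, completion: str) -> float:
    a, b = map(int, prompt.split("+"))
    return 1.0 if int(completion) == a + b else 0.0


# The surviving (prompt, completion) pairs are the new fine-tuning data.
dataset = rejection_sample(["2+3", "10+7"], toy_generate, toy_score)
```

Keeping only the best sample per prompt is one choice among several; keeping every sample above the threshold is just as common and gives you more data at the cost of some diversity control.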
Author Public Key
npub1v964mwlclyx44rm4p6mdn2srav3x8ert6eat5fzvw5qdw6r7vv8qqmgey9