2024-08-15 17:11:02

They're all doing it. Rejection sampling is pretty much the first thing you do before moving on to RLHF-style contrastive learning like PPO/DPO/KTO/RLOO, etc. You do it when you want to get better at some task(s): you want more training data that elicits that behavior and then demonstrates it, and you use rejection to move the distribution in the direction you want.
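For readers who haven't seen it spelled out, here is a minimal sketch of one common variant (best-of-N filtering): sample several completions per prompt, score them, and keep only the best one if it clears a threshold. The kept pairs become fine-tuning data. `generate` and `score` are hypothetical placeholders for a model sampler and a reward model or verifier; they are assumptions, not any specific library's API.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],       # hypothetical: samples one completion for a prompt
    score: Callable[[str, str], float],   # hypothetical: reward model / verifier score
    n_samples: int = 8,
    threshold: float = 0.0,
) -> List[Tuple[str, str]]:
    """Best-of-N rejection sampling: keep the top completion per prompt if it scores >= threshold."""
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        # Draw N candidates and score each one once.
        scored = [(score(prompt, c), c) for c in (generate(prompt) for _ in range(n_samples))]
        best_score, best = max(scored, key=lambda t: t[0])
        # Reject the prompt entirely if even the best candidate is not good enough.
        if best_score >= threshold:
            dataset.append((prompt, best))
    return dataset


if __name__ == "__main__":
    # Toy setup: the "model" guesses random integers; the reward prefers guesses close to the prompt.
    rng = random.Random(0)

    def toy_generate(prompt: str) -> str:
        return str(rng.randint(0, 10))

    def toy_score(prompt: str, completion: str) -> float:
        return -abs(int(completion) - int(prompt))

    kept = rejection_sample(["3", "7"], toy_generate, toy_score, n_samples=16, threshold=-1.0)
    print(kept)  # (prompt, completion) pairs whose completion is within 1 of the prompt
```

Keeping only accepted samples and fine-tuning on them is what shifts the distribution toward the desired behavior; the rejected, lower-scoring samples can also be retained as "losing" examples if you later build preference pairs for something like DPO.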

Author Public Key

npub1v964mwlclyx44rm4p6mdn2srav3x8ert6eat5fzvw5qdw6r7vv8qqmgey9

Eric Florenzano on Nostr