2025-05-02 13:06:40

sudocarlos on Nostr:

Shocker, big AI is throwing their dick around

> We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results. At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time.

https://arxiv.org/pdf/2504.20879
Author Public Key
npub1qdsjkr46urkg6vqrr3zqhgy8l7dazc5k9hlm5jmwqg0vft7hzgtqamgfw3