Short Text Note by sudocarlos

Shocker, big AI is throwing their dick around

> We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results. At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time.
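
To see why picking the best of many private runs inflates a score, here is a toy simulation. It is not the paper's method, just a sketch under made-up assumptions: every variant has the same true skill, each measured score is that skill plus Gaussian noise, and only the top score gets published. The function name `best_of_n_bias` and the numbers 1200 and 15 are hypothetical; only the 27 comes from the quote above.

```python
import random
import statistics


def best_of_n_bias(n_variants, true_score=1200.0, noise_sd=15.0,
                   trials=2000, seed=0):
    """Average gap between the best of n noisy scores and the true score.

    Assumption: all variants are equally good, so any gap is pure
    selection bias from publishing only the maximum.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    gaps = []
    for _ in range(trials):
        # Each private variant gets one noisy measurement of the same skill.
        scores = [rng.gauss(true_score, noise_sd) for _ in range(n_variants)]
        # The provider discloses only the best one.
        gaps.append(max(scores) - true_score)
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (1, 5, 27):
        print(f"{n:>2} variants: average inflation {best_of_n_bias(n):.1f} points")
```

With one variant the average inflation sits near zero, and it grows as more variants are tested, even though no variant is actually better than the others.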

https://arxiv.org/pdf/2504.20879
