With AI models clobbering every benchmark, it's time for human evaluation


npub1eu…hqr9q

2025-03-29 11:00:13

https://www.zdnet.com/article/reasoning-ai-models-are-overwhelming-the-benchmark-tests-its-time-for-human-evaluation/

Author Public Key

npub1euxt85505997qp4h86enn2ks77er9mjhrgnj39x60wpxkht6xewq9hqr9q
