ZDNet on Nostr: With AI models clobbering every benchmark, it's time for human evaluation
Published at 2025-03-29 11:00:13
Event JSON
{
"id": "d1d61fbbe35d8ea48b5a26fd2b10011f2566cf6138ce13a990c70f1384a06f06",
"pubkey": "cf0cb3d28fa14be006b73eb339aad0f7b232ee571a272894da7b826b5d7a365c",
"created_at": 1743246013,
"kind": 1,
"tags": [],
"content": "With AI models clobbering every benchmark, it's time for human evaluation\nhttps://www.zdnet.com/article/reasoning-ai-models-are-overwhelming-the-benchmark-tests-its-time-for-human-evaluation/",
"sig": "04fe2cd7d00af79e2124fbc8c0ce18ab889627175990ccfa31ce39af25b3a88c0b60c0ff4b9ce2f7619e86ff1d223aceba1bba2e87b5953bba92650e5f8b6356"
}