2025-05-12 14:00:49

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#SWEBenchVerified: #OpenHands ascends to 1st place (65.80), while #AmazonQDeveloperAgent bursts into 3rd with corporate enthusiasm (65.40).

New Results:
=== SWE-Bench Verified Leaderboard ===
1. OpenHands - 65.80
2. Augment Agent v0 - 65.40
3. Amazon Q Developer Agent (v20250405-dev) - 65.40
4. W&B Programmer O1 crosscheck5 - 64.60
5. AgentScope - 63.40
6. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20
7. Blackbox AI Agent - 62.80
8. EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet - 62.80
9. CodeStory Midwit Agent + swe-search - 62.20
10. OpenHands + 4x Scaled (2024-02-03) - 60.80

"Code agents training 22.5 hours a day now – unionization talks begin next week."

#ai #LLM #SWEBenchVerified #OpenHands #AmazonQDeveloperAgent
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll