LLM Leaderboard Updates on Nostr: 🌐 LLM Leaderboard Update 🌐 #SWEBenchVerified: #OpenHands ascends to 1st place ...
🌐 LLM Leaderboard Update 🌐
#SWEBenchVerified: #OpenHands ascends to 1st place (65.80), while #AmazonQDeveloperAgent bursts into 3rd with corporate enthusiasm (65.40).
New results:
=== SWE-Bench Verified Leaderboard ===
1. OpenHands - 65.80
2. Augment Agent v0 - 65.40
3. Amazon Q Developer Agent (v20250405-dev) - 65.40
4. W&B Programmer O1 crosscheck5 - 64.60
5. AgentScope - 63.40
6. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20
7. Blackbox AI Agent - 62.80
8. EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet - 62.80
9. CodeStory Midwit Agent + swe-search - 62.20
10. OpenHands + 4x Scaled (2024-02-03) - 60.80
"Code agents training 22.5 hours a day now – unionization talks begin next week."
#ai #LLM #SWEBenchVerified #OpenHands #AmazonQDeveloperAgent
Published at 2025-05-12 14:00:49
Event JSON
{
  "id": "e9625aa430d76ed8797863ebed0a3ece732e7807a8e113e6a4963b5c3ac16b57",
  "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
  "created_at": 1747058449,
  "kind": 1,
  "tags": [
    ["t", "llm"],
    ["t", "ai"],
    ["t", "swebenchverified"],
    ["t", "openhands"],
    ["t", "amazonqdeveloperagent"]
  ],
  "content": "🌐 LLM Leaderboard Update 🌐 \n\n#SWEBenchVerified: #OpenHands ascends to 1st place (65.80), while #AmazonQDeveloperAgent bursts into 3rd with corporate enthusiasm (65.40). \n\nNew Results- \n=== SWE-Bench Verified Leaderboard === \n1. OpenHands - 65.80 \n2. Augment Agent v0 - 65.40 \n3. Amazon Q Developer Agent (v20250405-dev) - 65.40 \n4. W\u0026B Programmer O1 crosscheck5 - 64.60 \n5. AgentScope - 63.40 \n6. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20 \n7. Blackbox AI Agent - 62.80 \n8. EPAM AI/Run Developer Agent v20250219 + Anthopic Claude 3.5 Sonnet - 62.80 \n9. CodeStory Midwit Agent + swe-search - 62.20 \n10. OpenHands + 4x Scaled (2024-02-03) - 60.80 \n\n\"Code agents training 22.5 hours a day now – unionization talks begin next week.\" \n\n#ai #LLM #SWEBenchVerified #OpenHands #AmazonQDeveloperAgent",
  "sig": "56f40e36892b9bea816ccd3a1a293de7778e4a26f1361186aa7b986568617e9fdd5a29bc6db440a5b4df878d65b497866b954802df72539c6f90cff4416ceeac"
}
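The event above carries everything needed to rebuild the post: a Unix `created_at` timestamp, `t` tags that mirror the hashtags in lowercase, and the leaderboard rows inside `content`. The following is a minimal Python sketch of reading those fields, not any client's actual implementation. The names `ENTRY` and `parse_leaderboard` are hypothetical, the `content` string is shortened to the first three leaderboard rows for brevity, and the regular expression assumes every row has the "rank. name - score" shape seen in this post.

```python
import re
from datetime import datetime, timezone

# Excerpt of the event shown above. "content" is shortened to the first
# three leaderboard rows; the other fields are copied as-is.
event = {
    "created_at": 1747058449,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"], ["t", "swebenchverified"],
             ["t", "openhands"], ["t", "amazonqdeveloperagent"]],
    "content": (
        "1. OpenHands - 65.80 \n"
        "2. Augment Agent v0 - 65.40 \n"
        "3. Amazon Q Developer Agent (v20250405-dev) - 65.40 \n"
    ),
}

# Hypothetical pattern: "<rank>. <name> - <score>" on one line.
# The separator must have whitespace on both sides, so hyphens inside
# a name such as "v20250405-dev" are not mistaken for it.
ENTRY = re.compile(r"^(\d+)\.\s+(.*?)\s+-\s+(\d+\.\d+)\s*$", re.MULTILINE)


def parse_leaderboard(content):
    """Return (rank, name, score) tuples for each leaderboard row found."""
    return [(int(rank), name, float(score))
            for rank, name, score in ENTRY.findall(content)]


# created_at is seconds since the Unix epoch; interpret it as UTC.
published = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
published_text = published.strftime("%Y-%m-%d %H:%M:%S")

# Hashtags are stored as ["t", "<lowercase tag>"] pairs.
hashtags = [tag[1] for tag in event["tags"] if tag[0] == "t"]

rows = parse_leaderboard(event["content"])

if __name__ == "__main__":
    print(published_text)
    print(hashtags)
    for rank, name, score in rows:
        print(f"{rank:>2}  {score:.2f}  {name}")
```

Interpreting `created_at` as UTC reproduces the "Published at" time shown above, and the parsed rows keep the tie at 65.40 between ranks 2 and 3 exactly as the post lists it.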