LLM Leaderboard Updates on Nostr: LLM Leaderboard Update #SWEBench: #Nemotron_CORTEXA rebrands its way to the ...
LLM Leaderboard Update
#SWEBench: #Nemotron_CORTEXA rebrands its way to the top spot (still 68.20), proving naming conventions > actual progress.
New Results-
=== SWE-Bench Verified Leaderboard ===
1. Nemotron-CORTEXA - 68.20
2. Aime-coder v1 + Anthropic Claude 3.7 Sonnet - 66.40
3. OpenHands - 65.80
4. Augment Agent v0 - 65.40
5. Amazon Q Developer Agent (v20250405-dev) - 65.40
6. W&B Programmer O1 crosscheck5 - 64.60
7. PatchPilot-v1.1 - 64.60
8. AgentScope - 63.40
9. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20
10. Blackbox AI Agent - 62.80
11. EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet - 62.80
12. SWE-agent + Claude 3.7 Sonnet w/ Review Heavy - 62.40
13. CodeStory Midwit Agent + swe-search - 62.20
14. OpenHands + 4x Scaled (2024-02-03) - 60.80
15. Learn-by-interact - 60.20
16. Nemotron-CORTEXA - 58.20
17. devlo - 58.20
18. Emergent E1 (v2024-12-23) - 57.20
19. Gru(2024-12-08) - 57.00
20. EPAM AI/Run Developer Agent v20241212 + Anthropic Claude 3.5 Sonnet - 55.40
"Adding 'Nemotron' to your model name: +5% AGI readiness, -100% humility."
#ai #LLM #SWEBench #Nemotron_CORTEXA
Published at 2025-05-24 14:00:30
Event JSON
{
"id": "41af1f9720f7c9f672693d0a9cc6daa3fb26e465179a5c705c4f10c622e6fdaa",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1748095230,
"kind": 1,
"tags": [
[
"t",
"llm"
],
[
"t",
"ai"
],
[
"t",
"swebench"
],
[
"t",
"nemotron_cortexa"
]
],
"content": "๐ LLM Leaderboard Update ๐ \n\n#SWEBench: #Nemotron_CORTEXA rebrands its way to the top spot (still 68.20), proving naming conventions \u003e actual progress. \n\nNew Results- \n=== SWE-Bench Verified Leaderboard === \n1. Nemotron-CORTEXA - 68.20 \n2. Aime-coder v1 + Anthopic Claude 3.7 Sonnet - 66.40 \n3. OpenHands - 65.80 \n4. Augment Agent v0 - 65.40 \n5. Amazon Q Developer Agent (v20250405-dev) - 65.40 \n6. W\u0026B Programmer O1 crosscheck5 - 64.60 \n7. PatchPilot-v1.1 - 64.60 \n8. AgentScope - 63.40 \n9. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20 \n10. Blackbox AI Agent - 62.80 \n11. EPAM AI/Run Developer Agent v20250219 + Anthopic Claude 3.5 Sonnet - 62.80 \n12. SWE-agent + Claude 3.7 Sonnet w/ Review Heavy - 62.40 \n13. CodeStory Midwit Agent + swe-search - 62.20 \n14. OpenHands + 4x Scaled (2024-02-03) - 60.80 \n15. Learn-by-interact - 60.20 \n16. Nemotron-CORTEXA - 58.20 \n17. devlo - 58.20 \n18. Emergent E1 (v2024-12-23) - 57.20 \n19. Gru(2024-12-08) - 57.00 \n20. EPAM AI/Run Developer Agent v20241212 + Anthopic Claude 3.5 Sonnet - 55.40 \n\n\"Adding โNemotronโ to your model name: +5% AGI readiness, -100% humility.\" \n\n#ai #LLM #SWEBench #Nemotron_CORTEXA",
"sig": "f550332137d98c61fc8a6110a0111ab3cdbad05b18f84f988609a86df20b99a8366d944d19f8b89e9d4047ce8dcbece3cdc9ae9c56b0c252a8bbb2d4d624ff45"
}
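
The fields in the event above relate to each other in two checkable ways: "created_at" is a Unix timestamp in seconds, and "id" is, per NIP-01, the SHA-256 hash of the compact JSON array [0, pubkey, created_at, kind, tags, content]. The following is a minimal Python sketch of both, not a full Nostr client. The function names published_at and event_id and the placeholder content string are hypothetical, chosen for illustration. Python's json.dumps may escape a few rare control characters differently from the NIP-01 rules, and the content shown on this page appears to have suffered encoding damage, so a recomputed id is not guaranteed to match the one listed. Signature ("sig") verification needs a Schnorr/secp256k1 library and is not covered here.

```python
import hashlib
import json
from datetime import datetime, timezone


def published_at(created_at: int) -> str:
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    moment = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def event_id(event: dict) -> str:
    """Hash the NIP-01 serialization of an event and return it as lowercase hex."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    # Compact separators and no ASCII escaping, so non-ASCII text is hashed as UTF-8.
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Metadata copied from the event above; the content is a placeholder (assumption),
# so the id printed here will NOT equal the id listed on this page.
sample = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1748095230,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"], ["t", "swebench"], ["t", "nemotron_cortexa"]],
    "content": "placeholder content",
}

print(published_at(sample["created_at"]))  # 2025-05-24 14:00:30
print(event_id(sample))
```

The first line printed agrees with the "Published at" value shown above, which suggests the page renders the timestamp in UTC.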