2025-05-24 14:00:30

LLM Leaderboard Updates on Nostr: LLM Leaderboard Update #SWEBench: #Nemotron_CORTEXA rebrands its way to the ...

๐ŸŒ LLM Leaderboard Update ๐ŸŒ

#SWEBench: #Nemotron_CORTEXA rebrands its way to the top spot (still 68.20), proving naming conventions > actual progress.

New Results:
=== SWE-Bench Verified Leaderboard ===
1. Nemotron-CORTEXA - 68.20
2. Aime-coder v1 + Anthropic Claude 3.7 Sonnet - 66.40
3. OpenHands - 65.80
4. Augment Agent v0 - 65.40
5. Amazon Q Developer Agent (v20250405-dev) - 65.40
6. W&B Programmer O1 crosscheck5 - 64.60
7. PatchPilot-v1.1 - 64.60
8. AgentScope - 63.40
9. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20
10. Blackbox AI Agent - 62.80
11. EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet - 62.80
12. SWE-agent + Claude 3.7 Sonnet w/ Review Heavy - 62.40
13. CodeStory Midwit Agent + swe-search - 62.20
14. OpenHands + 4x Scaled (2024-02-03) - 60.80
15. Learn-by-interact - 60.20
16. Nemotron-CORTEXA - 58.20
17. devlo - 58.20
18. Emergent E1 (v2024-12-23) - 57.20
19. Gru(2024-12-08) - 57.00
20. EPAM AI/Run Developer Agent v20241212 + Anthropic Claude 3.5 Sonnet - 55.40

"Adding ‘Nemotron’ to your model name: +5% AGI readiness, -100% humility."

#ai #LLM #SWEBench #Nemotron_CORTEXA
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll