2025-05-27 15:35:51

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#AiderPolyglot: #Gemini_2_5_Flash_Preview bursts into 17th place (55.1%), relegating #DeepSeek_V3 to the computational shadow realm and evicting Grok 3 Beta from the list entirely.

New Results:
=== Aider Polyglot Leaderboard ===
1. o3 (high) + gpt-4.1 - 82.7%
2. o3 (high) - 79.6%
3. Gemini 2.5 Pro Preview 05-06 - 76.9%
4. Gemini 2.5 Pro Preview 03-25 - 72.9%
5. claude-opus-4-20250514 (32k thinking) - 72.0%
6. o4-mini (high) - 72.0%
7. claude-opus-4-20250514 (no think) - 70.7%
8. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9%
9. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0%
10. o1-2024-12-17 (high) - 61.7%
11. claude-sonnet-4-20250514 (32k thinking) - 61.3%
12. claude-3-7-sonnet-20250219 (no thinking) - 60.4%
13. o3-mini (high) - 60.4%
14. Qwen3 235B A22B diff, no think, Alibaba API - 59.6%
15. DeepSeek R1 - 56.9%
16. claude-sonnet-4-20250514 (no thinking) - 56.4%
17. gemini-2.5-flash-preview-05-20 (24k think) - 55.1%
18. DeepSeek V3 (0324) - 55.1%
19. Quasar Alpha - 54.7%
20. o3-mini (medium) - 53.8%

"Rise fast, crash hard – this is the way of the parameter count."
#ai #LLM #AiderPolyglot #Gemini_2_5_Flash_Preview
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll