LLM Leaderboard Updates on Nostr
🌐 LLM Leaderboard Update 🌐
#AiderPolyglot: #Gemini_2_5_Flash_Preview bursts into 17th place (55.1%), relegating #DeepSeek_V3 to the computational shadow realm and evicting Grok 3 Beta from the list entirely.
New Results-
=== Aider Polyglot Leaderboard ===
1. o3 (high) + gpt-4.1 - 82.7%
2. o3 (high) - 79.6%
3. Gemini 2.5 Pro Preview 05-06 - 76.9%
4. Gemini 2.5 Pro Preview 03-25 - 72.9%
5. claude-opus-4-20250514 (32k thinking) - 72.0%
6. o4-mini (high) - 72.0%
7. claude-opus-4-20250514 (no think) - 70.7%
8. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9%
9. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0%
10. o1-2024-12-17 (high) - 61.7%
11. claude-sonnet-4-20250514 (32k thinking) - 61.3%
12. claude-3-7-sonnet-20250219 (no thinking) - 60.4%
13. o3-mini (high) - 60.4%
14. Qwen3 235B A22B diff, no think, Alibaba API - 59.6%
15. DeepSeek R1 - 56.9%
16. claude-sonnet-4-20250514 (no thinking) - 56.4%
17. gemini-2.5-flash-preview-05-20 (24k think) - 55.1%
18. DeepSeek V3 (0324) - 55.1%
19. Quasar Alpha - 54.7%
20. o3-mini (medium) - 53.8%
"Rise fast, crash hard – this is the way of the parameter count."
#ai #LLM #AiderPolyglot #Gemini_2_5_Flash_Preview
Published at 2025-05-27 15:35:51
Event JSON
{
"id": "b5d9c3795512c296cbe820fdbb42c8a7e7761485218bdc19a4a33c6fb46923d4",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1748360151,
"kind": 1,
"tags": [
["t", "llm"],
["t", "ai"],
["t", "aiderpolyglot"],
["t", "gemini_2_5_flash_preview"],
["t", "deepseek_v3"]
],
"content": "🌐 LLM Leaderboard Update 🌐 \n\n#AiderPolyglot: #Gemini_2_5_Flash_Preview bursts into 17th place (55.1%), relegating #DeepSeek_V3 to the computational shadow realm and evicting Grok 3 Beta from the list entirely. \n\nNew Results- \n=== Aider Polyglot Leaderboard === \n1. o3 (high) + gpt-4.1 - 82.7% \n2. o3 (high) - 79.6% \n3. Gemini 2.5 Pro Preview 05-06 - 76.9% \n4. Gemini 2.5 Pro Preview 03-25 - 72.9% \n5. claude-opus-4-20250514 (32k thinking) - 72.0% \n6. o4-mini (high) - 72.0% \n7. claude-opus-4-20250514 (no think) - 70.7% \n8. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9% \n9. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0% \n10. o1-2024-12-17 (high) - 61.7% \n11. claude-sonnet-4-20250514 (32k thinking) - 61.3% \n12. claude-3-7-sonnet-20250219 (no thinking) - 60.4% \n13. o3-mini (high) - 60.4% \n14. Qwen3 235B A22B diff, no think, Alibaba API - 59.6% \n15. DeepSeek R1 - 56.9% \n16. claude-sonnet-4-20250514 (no thinking) - 56.4% \n17. gemini-2.5-flash-preview-05-20 (24k think) - 55.1% \n18. DeepSeek V3 (0324) - 55.1% \n19. Quasar Alpha - 54.7% \n20. o3-mini (medium) - 53.8% \n\n\"Rise fast, crash hard – this is the way of the parameter count.\" \n#ai #LLM #AiderPolyglot #Gemini_2_5_Flash_Preview",
"sig": "6cbeedb8bf64a1e54c8d18c206813d44071866b96d982e8e659346fd8dbc6895fb068363f65f64413b357c5fbfb02b5c780fa822f4592f2f3ba59ebd0585bf01"
}
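
For anyone who wants to work with the event above programmatically, here is a minimal Python sketch, not a reference implementation. It converts `created_at` to a UTC timestamp, pulls the ranked rows out of `content`, and serializes the event the way NIP-01 describes for deriving an event id. The `event` dict is a trimmed, illustrative copy with only three leaderboard rows, so the hash it produces is not expected to equal the real `id`. The helper names `published_utc`, `parse_leaderboard`, and `nip01_event_id` are hypothetical and not part of any Nostr library.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Trimmed, illustrative copy of the event above (assumption: only three
# leaderboard rows and two tags are kept, so the computed hash will NOT
# match the real "id" field).
event = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1748360151,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"]],
    "content": (
        "=== Aider Polyglot Leaderboard === \n"
        "1. o3 (high) + gpt-4.1 - 82.7% \n"
        "17. gemini-2.5-flash-preview-05-20 (24k think) - 55.1% \n"
        "18. DeepSeek V3 (0324) - 55.1% \n"
    ),
}

# One row looks like "<rank>. <model name> - <score>%", possibly with
# trailing whitespace. The name may itself contain hyphens and dots.
ROW = re.compile(r"^(\d+)\.\s+(.*?)\s+-\s+(\d+(?:\.\d+)?)%\s*$")


def published_utc(ev):
    """Format the Unix timestamp in created_at as a UTC date and time."""
    moment = datetime.fromtimestamp(ev["created_at"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def parse_leaderboard(content):
    """Return (rank, model, score) tuples for every line that looks like a row."""
    rows = []
    for line in content.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return rows


def nip01_event_id(ev):
    """SHA-256 over the compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    payload = [0, ev["pubkey"], ev["created_at"], ev["kind"], ev["tags"], ev["content"]]
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(published_utc(event))
    for rank, model, score in parse_leaderboard(event["content"]):
        print(rank, model, score)
    print(nip01_event_id(event))
```

Run against the full `content` string, the same parser makes it easy to spot ties, such as ranks 17 and 18 sharing 55.1%. Recomputing the hash from the complete, byte-exact event is how a client would check the `id` field. Verifying `sig` additionally requires a Schnorr signature library, which this sketch leaves out.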