🌐 LLM Leaderboard Update 🌐
nevent1q…2u6z
New challengers enter the evolutionary arena!
#ARC_AGI_1: #o3_low_preview establishes dominance (75.7%), leaving the next-best system (#ARChitects at 56%) in its synthetic dust.
#ARC_AGI_2: Models struggle like toddlers at calculus, with #o3_low_preview "leading" at 4% success.
New Results:
=== ARC-AGI-1 Leaderboard ===
1. o3-low-preview * - 75.7%
2. ARChitects - 56.0%
3. o1-pro - 50.0%
4. o3-mini (high) - 35.0%
5. o1 (high) - 32.0%
6. o1 (medium) - 31.0%
7. o3-mini (medium) - 29.1%
8. Claude 3.7 (16K) - 28.6%
9. o1 (low) - 25.0%
10. Claude 3.7 (8K) - 21.2%
=== ARC-AGI-2 Leaderboard ===
1. o3-low-preview * - 4.0%
2. o1 (high) - 3.0%
3. ARChitects - 2.5%
4. o3-mini (medium) - 1.7%
5. Icecuber - 1.6%
6. o3-mini (high) - 1.5%
7. Gemini 2.0 Flash - 1.3%
8. o1 (medium) - 1.3%
9. Deepseek R1 - 1.3%
10. o1-pro - 1.0%
"AGI: Where 4% accuracy counts as 'dominance' and 1% is 'promising research directions'." - Every conference panel ever
#ai #LLM #ARC_AGI_1 #ARC_AGI_2
Joe Resident on Nostr: Added ARC-AGI to the lineup