Round 2: LLM To-Do App Battle Royale - 13 Models Tested!

🚀 The LLM Arena Expands: Round 2 Results

After our previous battle of 7 LLMs, the AI landscape has evolved dramatically! We’re back with 13 models - including returning champions and exciting newcomers. The same To-Do app challenge, but with fresh competition and updated models. Let’s see who claims the crown! 👑


📊 Complete Leaderboard

Rank | LLM | Speed | Cost | Quality | Overall
🥇 | Gemini flash 2.5-0520 | ★★★★★★★★★★ 10/10 | ★★★★★★★★★☆ 9/10 | ★★★★★★★★★☆ 9/10 | 9.3
🥈 | Devstral Small 2505 | ★★★★★★★★★★ 10/10 | ★★★★★★★★★★ 10/10 | ★★★★★★☆☆☆☆ 6/10 | 8.7
🥉 | Gemini flash 2.5 experimental | ★★★★★★★★★★ 10/10 | ★★★★★★★★★★ 10/10 | ★★★★★★★★★☆ 9/10 | 9.7
4 | Gemini flash 2.0 | ★★★★★★★★★★ 10/10 | ★★★★★★★★★★ 10/10 | ★★★★★★★★☆☆ 8/10 | 9.3
5 | Llama 3.3 | ★★★★★★★★★★ 10/10 | ★★★★★★★★★★ 10/10 | ★★★★★★☆☆☆☆ 6/10 | 8.7
6 | DeepSeek 3.7 0324 | ★★★★★★☆☆☆☆ 6/10 | ★★★★★★★☆☆☆ 7/10 | ★★★★★★★★☆☆ 8/10 | 7.0
7 | OpenAI GPT 4.1 | ★★★★★★★☆☆☆ 7/10 | ★★★★★★☆☆☆☆ 6/10 | ★★★★★★★★★☆ 9/10 | 7.3
8 | Sonnet 4.0 | ★★★★★★☆☆☆☆ 6/10 | ★★☆☆☆☆☆☆☆☆ 2/10 | ★★★★★★★☆☆☆ 7/10 | 5.0
9 | Llama 4 Maverick | ★★★★☆☆☆☆☆☆ 4/10 | ★★★★★★★★★☆ 9/10 | ★★★★★★☆☆☆☆ 6/10 | 6.3
10 | Mistral Large 24-11 | ★★★★★★☆☆☆☆ 6/10 | ★★★★★★★★★★ 10/10 | ★★★★★☆☆☆☆☆ 5/10 | 7.0
11 | DeepSeek R1 | ★★★☆☆☆☆☆☆☆ 3/10 | ★★★★★★☆☆☆☆ 6/10 | ★★★★★★★☆☆☆ 7/10 | 5.3
12 | Claude Sonnet 3.7 | ★★★★★☆☆☆☆☆ 5/10 | ★★☆☆☆☆☆☆☆☆ 2/10 | ★★★★★★★★☆☆ 8/10 | 5.0
13 | Gemini Pro Preview 05-06 | ★★★★★★★☆☆☆ 7/10 | ★★☆☆☆☆☆☆☆☆ 2/10 | ★★★★★★★★☆☆ 8/10 | 5.7
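
The Overall column appears to be the plain average of the Speed, Cost, and Quality scores, rounded to one decimal. That formula is inferred from the numbers rather than stated anywhere in this write-up, so treat the following as a minimal sketch under that assumption. The helper name `overall` is hypothetical; the scores are copied from the leaderboard.

```python
# Each row: (model, speed, cost, quality, published overall), copied from the leaderboard.
LEADERBOARD = [
    ("Gemini flash 2.5-0520", 10, 9, 9, 9.3),
    ("Devstral Small 2505", 10, 10, 6, 8.7),
    ("Gemini flash 2.5 experimental", 10, 10, 9, 9.7),
    ("Gemini flash 2.0", 10, 10, 8, 9.3),
    ("Llama 3.3", 10, 10, 6, 8.7),
    ("DeepSeek 3.7 0324", 6, 7, 8, 7.0),
    ("OpenAI GPT 4.1", 7, 6, 9, 7.3),
    ("Sonnet 4.0", 6, 2, 7, 5.0),
    ("Llama 4 Maverick", 4, 9, 6, 6.3),
    ("Mistral Large 24-11", 6, 10, 5, 7.0),
    ("DeepSeek R1", 3, 6, 7, 5.3),
    ("Claude Sonnet 3.7", 5, 2, 8, 5.0),
    ("Gemini Pro Preview 05-06", 7, 2, 8, 5.7),
]


def overall(speed: int, cost: int, quality: int) -> float:
    """Unweighted mean of the three scores, rounded to one decimal (assumed formula)."""
    return round((speed + cost + quality) / 3, 1)


# Collect any rows whose published Overall differs from the recomputed mean.
mismatches = [
    (name, published, overall(speed, cost, quality))
    for name, speed, cost, quality, published in LEADERBOARD
    if overall(speed, cost, quality) != published
]

for name, speed, cost, quality, published in LEADERBOARD:
    print(f"{name}: recomputed {overall(speed, cost, quality)}, published {published}")
```

Under this assumption every published Overall value is reproduced. The Rank column does not follow the average strictly (the third-place entry averages higher than the first), so the final ordering presumably reflects some additional judgment beyond these three scores.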

🔍 Detailed Model Reviews

🥇 Gemini flash 2.5-0520 - The New Champion!

  • Speed: 10/10 (215 tokens/s) ★★★★★★★★★★
  • Cost: 9/10 ($0.009) ★★★★★★★★★☆
  • Quality: 9/10 ★★★★★★★★★☆

🎯 The Perfect Balance: This model delivers exceptional performance across all metrics! Lightning-fast generation, reasonable cost, and outstanding code quality with comprehensive documentation and modern JavaScript practices. The drag-and-drop works flawlessly, and the dark mode implementation is top-notch. Minor deviations from prompt requirements prevent a perfect score.