Round 2: LLM To-Do App Battle Royale - 13 Models Tested!

Published on May 23, 2025

🚀 The LLM Arena Expands: Round 2 Results

Since our previous battle of 7 LLMs, the AI landscape has changed considerably. We’re back with 13 models, including returning champions and several newcomers. The To-Do app challenge is the same, but the competition is fresh and a number of models have been updated. Let’s see who claims the crown! 👑

📊 Complete Leaderboard

Rank	LLM	Speed	Cost	Quality	Overall
🥇	Gemini flash 2.5-0520	★★★★★★★★★★ 10/10	★★★★★★★★★☆ 9/10	★★★★★★★★★☆ 9/10	9.3
🥈	Devstral Small 2505	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★☆☆☆☆ 6/10	8.7
🥉	Gemini flash 2.5 experimental	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★★★★☆ 9/10	9.7
4	Gemini flash 2.0	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★★★☆☆ 8/10	9.3
5	Llama 3.3	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★☆☆☆☆ 6/10	8.7
6	DeepSeek V3 0324	★★★★★★☆☆☆☆ 6/10	★★★★★★★☆☆☆ 7/10	★★★★★★★★☆☆ 8/10	7.0
7	OpenAI GPT 4.1	★★★★★★★☆☆☆ 7/10	★★★★★★☆☆☆☆ 6/10	★★★★★★★★★☆ 9/10	7.3
8	Sonnet 4.0	★★★★★★☆☆☆☆ 6/10	★★☆☆☆☆☆☆☆☆ 2/10	★★★★★★★☆☆☆ 7/10	5.0
9	Llama 4 Maverick	★★★★☆☆☆☆☆☆ 4/10	★★★★★★★★★☆ 9/10	★★★★★★☆☆☆☆ 6/10	6.3
10	Mistral Large 24-11	★★★★★★☆☆☆☆ 6/10	★★★★★★★★★★ 10/10	★★★★★☆☆☆☆☆ 5/10	7.0
11	DeepSeek R1	★★★☆☆☆☆☆☆☆ 3/10	★★★★★★☆☆☆☆ 6/10	★★★★★★★☆☆☆ 7/10	5.3
12	Claude Sonnet 3.7	★★★★★☆☆☆☆☆ 5/10	★★☆☆☆☆☆☆☆☆ 2/10	★★★★★★★★☆☆ 8/10	5.0
13	Gemini Pro Preview 05-06	★★★★★★★☆☆☆ 7/10	★★☆☆☆☆☆☆☆☆ 2/10	★★★★★★★★☆☆ 8/10	5.7
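
The post does not say how the Overall column is computed, but the published values are consistent with a plain unweighted average of Speed, Cost, and Quality, rounded to one decimal. Here is a minimal sketch under that assumption. The `overall` helper and the `rows` dictionary are illustrative names, not part of the original benchmark, and only a few rows are copied from the table.

```python
def overall(speed: int, cost: int, quality: int) -> float:
    """Assumed formula: unweighted mean of the three 0-10 scores, one decimal."""
    return round((speed + cost + quality) / 3, 1)


# (speed, cost, quality, published overall), copied from the leaderboard above
rows = {
    "OpenAI GPT 4.1": (7, 6, 9, 7.3),
    "Sonnet 4.0": (6, 2, 7, 5.0),
    "Llama 4 Maverick": (4, 9, 6, 6.3),
    "DeepSeek R1": (3, 6, 7, 5.3),
}

for name, (speed, cost, quality, published) in rows.items():
    computed = overall(speed, cost, quality)
    print(f"{name}: computed {computed}, published {published}")
```

Note that the Rank column does not strictly follow this average (the third-place model has a higher Overall than the first), so the final ordering presumably reflects additional judgment beyond the three scores.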

🔍 Detailed Model Reviews

🥇 Gemini flash 2.5-0520 - The New Champion!

Speed: 10/10 (215 tokens/s) ★★★★★★★★★★
Cost: 9/10 ($0.009) ★★★★★★★★★☆
Quality: 9/10 ★★★★★★★★★☆

🎯 The Perfect Balance: This model scores high on every metric. Generation is fast at 215 tokens/s, the cost is reasonable at $0.009, and the code quality is outstanding, with comprehensive documentation and modern JavaScript practices. Drag-and-drop works flawlessly, and the dark mode implementation is top-notch. Minor deviations from the prompt requirements keep it from a perfect score.