Ai - Storm developer blog

Round 2: LLM To-Do App Battle Royale - 13 Models Tested!

Published on May 23, 2025

🚀 The LLM Arena Expands: Round 2 Results

After our previous battle of 7 LLMs, the AI landscape has evolved dramatically! We’re back with 13 models, including returning champions and exciting newcomers. It’s the same To-Do app challenge, with fresh competition and updated models. Let’s see who claims the crown! 👑

📊 Complete Leaderboard

Rank	LLM	Speed	Cost	Quality	Overall
🥇	Gemini flash 2.5-0520	★★★★★★★★★★ 10/10	★★★★★★★★★☆ 9/10	★★★★★★★★★☆ 9/10	9.3
🥈	Devstral Small 2505	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★☆☆☆☆ 6/10	8.7
🥉	Gemini flash 2.5 experimental	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★★★★☆ 9/10	9.7
4	Gemini flash 2.0	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★★★☆☆ 8/10	9.3
5	Llama 3.3	★★★★★★★★★★ 10/10	★★★★★★★★★★ 10/10	★★★★★★☆☆☆☆ 6/10	8.7
6	DeepSeek V3 0324	★★★★★★☆☆☆☆ 6/10	★★★★★★★☆☆☆ 7/10	★★★★★★★★☆☆ 8/10	7.0
7	OpenAI GPT 4.1	★★★★★★★☆☆☆ 7/10	★★★★★★☆☆☆☆ 6/10	★★★★★★★★★☆ 9/10	7.3
8	Sonnet 4.0	★★★★★★☆☆☆☆ 6/10	★★☆☆☆☆☆☆☆☆ 2/10	★★★★★★★☆☆☆ 7/10	5.0
9	Llama 4 Maverick	★★★★☆☆☆☆☆☆ 4/10	★★★★★★★★★☆ 9/10	★★★★★★☆☆☆☆ 6/10	6.3
10	Mistral Large 24-11	★★★★★★☆☆☆☆ 6/10	★★★★★★★★★★ 10/10	★★★★★☆☆☆☆☆ 5/10	7.0
11	DeepSeek R1	★★★☆☆☆☆☆☆☆ 3/10	★★★★★★☆☆☆☆ 6/10	★★★★★★★☆☆☆ 7/10	5.3
12	Claude Sonnet 3.7	★★★★★☆☆☆☆☆ 5/10	★★☆☆☆☆☆☆☆☆ 2/10	★★★★★★★★☆☆ 8/10	5.0
13	Gemini Pro Preview 05-06	★★★★★★★☆☆☆ 7/10	★★☆☆☆☆☆☆☆☆ 2/10	★★★★★★★★☆☆ 8/10	5.7
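
The Overall column is consistent with an unweighted mean of the Speed, Cost, and Quality scores, rounded to one decimal place. The post does not state the formula, so the sketch below is an assumption inferred from the table; the helper name `overall` is hypothetical.

```python
def overall(speed: int, cost: int, quality: int) -> float:
    """Assumed formula: unweighted mean of three 0-10 scores, one decimal."""
    return round((speed + cost + quality) / 3, 1)


# (speed, cost, quality) triples copied from leaderboard rows above
scores = {
    "Gemini flash 2.5-0520": (10, 9, 9),
    "Gemini flash 2.5 experimental": (10, 10, 9),
    "Sonnet 4.0": (6, 2, 7),
}

for model, triple in scores.items():
    print(f"{model}: {overall(*triple)}")
```

Under this assumption every Overall value in the table is reproduced. Sorting by the mean alone would not reproduce the published rank order, though (the third-placed model has the highest mean), so the ranking presumably reflects additional judgment.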

🔍 Detailed Model Reviews

🥇 Gemini flash 2.5-0520 - The New Champion!

Speed: 10/10 (215 tokens/s) ★★★★★★★★★★
Cost: 9/10 ($0.009) ★★★★★★★★★☆
Quality: 9/10 ★★★★★★★★★☆

🎯 The Perfect Balance: This model scores 9 or 10 on every metric. It pairs lightning-fast generation and a reasonable cost with outstanding code quality, including comprehensive documentation and modern JavaScript practices. The drag-and-drop works flawlessly, and the dark mode implementation is top-notch. Minor deviations from prompt requirements prevent a perfect score.

Battle of the Bots: To-Do App LLM Reviews & Comparison

Published on April 25, 2025

📝 LLM To-Do App Evaluation Showdown

With the rapid progress in generative AI, many developers turn to Large Language Models (LLMs) for frontend code assistance. But how do these models fare when given the same requirements for a productivity app? We evaluated seven top LLMs on a fixed To-Do app HTML/jQuery/Tailwind challenge, rating them on speed, cost, and quality. ⭐

Below are the results, with links to each implementation, easy-to-read star ratings, and a brief comparative analysis.