AI Models Comparison
Compare popular large language models across providers, pricing, capabilities, and performance.
TL;DR
Comparing GPT-4o, Claude Opus 4, Gemini 2.5 Pro, LLaMA 3.1 405B, Mistral Large, DeepSeek V3, Sonar Pro, Mistral Large 3, and GPT-4.1 across 17 features in 5 categories.
Score Breakdown
Weighting: Performance 35% · Value 30% · Reliability 20% · Ease of Use 15%
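The composite ranking is a plain weighted average of four category scores. A minimal sketch in Python, assuming each category is scored on a 0-10 scale (the scale and the example scores are assumptions; only the weights come from this page):

```python
# Composite score from the stated weights:
# Performance 35% · Value 30% · Reliability 20% · Ease of Use 15%.
WEIGHTS = {
    "performance": 0.35,
    "value": 0.30,
    "reliability": 0.20,
    "ease_of_use": 0.15,
}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted average of per-category scores (assumed 0-10 scale)."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Hypothetical category scores, for illustration only:
example = {"performance": 9.0, "value": 8.0, "reliability": 9.0, "ease_of_use": 10.0}
print(f"{composite_score(example):.2f}")  # 8.85
```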
Scores at a Glance
- **GPT-4o**: Best all-rounder. Unmatched ecosystem and ease of use.
- **Claude Opus 4**: Top reasoning quality. Best for complex, high-stakes tasks.
- **Gemini 2.5 Pro**: Excellent value. Best choice for Google Workspace teams.
- **LLaMA 3.1 405B**: Best open-source model. Free to run, but requires infrastructure.
- **Mistral Large**: Strong European alternative with competitive pricing and GDPR compliance.
- **DeepSeek V3**: Exceptional value. Strong performance at a fraction of the cost.
- **GPT-4.1**: Best budget OpenAI model. Near GPT-4o quality at a fraction of the API cost.
| Feature | GPT-4o | Claude Opus 4 | Gemini 2.5 Pro | LLaMA 3.1 405B | Mistral Large | DeepSeek V3 | Sonar Pro | Mistral Large 3 | GPT-4.1 |
|---|---|---|---|---|---|---|---|---|---|
| General |||||||||
| Provider | OpenAI | Anthropic | Google | Meta | Mistral AI | DeepSeek | Perplexity | Mistral AI | OpenAI |
| Release Date | May 2024 | May 2025 | Mar 2025 | Jul 2024 | Feb 2024 | Dec 2024 | Feb 2025 | Jul 2025 | Apr 2025 |
| Open Source | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | N/A | ✗ |
| Parameters | Undisclosed | Undisclosed | Undisclosed | 405B | Undisclosed | 671B MoE | Undisclosed | Undisclosed | Undisclosed |
| Context & Tokens | |||||||||
| Max Context Window | 128K | 200K | 1M | 128K | 128K | 128K | 200K | 128K | 1M |
| Max Output Tokens | 16K | 32K | 65K | 4K | 8K | 8K | 8K | 16K | 32K |
| Pricing (per 1M tokens) | |||||||||
| Input Price | $2.50 | $15.00 | $1.25 | Free / Varies | $2.00 | $0.27 | $3.00 | $2.00 | $2.00 |
| Output Price | $10.00 | $75.00 | $10.00 | Free / Varies | $6.00 | $1.10 | $15.00 | $6.00 | $8.00 |
| Capabilities | |||||||||
| Vision (Image Input) | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | N/A | N/A | ✓ |
| Function / Tool Calling | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | N/A | ✓ | ✓ |
| Code Generation | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Structured Output (JSON) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| System Prompts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Fine-tuning Available | ✓ | ✗ | N/A | ✓ | N/A | ✓ | ✗ | N/A | ✓ |
| Benchmarks | |||||||||
| MMLU Score | 88.7% | ~90% | 90.0% | 88.6% | 84.0% | 88.5% | N/A | ~84% | 90.2% |
| HumanEval (Code) | 90.2% | ~93% | 89.0% | 89.0% | 81.0% | 82.6% | N/A | N/A | 92.0% |
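✓ = supported · ✗ = not supported · N/A = not publicly documented or not verified for this comparison.

The per-1M-token prices above translate to per-request costs by simple proportion: token count divided by one million, times the listed rate. A minimal sketch in Python using three of the price pairs from the table (the request sizes are hypothetical):

```python
# Per-request cost from USD-per-1M-token prices, using three of the
# price pairs listed in the table above. Request sizes are hypothetical.

PRICING = {
    # model: (input $/1M tokens, output $/1M tokens)
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4": (15.00, 75.00),
    "DeepSeek V3": (0.27, 1.10),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: tokens / 1M, times the per-1M rate."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# Example: a 10K-token prompt that produces a 1K-token answer.
for model in PRICING:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
# GPT-4o: $0.0350 · Claude Opus 4: $0.2250 · DeepSeek V3: $0.0038
```

At this request size the spread between the cheapest and most expensive models in the table is roughly 60x, which is why the value-weighted rankings diverge so sharply from the raw benchmark scores.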