
The Top 49 LLMs of 2025 — Part 1: Countdown 26–49

We tested dozens of AI models under a single, consistent setup. This part covers ranks 26–49, listing each model’s context size, score, price, speed, and where it fits.

Scored with the Artificial Analysis Intelligence Index v2.2 • Read time: ~9 min

Why This List Matters

In 2025 there are hundreds of AI language models, and many claim to be the best, which makes choosing between them harder. This series cuts through the noise with a single test setup so you can compare models fairly.

We split the results into three parts to make decisions easier: Part 1 covers ranks 26–49, Part 2 covers ranks 11–25, and Part 3 covers the Top 10.

This page is Part 1: ranks 26–49. These models won’t always make headlines, but they keep products moving.

How the Scoring Works

We used the Artificial Analysis Intelligence Index v2.2 to score each model out of 100. The index is a weighted blend of reasoning, math, coding, instruction following, and long‑context performance.

We also consider real‑world signals: price, speed, context window, and reliability, so the rankings reflect not just raw knowledge but day‑to‑day usefulness.
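To make the blend concrete, here is a minimal sketch of how a weighted index like this can be computed. The category weights and sub‑scores below are illustrative placeholders, not the actual v2.2 weighting, which isn’t published in this article:

```python
# Illustrative only: hypothetical weights and sub-scores, not the real
# Artificial Analysis v2.2 weighting.
CATEGORY_WEIGHTS = {
    "reasoning": 0.30,
    "math": 0.20,
    "coding": 0.20,
    "instruction_following": 0.15,
    "long_context": 0.15,
}

def intelligence_index(sub_scores: dict[str, float]) -> float:
    """Blend per-category scores (each 0-100) into a single 0-100 index."""
    assert abs(sum(CATEGORY_WEIGHTS.values()) - 1.0) < 1e-9
    return sum(CATEGORY_WEIGHTS[cat] * sub_scores[cat] for cat in CATEGORY_WEIGHTS)

# Example: a model that is strong at coding but weaker on long context.
print(intelligence_index({
    "reasoning": 55, "math": 50, "coding": 60,
    "instruction_following": 48, "long_context": 40,
}))  # -> 51.7
```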

Quick Picks

Cheap but powerful

#34 Gemini 2.5 Flash • #37 Gemini 2.0 Flash • #31 GPT‑4o (Mar ’25)

Huge memory

#42 Gemini 1.5 Pro (2M tokens) • #45 Llama 4 Scout (10M tokens)

Open source

#40 DeepSeek V3 (Dec ’24) • #30 Llama 4 Maverick

Great at live search

#44 Perplexity Sonar • #46 Sonar Pro • #29 Grok 3

Countdown: Models 26–49

26 • OpenAI

GPT‑4.1

Context: 1M • Score: 52 • $3.50 / 1M tokens • 127.8 t/s

Balanced model with strong multimodal skills.

Best for: creative writing, business tasks, and analysis with big memory.

27 • Google

Gemini 2.0 Flash Thinking (Experimental)

Context: 1M • Score: 52 • Free • 195.4 t/s

Early “thinking” features offered free for research.

Best for: testing new reasoning tools without cost.

28 • DeepSeek

DeepSeek R1 Distill Qwen 32B

Context: 128k • Score: 51 • $0.22 / 1M tokens • 21.2 t/s

Low‑cost reasoning with solid math skills.

Best for: education tools and research helpers.

29 • xAI

Grok 3

Context: 1M • Score: 51 • $6.00 / 1M tokens • 49.4 t/s

Real‑time information and trend awareness.

Best for: media, marketing, and social apps.

30 • Meta

Llama 4 Maverick

Context: 1M • Score: 51 • $0.35 / 1M tokens • 121.1 t/s

Open‑source model with robust fundamentals and community support.

Best for: custom projects and OSS ecosystems.

31 • OpenAI

GPT‑4o (March ’25)

Context: 128k • Score: 49 • $7.50 / 1M tokens • 164.2 t/s

Reliable multimodal model with low latency and consistent quality.

Best for: apps where speed and vision/voice matter.

32 • Google

Gemini 2.0 Pro (Experimental)

Context: 2M • Score: 48 • Free • 35.1 t/s

Research access to a massive 2M‑token context window.

Best for: research and legal document testing.

33 • DeepSeek

DeepSeek R1 Distill Qwen 14B

Context: 128k • Score: 48 • $0.88 / 1M tokens • 59.4 t/s

Strong reasoning for its size and price.

Best for: learning tools, analysis bots, and support systems.

34 • Google

Gemini 2.5 Flash

Context: 1M • Score: 48 • $0.26 / 1M tokens • 293.3 t/s

Blazing speed and excellent cost efficiency.

Best for: bulk chat, moderation, and high‑volume workflows.

35 • DeepSeek

DeepSeek R1 Distill Llama 70B

Context: 128k • Score: 47 • $0.60 / 1M tokens • 106.7 t/s

DeepSeek R1’s reasoning distilled into a Llama 70B base; strong at coding tasks.

Best for: developer tools and technical documentation.

36 • Anthropic

Claude 3.7 Sonnet

Context: 200k • Score: 47 • $6.00 / 1M tokens • 77.0 t/s

Safe, polished long‑form writing and analysis.

Best for: reports, policy docs, and careful writing.

37 • Google

Gemini 2.0 Flash

Context: 1M • Score: 47 • $0.17 / 1M tokens • 229.1 t/s

Fast and budget‑friendly for real‑time apps.

Best for: startups and live apps.

38 • Reka AI

Reka Flash 3

Context: 128k • Score: 46 • $0.35 / 1M tokens • 56.9 t/s

Balanced, with solid multimodal chops.

Best for: enterprise trials.

39 • Google

Gemini 2.0 Flash (Experimental)

Context: 1M • Score: 46 • Free • 209.0 t/s

Fast research‑only version for prototyping.

Best for: school projects and prototypes.

40 • DeepSeek

DeepSeek V3 (Dec ’24)

Context: 128k • Score: 45 • $0.48 / 1M tokens • 25.4 t/s

Open‑source with a proven track record.

Best for: low‑cost coding help.

41 • Alibaba

Qwen2.5 Max

Context: 32k • Score: 45 • $2.80 / 1M tokens • 51.5 t/s

Strong multilingual performance.

Best for: Asian markets and enterprise apps.

42 • Google

Gemini 1.5 Pro (Sept)

Context: 2M • Score: 44 • $2.19 / 1M tokens • 97.1 t/s

Huge memory window for deep retrieval.

Best for: knowledge bases and long‑form QA.

43 • Anthropic

Claude 3.5 Sonnet (Oct)

Context: 200k • Score: 44 • $6.00 / 1M tokens • 77.8 t/s

Careful, ethical writing and analysis.

Best for: consulting, teaching, and compliance.

44 • Perplexity

Sonar

Context: 127k • Score: 43 • $1.00 / 1M tokens • 86.9 t/s

Strong live search and synthesis.

Best for: research, journalism, and business tracking.

45 • Meta

Llama 4 Scout

Context: 10M • Score: 43 • $0.27 / 1M tokens • 122.4 t/s

Processes truly massive inputs like books or large codebases.

Best for: entire books, giant codebases, or case files.

46 • Perplexity

Sonar Pro

Context: 200k • Score: 43 • $6.00 / 1M tokens • 82.6 t/s

Deeper web search with more detail.

Best for: professional research, strategy, and investigations.

47 • Alibaba

QwQ 32B‑Preview

Context: 33k • Score: 42 • $0.26 / 1M tokens • 54.9 t/s

Experimental reasoning model tuned for step‑by‑step question answering.

Best for: FAQs, help desks, and student quizzes.

48 • Amazon

Nova Premier

Context: 1M • Score: 42 • $5.00 / 1M tokens • 68.0 t/s

Great for AWS‑native stacks and compliance needs.

Best for: cloud apps and regulated workloads.

49 • OpenAI

GPT‑4o (Nov ’24)

Context: 128k • Score: 41 • $4.38 / 1M tokens • 131.1 t/s

Proven multimodal model with stable performance.

Best for: steady reliability across modalities.

How to Choose from 26–49

If you’re unsure, start with Gemini 2.5 Flash (#34) for speed and cost, then pair a specialist like Claude 3.7 (#36) for careful long‑form work.
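A quick back‑of‑envelope check, using the prices and speeds listed above, shows why Flash‑class models are the default for volume work. The workload figures (requests per month, tokens per reply) are made up for illustration:

```python
# Rough cost/latency comparison using the stats from this list.
# Prices are $ per 1M tokens; speeds are output tokens per second.
MODELS = {
    "Gemini 2.5 Flash (#34)":  {"price_per_1m": 0.26, "tokens_per_sec": 293.3},
    "Claude 3.7 Sonnet (#36)": {"price_per_1m": 6.00, "tokens_per_sec": 77.0},
}

def monthly_cost(price_per_1m: float, tokens_per_request: int, requests: int) -> float:
    return price_per_1m * tokens_per_request * requests / 1_000_000

# Hypothetical workload: 100k requests/month, ~1,200 tokens each.
for name, m in MODELS.items():
    cost = monthly_cost(m["price_per_1m"], 1_200, 100_000)
    latency = 1_200 / m["tokens_per_sec"]  # rough seconds to stream one reply
    print(f"{name}: ${cost:,.0f}/month, ~{latency:.1f}s per reply")
# Gemini 2.5 Flash: ~$31/month, ~4.1s; Claude 3.7 Sonnet: ~$720/month, ~15.6s
```

At this volume the price gap is roughly 23x, which is why pairing a cheap default with an occasional specialist beats running the specialist everywhere.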

Coming Up Next

Part 2 (11–25) covers stronger models that come close to the Top 10 but cost less. They blend serious reasoning with practical pricing.

Read Part 2 →

What This Means for the Future

The lower half of the list shows a clear direction: faster, cheaper, and much larger contexts. Expect layered systems where a quick, low‑cost model handles easy steps and passes complex parts to a smarter one. Open source will keep expanding in coding, math, and multilingual support.
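Here is a minimal sketch of that layered pattern, assuming a generic `call_model` client you would wire to your provider’s SDK; the model names, keyword heuristic, and confidence check are all illustrative, not a prescribed setup:

```python
# Sketch of a two-tier cascade: a cheap, fast model answers first, and
# only hard or low-confidence requests escalate to a stronger model.
CHEAP_MODEL = "gemini-2.5-flash"    # fast, low cost (see #34 above)
STRONG_MODEL = "claude-3.7-sonnet"  # careful long-form work (see #36 above)

def call_model(model: str, prompt: str) -> str:
    # Placeholder: connect this to whatever provider SDK you use.
    raise NotImplementedError("Wire this to your provider's client.")

def looks_hard(prompt: str) -> bool:
    # Toy heuristic: long prompts or explicit reasoning requests escalate.
    return len(prompt) > 4_000 or any(
        kw in prompt.lower() for kw in ("prove", "step by step", "legal", "audit")
    )

def answer(prompt: str) -> str:
    if looks_hard(prompt):
        return call_model(STRONG_MODEL, prompt)
    draft = call_model(CHEAP_MODEL, prompt)
    # Escalate if the cheap model hedges; a real system would use
    # logprobs or a verifier model instead of string matching.
    if "i'm not sure" in draft.lower():
        return call_model(STRONG_MODEL, prompt)
    return draft
```

Two tiers are the simplest version of the pattern; a verifier model or logprob threshold makes the escalation rule less brittle than keyword matching.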

Top 49 LLMs — Part 1: FAQ

How were these ranks calculated?

We used the Artificial Analysis Intelligence Index v2.2: a weighted blend of reasoning, math, coding, instruction following, and long‑context performance, plus practical signals like price and speed.

Do lower ranks (26–49) still make sense to use?

Yes. Many are faster and cheaper workhorses for production tasks or high‑volume workflows where top‑10 quality isn’t required.

Why are some models marked “Experimental” or “Preview”?

Vendors often release experimental builds for research access. We include them for context but flag pricing or availability caveats where relevant.

What should I pick if I need speed over accuracy?

Start with Gemini 2.5 Flash (#34) or Gemini 2.0 Flash (#37), then pair with a careful writer like Claude 3.7 Sonnet (#36) for the hard parts.