Qwen 2.5 Max better than DeepSeek, beats ChatGPT in coding, costs 10x less than Claude 3.5

In case all the buzz about DeepSeek over the past week wasn’t enough, Alibaba Cloud launched Qwen 2.5-Max, a state-of-the-art artificial intelligence model designed to outperform industry leaders like OpenAI’s GPT-4o and DeepSeek-V3.

This release marks a significant milestone in AI development, combining advanced technical capabilities with cost efficiency for enterprise applications.

But before we jump into the details, here is how it compares with other well-known models:

| Feature | Qwen 2.5-Max | GPT-4o (OpenAI) | DeepSeek-V3 | Claude 3.5 Sonnet (Anthropic) |
|---|---|---|---|---|
| Architecture | MoE (72B) | Dense | MoE | Dense |
| Training Tokens | 20T | Undisclosed | 6T | Undisclosed |
| Context Window | 128K tokens | 128K tokens | 128K tokens | 200K tokens |
| Coding (HumanEval) | 92.7% | 90.1% | 88.9% | 85.6% |
| Cost per Million Tokens | $0.38 | $5.00 | $0.25 | $3.00 |
| Open Source? | No | No | Yes | No |

What is Qwen 2.5-Max?

Qwen 2.5-Max is a 72-billion parameter Mixture-of-Experts (MoE) model developed by Alibaba Cloud. Unlike traditional dense models, it uses 64 specialised sub-networks (“experts”) that activate dynamically based on the task, reducing computational costs by 30% while maintaining high performance. 
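The routing idea behind a Mixture-of-Experts layer can be sketched in a few lines: a learned router scores every expert, only the top-k actually run, and their outputs are blended by the router's gate weights. Everything below (dimensions, linear "experts", top-2 routing) is an illustrative toy, not Qwen's actual architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x, experts, router_w, top_k=2):
    """Route input x to the top_k of len(experts) experts and
    combine their outputs, weighted by renormalised router probabilities."""
    logits = router_w @ x                  # one score per expert
    probs = softmax(logits)
    top = np.argsort(probs)[-top_k:]       # indices of the top_k experts
    gate = probs[top] / probs[top].sum()   # renormalise gate weights
    # Only top_k experts execute -- the source of MoE's compute savings
    return sum(g * experts[i](x) for g, i in zip(gate, top))

rng = np.random.default_rng(0)
dim, n_experts = 8, 64
# Toy "experts": each is just a random linear map
experts = [(lambda W: (lambda v: W @ v))(rng.standard_normal((dim, dim)))
           for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, dim))
y = moe_forward(rng.standard_normal(dim), experts, router_w)
print(y.shape)
```

With 64 experts and top-2 routing, only 2/64 of the expert parameters are exercised per token, which is how MoE models keep inference cost well below that of a dense model of the same total size.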

Also read: DeepSeek AI: How this free LLM is shaking up AI industry

The model was pretrained on 20 trillion tokens of data, including academic papers, code repositories, and multilingual web content. It supports text, image, audio, and video processing, with a 128,000-token context window (≈100,000 words), enabling it to analyse lengthy legal contracts or research papers in a single pass. 

Notably, it can process 20-minute videos, generate SVG code from images, and handle audio inputs in 29 languages, including Mandarin, Arabic, and Hindi.

Qwen 2.5-Max leads in reasoning and coding benchmarks but trails Claude 3.5 Sonnet in creative writing tasks. While DeepSeek-V3 is cheaper, Qwen offers better price-to-performance for technical use cases. Unlike open-source rivals, Qwen 2.5-Max is accessible only via Alibaba’s API or Qwen Chat interface, limiting third-party customisation.

Why Qwen 2.5-Max is important

Certain aspects of Qwen 2.5-Max make it not just notable but an important player in the AI race.

1. Technical Superiority

Independent evaluations confirm Qwen 2.5-Max leads in critical benchmarks. On Arena-Hard, a reasoning test requiring multi-step logic, it achieves 89.4% accuracy, surpassing GPT-4o (83.7%) and Claude 3.5 Sonnet (88.1%). 

Read more: DeepSeek vs OpenAI: Why ChatGPT maker says DeepSeek stole its tech to build rival AI

For coding tasks, it scores 92.7% on HumanEval, outperforming GPT-4o’s 90.1% and DeepSeek-V3’s 88.9%. In scientific reasoning, it achieves 60.1% accuracy on GPQA-Diamond, a benchmark for graduate-level STEM questions, compared to Claude 3.5’s 58.3%. These results make it particularly valuable for industries like software development, healthcare diagnostics, and academic research.

2. Cost Efficiency

Priced at $0.38 per million input tokens, Qwen 2.5-Max is more than 10 times cheaper than GPT-4o ($5/M tokens) and roughly 8 times cheaper than Claude 3.5 Sonnet ($3/M tokens).

Read more: DeepSeek is call to action for Indian AI innovation, says Gartner

This pricing democratises access to high-performance AI for startups and small businesses, particularly in sectors like finance and education where budget constraints are common. For example, a mid-sized healthcare firm could deploy Qwen 2.5-Max for medical scan analysis at 1/10th the cost of GPT-4o.
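The pricing multiples are easy to sanity-check from the per-million-token figures quoted above. A small sketch, using the prices as stated in this article:

```python
# Per-million-input-token prices (USD), as quoted in this article
PRICES = {"qwen-2.5-max": 0.38, "gpt-4o": 5.00,
          "deepseek-v3": 0.25, "claude-3.5-sonnet": 3.00}

def cost(model: str, tokens: int) -> float:
    """Input cost in USD for a request of `tokens` tokens."""
    return PRICES[model] * tokens / 1_000_000

# Ratio of GPT-4o to Qwen 2.5-Max pricing
ratio = PRICES["gpt-4o"] / PRICES["qwen-2.5-max"]
print(round(ratio, 1))                 # ~13.2, so "10x cheaper" is conservative
print(cost("qwen-2.5-max", 128_000))   # filling the full 128K context: under $0.05
```

At these rates, even a request that fills Qwen's entire 128K-token context window costs under five cents in input tokens, versus $0.64 for the same request at GPT-4o's quoted price.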

3. Strategic Impact

Released days after DeepSeek’s R1 model, Qwen 2.5-Max counters its rival’s low-cost disruption. Alibaba’s timing intensifies competition in China’s AI sector, with ByteDance and Tencent accelerating their own model upgrades. Industry analysts note this positions Alibaba as a key player in the global AI race, particularly for enterprise clients.

Limitations

Despite its strengths, Qwen 2.5-Max has notable limitations. It underperforms Claude 3.5 Sonnet in creative writing tasks, scoring 15% lower on the Creative Writing Benchmark (CWB). Its closed-source nature restricts developer customisation compared to DeepSeek’s open models. Additionally, while it processes 128K tokens efficiently, performance degrades slightly beyond 100K tokens in complex tasks.

Read more: Deepseek R1 vs Llama 3.2 vs ChatGPT o1: Which AI model wins?

Conclusion

Qwen 2.5-Max exemplifies Alibaba’s strategy to dominate the enterprise AI market through technical precision and affordability. By outperforming GPT-4o in coding and reasoning at a fraction of the cost, it appeals to businesses seeking advanced AI without prohibitive expenses.

While creative applications remain a weakness, its strengths in technical domains position it as a critical tool for industries like healthcare, finance, and software development. As the AI race intensifies, Qwen 2.5-Max underscores China’s growing influence in shaping global AI standards. 

Also read: What is Distillation of AI Models: Explained in short

Sagar Sharma

A software engineer who happens to love testing computers and sometimes they crash. While reviving his crashed system, you can find him reading literature, manga, or watering plants.

Digit.in