HeadlinesBriefing.com

Alibaba's Qwen3.5 Medium Models Beat Sonnet 4.5 on Local GPUs

Hacker News

Alibaba's Qwen AI team has released the Qwen3.5 Medium Model series, a set of four new large language models that bring near-Sonnet 4.5 performance to local computers. Three of them, Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B, are open source and available under the Apache 2.0 license on Hugging Face and ModelScope. The fourth, Qwen3.5-Flash, is proprietary and offered through Alibaba Cloud's API at competitive prices.

These models use a hybrid architecture that combines Gated Delta Networks with a sparse Mixture-of-Experts (MoE) system, so only a fraction of the weights run for each token. The flagship Qwen3.5-35B-A3B activates only 3 billion of its 35 billion parameters per token and maintains accuracy even under 4-bit quantization. As a result, the model can process over 1 million tokens on consumer GPUs with just 32GB of VRAM, far less hardware than comparable options need. In benchmark tests, the 35B-A3B model surpasses OpenAI's GPT-5-mini and Anthropic's Claude Sonnet 4.5 on knowledge and visual reasoning tasks.
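To see why the parameter counts and 4-bit quantization matter for a 32GB card, the arithmetic can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: it counts only the raw model weights and assumes every weight is stored at the stated bit width. It ignores the KV cache, activations, and quantization metadata, all of which add to real memory use. The function names are hypothetical and exist only for this illustration.

```python
# Rough weight-memory estimate for a 35B-parameter model.
# Assumption: all weights are stored at the given bit width; KV cache,
# activations, and quantization overhead (scales, zero points) are ignored.

TOTAL_PARAMS = 35e9   # total parameters in Qwen3.5-35B-A3B, per the article
ACTIVE_PARAMS = 3e9   # parameters activated per token, per the article


def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the weights that participate in computing one token."""
    return active_params / total_params


fp16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # 70.0
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)   # 17.5

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"Active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

Under these assumptions the 4-bit weights take about 17.5 GB, which fits within 32GB of VRAM, while 16-bit weights at about 70 GB would not. Only around 9% of the parameters are used for any single token, which lowers per-token compute but not the memory needed to hold the full model.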

The Qwen3.5-Flash API is priced at $0.10 per million input tokens and $0.40 per million output tokens, among the most competitive rates in the industry. The models include a native 'Thinking Mode' for complex reasoning and support very large context windows, which makes them well suited to enterprise applications that need local deployment, data sovereignty, and cost-effective AI integration.
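As a rough illustration of what the quoted rates mean in practice, the sketch below estimates the bill for a given token volume. It assumes flat per-token pricing at the two rates cited in the article, with no tiering, caching discounts, or separate charges for reasoning tokens. The function name and the example workload are hypothetical.

```python
# Cost estimate at the article's quoted Qwen3.5-Flash rates.
# Assumption: flat pricing with no volume tiers or cache discounts.

INPUT_USD_PER_MILLION = 0.10   # USD per 1M input tokens, per the article
OUTPUT_USD_PER_MILLION = 0.40  # USD per 1M output tokens, per the article


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated charge in USD for one workload under flat pricing."""
    input_cost = input_tokens * INPUT_USD_PER_MILLION
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"Estimated cost: ${cost:.2f}")
```

For this hypothetical workload the estimate comes to roughly $0.40, split evenly between input and output, since output tokens cost four times as much as input tokens at these rates.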