HeadlinesBriefing.com

Qwen3.6-35B-A3B Outperforms Claude Opus 4.7 in Laptop-Generated Pelican Benchmark

Hacker News

Alibaba's Qwen3.6-35B-A3B model, running on a MacBook Pro M5 via LM Studio, produced a more accurate illustration of a pelican riding a bicycle than Anthropic's Claude Opus 4.7, according to a comparison posted on Hacker News. The test used quantized versions of both models: Qwen's 20.9GB quantization generated a coherent image despite the size constraints, while Opus 4.7 struggled with bicycle frame geometry even at its maximum thinking_level.
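The local setup described in the comparison can be reproduced with any OpenAI-compatible client, since LM Studio exposes a local chat-completions server. Below is a minimal sketch; the server address assumes LM Studio's default (localhost:1234), and the model identifier string is hypothetical, not taken from the original post:

```python
import json
import urllib.request

# Assumed LM Studio local-server endpoint (OpenAI-compatible API).
ENDPOINT = "http://localhost:1234/v1/chat/completions"
# Hypothetical identifier for the locally loaded quantized model.
MODEL = "qwen3.6-35b-a3b"

PROMPT = "Generate an SVG of a pelican riding a bicycle."


def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def run_benchmark() -> str:
    """POST the pelican prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(MODEL, PROMPT)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(run_benchmark()[:200])  # preview the start of the generated SVG
```

Saving the returned markup to a `.svg` file and opening it in a browser is enough to judge the output; the comparison in question did exactly this kind of eyeball evaluation rather than any automated scoring.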

This result challenges assumptions about model utility: the pelican benchmark, originally a tongue-in-cheek critique of AI comparison methods, has shown surprising correlations with real-world performance. Early 2024 pelican images were low quality, but recent iterations have practical value, with Gemini 3.1 Pro already producing usable illustrations. Qwen's open-source approach contrasts with Anthropic's proprietary system, raising questions about accessibility versus raw capability.

The debate extends to whether specialized benchmarks like this reflect genuine model strengths. Qwen's SVG output included technical annotations that were praised in the transcripts, but the experiment highlights how subjective evaluations can shape perceptions of AI progress. Neither company has commented on the specific test parameters or training data.

For developers needing local deployment capabilities, Qwen3.6-35B-A3B currently offers better practical results for niche illustration tasks on consumer hardware. This unexpected outcome underscores the complexity of evaluating AI models beyond standardized benchmarks.