HeadlinesBriefing.com

ATLAS: Scaling Laws for Multilingual AI Models

The latest research from Google

Google researchers introduce ATLAS, a novel approach to scaling multilingual language models. The study addresses the scarcity of data-driven guidance for training in non-English languages, which is crucial for serving a global audience. The research spans 774 training runs across 400+ languages, offering insights into how model size, data volume, and language mixture combine for optimal performance.
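To make the scaling-law idea concrete, here is a minimal sketch of fitting a Chinchilla-style loss curve L(N, D) = E + A/N^alpha + B/D^beta to a sweep of training runs. The functional form, the synthetic data, and every constant below are illustrative assumptions, not figures from the ATLAS paper.

```python
# Sketch: fit a scaling law to (model size, data volume, loss) triples.
# All numbers here are synthetic stand-ins, not ATLAS's actual results.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, E, A, alpha, B, beta):
    """Predicted loss as a function of model size N and token count D."""
    N, D = X
    return E + A * N**-alpha + B * D**-beta

# Hypothetical sweep of training runs with measurement noise.
rng = np.random.default_rng(0)
N = rng.uniform(1e7, 1e9, size=50)      # parameters per run
D = rng.uniform(1e9, 1e11, size=50)     # training tokens per run
loss = scaling_law((N, D), 1.7, 400.0, 0.34, 1100.0, 0.28)
loss += rng.normal(0, 0.01, size=50)

params, _ = curve_fit(scaling_law, (N, D), loss,
                      p0=[2.0, 100.0, 0.3, 100.0, 0.3], maxfev=20000)
E, A, alpha, B, beta = params
print(f"fitted: E={E:.2f} alpha={alpha:.3f} beta={beta:.3f}")
```

Once fitted, such a curve lets you predict the loss of a configuration you never trained, which is what makes a 774-run sweep pay for itself.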

Where prior scaling laws focused primarily on English, ATLAS provides a cross-lingual transfer matrix to identify language synergies and interference. It also includes rules for deciding when to pre-train a model from scratch versus fine-tuning an existing multilingual checkpoint. The research uses the MADLAD-400 dataset to evaluate performance, offering practical guidance on balancing the language mix against model size.
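As a rough illustration of how a transfer matrix and a pre-train-versus-fine-tune rule might work together, the sketch below ranks donor languages and applies a threshold. All matrix values, thresholds, and helper names are hypothetical, not ATLAS's published numbers.

```python
# Sketch: consult a cross-lingual transfer matrix to pick donor languages
# and decide between fine-tuning and pre-training. Values are made up.
TRANSFER = {  # TRANSFER[target][source] > 0 means source helps target
    "no": {"sv": 0.21, "de": 0.14, "ar": -0.03},
    "ms": {"id": 0.25, "sv": 0.01},
    "ar": {"he": 0.12, "de": -0.02},
}

def best_donors(target: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank source languages by estimated transfer benefit to `target`."""
    scores = TRANSFER.get(target, {})
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

def should_finetune(target: str, threshold: float = 0.05) -> bool:
    """Fine-tune an existing multilingual checkpoint when enough positive
    transfer is available; otherwise pre-train from scratch."""
    donors = best_donors(target, k=1)
    return bool(donors) and donors[0][1] >= threshold

for lang in ("no", "ms", "ar"):
    verdict = "fine-tune" if should_finetune(lang) else "pre-train"
    print(lang, best_donors(lang), verdict)
```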

The research reveals that Norwegian benefits from Swedish and German, Malay from Indonesian, and Arabic from Hebrew. The strongest predictor of positive transfer is a shared script or language family. The work also addresses the 'curse of multilinguality,' in which per-language performance drops as more languages are added, and offers scaling rules for managing that trade-off.
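One common baseline for the language-mix side of this trade-off is temperature-based sampling, which up-weights low-resource languages in the training mix. The sketch below shows the mechanics; this is a widely used heuristic rather than ATLAS's own rule, and the corpus sizes and temperature are invented for illustration.

```python
# Sketch: temperature-based sampling for a multilingual data mix.
# Corpus sizes and the temperature value are illustrative assumptions.
def mix_weights(corpus_tokens: dict[str, float],
                temperature: float = 3.0) -> dict[str, float]:
    """Sampling probability p_i proportional to f_i**(1/T), where f_i is
    a language's share of the corpus. T > 1 flattens the distribution,
    giving low-resource languages more weight than their raw share."""
    total = sum(corpus_tokens.values())
    raw = {lang: (n / total) ** (1.0 / temperature)
           for lang, n in corpus_tokens.items()}
    z = sum(raw.values())
    return {lang: w / z for lang, w in raw.items()}

corpus = {"en": 1_000e9, "de": 80e9, "sv": 10e9, "no": 4e9, "ms": 2e9}
for lang, p in mix_weights(corpus).items():
    print(f"{lang}: {p:.3f}")
```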

What's next? This research will help developers build more efficient and effective multilingual AI models. The findings will be presented at ICLR 2026, and applying ATLAS's guidance promises better performance at lower training cost. This is a crucial step towards making AI more accessible and useful across the globe.