HeadlinesBriefing.com

Google Launches WAXAL: Open Speech Dataset for 27 African Languages

Google AI Blog

Google Research has released WAXAL, an open-access speech dataset covering 27 Sub-Saharan African languages. The initiative addresses a critical shortage of speech technology resources for African languages, a gap that has left hundreds of millions of people unable to use voice-enabled technologies in their native languages.

Developed through a multi-year collaboration with African academic institutions including Makerere University and the University of Ghana, WAXAL provides approximately 1,846 hours of transcribed natural speech for automatic speech recognition and over 565 hours of high-fidelity recordings for text-to-speech. The dataset is released under a Creative Commons license to catalyze research and enable inclusive voice technologies tailored to the continent's unique linguistic characteristics.

The resource is intended to support the regional AI research ecosystem by providing the high-quality, permissively licensed data needed to build robust speech systems. The WAXAL collection is expected to grow to include additional languages as part of Google's ongoing effort to bridge the digital divide and preserve linguistic diversity across Africa.