HeadlinesBriefing.com

Stable Audio 3 Delivers Fast Variable-Length AI Music Generation

Hacker News

Stability AI has released Stable Audio 3, a new family of latent diffusion models designed for variable-length audio generation and editing. The system includes small, medium, and large variants that can produce several minutes of audio while avoiding the computational cost of full-length generations for shorter clips.
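To make the variable-length point concrete, the following minimal sketch counts how many latent frames a model would have to process for a short clip compared with always generating a full-length one. The sample rate, downsampling factor, and maximum duration are illustrative assumptions, not published Stable Audio 3 values, and the function names are hypothetical.

```python
import math

# All constants are illustrative assumptions, not Stable Audio 3's real values.
SAMPLE_RATE_HZ = 44_100      # assumed audio sample rate
DOWNSAMPLE_FACTOR = 2_048    # assumed audio samples per latent frame
MAX_SECONDS = 180.0          # assumed length a fixed-length model always generates


def latent_frames(seconds: float) -> int:
    """Number of latent frames needed to cover `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * SAMPLE_RATE_HZ / DOWNSAMPLE_FACTOR)


def relative_frames(seconds: float) -> float:
    """Frames for a variable-length request, as a fraction of the frames
    a fixed-length model would process for MAX_SECONDS."""
    capped = min(seconds, MAX_SECONDS)
    return latent_frames(capped) / latent_frames(MAX_SECONDS)


if __name__ == "__main__":
    for duration in (10.0, 60.0, 180.0):
        print(duration, latent_frames(duration), round(relative_frames(duration), 3))
```

Under these assumed numbers, a 10-second request touches only a small fraction of the frames a full-length generation would. Actual compute does not have to scale linearly with frame count, so the ratio is a rough indicator rather than a timing prediction.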

The models leverage a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity. This architecture encourages semantic structure in the latent space and supports inpainting capabilities for targeted audio editing and extending short recordings.
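Inpainting in a latent space can be pictured as masked blending: frames inside the edited time span are replaced with newly generated values, while frames outside it keep the original recording. The sketch below is a generic illustration of that idea, not Stable Audio 3's actual method; the latent frame rate and both function names are assumptions, and each latent frame is reduced to a single float for brevity.

```python
from typing import List, Sequence

FRAMES_PER_SECOND = 20  # assumed latent frame rate, for illustration only


def span_to_mask(n_frames: int, start_s: float, end_s: float) -> List[int]:
    """Return 1 for latent frames in [start_s, end_s) to regenerate, else 0."""
    first = max(0, int(start_s * FRAMES_PER_SECOND))
    last = min(n_frames, int(end_s * FRAMES_PER_SECOND))
    return [1 if first <= i < last else 0 for i in range(n_frames)]


def blend(original: Sequence[float],
          generated: Sequence[float],
          mask: Sequence[int]) -> List[float]:
    """Keep original frames where mask is 0; take generated frames where it is 1."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("all sequences must have the same length")
    return [g if m else o for o, g, m in zip(original, generated, mask)]


if __name__ == "__main__":
    original = [0.0] * 20           # stand-in latents for a 1-second recording
    generated = [1.0] * 20          # stand-in latents from a generative model
    mask = span_to_mask(20, 0.25, 0.5)
    print(blend(original, generated, mask))
```

Diffusion-based inpainting commonly applies a mask like this at every denoising step, or conditions the model on the masked latents, so that the regenerated region stays consistent with its surroundings. Extending a short recording can be treated the same way by appending empty frames and masking only the new ones.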

Adversarial post-training reduces the number of inference steps needed while improving fidelity and prompt adherence. The models generate music and sounds in under 2 seconds on an H200 GPU and in a few seconds on a MacBook Pro M4, making them practical for real-world applications.

Stability AI is releasing the small and medium model weights, which target consumer-grade hardware, alongside the complete training and inference pipeline. This puts professional-grade audio generation within reach of individual creators and small teams who previously lacked access to such tools.