HeadlinesBriefing.com

cuTile Rust Brings Memory-Safe GPU Kernel Programming to NVIDIA GPUs

Hacker News

NVIDIA researchers released cuTile Rust, a domain-specific language that enables memory-safe GPU kernel development in idiomatic Rust. The system applies Rust's ownership model to tile-based GPU programming, partitioning mutable tensors into disjoint pieces before kernel launch while sharing immutable data. This approach prevents data races at compile time without sacrificing performance.
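The partitioning idea can be illustrated with a minimal CPU-side sketch in plain standard-library Rust. This is not the cuTile Rust API: the function name `tiled_axpy`, the tile size parameter, and the use of threads in place of GPU tile blocks are all assumptions made for illustration. It only shows how the borrow checker accepts disjoint mutable tiles alongside shared immutable inputs.

```rust
use std::thread;

/// Hypothetical CPU analogue of a tiled element-wise kernel (not the cuTile API).
/// Computes out[i] = a * x[i] + y[i], one worker thread per tile.
fn tiled_axpy(a: f32, x: &[f32], y: &[f32], out: &mut [f32], tile: usize) {
    assert!(tile > 0, "tile size must be non-zero");
    assert_eq!(x.len(), out.len());
    assert_eq!(y.len(), out.len());

    thread::scope(|s| {
        // chunks_mut splits the mutable output into non-overlapping tiles,
        // so each worker owns exclusive access to its own piece.
        for (i, out_tile) in out.chunks_mut(tile).enumerate() {
            let start = i * tile;
            let end = start + out_tile.len();
            // The inputs are immutable, so every worker may share them freely.
            let x_tile = &x[start..end];
            let y_tile = &y[start..end];
            s.spawn(move || {
                for ((o, &xv), &yv) in out_tile.iter_mut().zip(x_tile).zip(y_tile) {
                    *o = a * xv + yv;
                }
            });
        }
        // All workers are joined when the scope ends.
    });
}
```

Calling `tiled_axpy(2.0, &x, &y, &mut out, 3)` with `x = [0, 1, ..., 7]` and `y` filled with ones leaves `out` as `[1, 3, 5, 7, 9, 11, 13, 15]`. Handing two workers overlapping mutable slices of `out` would be rejected by the compiler, which is the same kind of guarantee the article describes being carried across the GPU launch boundary.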

The tool extends Rust's ownership discipline across the GPU launch boundary through the #[cutile::module] macro, which embeds captured Rust ASTs in the host binary and JIT-compiles them through CUDA Tile IR into GPU cubin executables. Developers can write kernels with familiar Rust syntax while the system handles the complex memory management and synchronization patterns required for safe parallel execution.

On NVIDIA B200 hardware, cuTile Rust reports 7 TB/s for element-wise operations and 2 PFlop/s for GEMM, the latter reaching 92% of peak dense f16 performance, while maintaining its safety guarantees. The system also powers Grout, a Qwen3 inference engine built in collaboration with Hugging Face.

Despite these performance figures, cuTile Rust remains an early-stage research project under active development. The team explicitly warns of bugs, incomplete features, and API breakage, and invites community feedback through CONTRIBUTING.md to help shape the project's direction.