HeadlinesBriefing.com

25‑K‑Parameter Transformer Runs on a 1 MHz Commodore 64

Hacker News

In a playful showcase, gizmo64k released a 25,000‑parameter transformer that runs natively on a 1 MHz Commodore 64. Written entirely in hand‑crafted 6502/6510 assembly, the model follows the same decoder‑only transformer architecture that powers ChatGPT and Gemini, yet fits on a single floppy disk. Users can load the disk image into any C64 emulator and type prompts directly for a quick test run.

Running the pre‑built `soulplayer.d64` on VICE, the C64 generates each token in roughly 60 seconds, emitting a distinctive SID blip to mark progress. The model handles lowercase letters, spaces, and basic punctuation; capital letters map to unknown tokens. The architecture comprises two layers, four attention heads, and a 20‑token context window, making it a tangible proof of concept.
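The stated figures invite a quick back‑of‑envelope check. The sketch below assumes a hypothetical embedding width, feed‑forward width, and vocabulary size (none of these are given in the post) and shows how a two‑layer shape of roughly this kind lands in the same ballpark as the 25,000‑parameter count:

```python
# Back-of-envelope parameter budget for a tiny two-layer transformer.
# ASSUMED shape (not from the repo): d_model=32, d_ff=128, vocab=64.
d_model, d_ff, vocab, layers = 32, 128, 64, 2

embed = vocab * d_model        # token embedding (output head often tied)
attn  = 4 * d_model * d_model  # Q, K, V, and output projections per layer
mlp   = 2 * d_model * d_ff     # up- and down-projection per layer
norms = 2 * d_model            # two RMSNorm scale vectors per layer

total = embed + layers * (attn + mlp + norms)
print(total)  # ~27k with these assumed widths, near the stated 25k
```

The exact split between embedding, attention, and MLP weights will differ in the real model, but the exercise shows why two layers and four heads force the hidden dimension down into the low tens.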

Training is straightforward: developers supply a small corpus in `<SEP> input <SEP> response <SEP>` format, run `python train.py`, and the script outputs a quantized int8 model and a BPE tokenizer. The repo ships with a ready‑to‑run disk image (`meful.d64`) and a Python shadow implementation that reproduces the exact integer arithmetic of the 6502 code, so generated output can be verified without waiting on real hardware.
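Preparing such a corpus is a few lines of scripting. The sketch below is a minimal illustration of the `<SEP>`-delimited layout; the example pairs, exact spacing, and output filename are assumptions, not taken from the repo:

```python
# Write a tiny training corpus in the <SEP> input <SEP> response <SEP> format.
# The prompt/response pairs and the file name are illustrative only.
pairs = [
    ("hello", "hi there"),
    ("how are you", "doing fine"),
]

SEP = "<SEP>"
lines = [f"{SEP} {prompt} {SEP} {reply} {SEP}" for prompt, reply in pairs]

with open("corpus.txt", "w") as f:
    f.write("\n".join(lines))
```

Note that the text should stay lowercase, since capital letters fall outside the model's vocabulary.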

Though it only produces fragmented sentences and runs at a minute per token, the project demonstrates that core transformer components—causal self‑attention, RMSNorm, and softmax—can be implemented with fixed‑point math on legacy hardware. It offers a nostalgic yet educational playground for researchers and hobbyists eager to explore low‑resource AI deployment without requiring modern GPUs or cloud infrastructure.
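To make the fixed‑point idea concrete, here is a minimal RMSNorm sketch using integer‑only Q8.8 arithmetic of the kind 8‑bit code might use. The Q‑format, the integer square root, and the rescaling scheme are illustrative assumptions, not the project's actual implementation:

```python
# Fixed-point RMSNorm in Q8.8: every value v is stored as round(v * 256).
# This mirrors integer-only arithmetic feasible on a 6502; the exact
# Q-format and isqrt approach are assumptions for illustration.
import math

Q = 8  # fractional bits

def to_fixed(x):
    return round(x * (1 << Q))

def rmsnorm_fixed(xs):
    # Mean of squares: each x*x carries 2*Q fractional bits.
    ms = sum(x * x for x in xs) // len(xs)
    # Integer square root brings us back to Q fractional bits.
    rms = max(math.isqrt(ms), 1)  # guard against division by zero
    # Divide each element by the RMS, rescaling to keep Q fractional bits.
    return [(x << Q) // rms for x in xs]

vals = [to_fixed(v) for v in [1.0, 2.0, 2.0]]
print(rmsnorm_fixed(vals))  # close to [1/sqrt(3), 2/sqrt(3), 2/sqrt(3)] in Q8.8
```

With 8 fractional bits the result tracks the floating‑point RMSNorm to within about half a percent here, which hints at why int8 quantization plus a slightly wider accumulator is enough for a model this small.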