HeadlinesBriefing.com

Cheap 32GB VRAM Hack: A £200 Tesla V100 Powers LLMs

Hacker News

A hobbyist paired a 16GB Tesla V100 SXM2 with a 16GB RTX 4080, giving 32GB of combined VRAM for about £200 in extra hardware. The V100's HBM2 memory delivers 900 GB/s of bandwidth, more than many newer consumer cards, including the RTX 4080 it sits beside, and the pair runs a 27-billion-parameter model at 32 tokens per second.
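Memory bandwidth matters because generating each token means reading essentially all of the model's weights, so bandwidth caps decode speed. The back-of-the-envelope Python sketch below shows this; it assumes a quantized 27B model of roughly 16 GB (the article does not state the quantization), an even split across the two cards, and about 717 GB/s for the RTX 4080. It ignores compute, KV-cache reads and PCIe traffic, so it gives a ceiling rather than a prediction.

```python
# Rough, memory-bandwidth-bound ceiling on decode speed for a layer-split model.
# Assumption: each generated token reads every weight once, and the GPUs process
# their layers one after another (as in a layer split for a single request).

def decode_ceiling_tps(model_gb: float, shares: list[float], bandwidth_gbs: list[float]) -> float:
    """Upper bound on tokens/s when `shares` of the weights sit on GPUs with the
    given memory bandwidths in GB/s. Ignores compute, KV cache and PCIe."""
    if len(shares) != len(bandwidth_gbs):
        raise ValueError("need one bandwidth per share")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    # Time per token = sum over GPUs of (GB stored on that GPU / its bandwidth).
    seconds_per_token = sum(model_gb * s / bw for s, bw in zip(shares, bandwidth_gbs))
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical: ~16 GB of weights split evenly between the V100 (900 GB/s)
    # and the RTX 4080 (about 717 GB/s).
    both = decode_ceiling_tps(16.0, [0.5, 0.5], [900.0, 717.0])
    v100_only = decode_ceiling_tps(16.0, [1.0], [900.0])
    print(f"split ceiling: {both:.0f} tokens/s, V100-only ceiling: {v100_only:.0f} tokens/s")
```

Under these assumptions the split ceiling comes out near 50 tokens/s, so a measured 32 tokens/s is in a plausible range once compute and other overheads are counted.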

The V100 cost about £150 on eBay. As an SXM2 module it has no PCIe edge connector, so it is mounted on an SXM2-to-PCIe adapter board costing £50, which lets it sit beside the RTX 4080. The cooling fan was loud, measured at 82 dB, but a simple fan-control tweak that runs it at a 10% PWM duty cycle makes it quiet.
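The article does not say how the fan is controlled. As a minimal sketch, assuming the fan hangs off a Linux hwmon PWM channel (for example a motherboard fan header), the 10% setting maps onto the 0-255 scale that the kernel's hwmon `pwmN` files use, with `pwmN_enable = 1` selecting manual control. The hwmon path is hypothetical; the right one has to be found under `/sys/class/hwmon`, and writing to it needs root.

```python
from pathlib import Path

# Assumption: the fan is driven by a Linux hwmon PWM channel. hwmon pwmN files
# take a duty cycle from 0 to 255; writing 1 to pwmN_enable selects manual mode.


def percent_to_pwm(percent: float) -> int:
    """Convert a duty cycle in percent to the 0-255 hwmon scale, rounding half up."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(percent * 255 / 100 + 0.5)


def set_fan_duty(pwm_file: Path, percent: float) -> int:
    """Switch the channel to manual mode and write the duty cycle (needs root)."""
    enable_file = pwm_file.with_name(pwm_file.name + "_enable")
    enable_file.write_text("1")  # 1 = manual PWM control
    value = percent_to_pwm(percent)
    pwm_file.write_text(str(value))
    return value


if __name__ == "__main__":
    # Only prints the value; call set_fan_duty(Path("/sys/class/hwmon/hwmonN/pwmM"), 10)
    # with the real (hypothetical here) channel path to apply it.
    print(percent_to_pwm(10))
```

On this scale 10% becomes 26, which is the value that would be written to the PWM file.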

llama.cpp splits the model across both GPUs. Spreading it over two cards costs some performance compared with a single card that holds the whole model, but the setup costs roughly a tenth as much as a 32GB consumer GPU. It also shows that older data-center GPUs can still beat current Macs on memory bandwidth, offering a low-cost path to local LLM inference.
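To illustrate what "splitting the model across both GPUs" means, the Python sketch below hands each card a contiguous block of layers in proportion to a ratio, which is the idea behind llama.cpp's `--split-mode layer` with `--tensor-split`. It is a simplified illustration, not llama.cpp's actual allocation code; the 60-layer count and the `model.gguf` path are hypothetical, and the 16:16 ratio simply mirrors the two 16GB cards.

```python
# Simplified illustration of a layer split between GPUs. Assumptions: a
# hypothetical 60-layer model and a 16:16 ratio for two 16 GB cards. This mimics
# the idea of llama.cpp's --split-mode layer / --tensor-split, not its real code.


def split_layers(n_layers: int, ratios: list[float]) -> list[int]:
    """Give each GPU a contiguous block of layers proportional to its ratio."""
    if n_layers < 0 or not ratios or any(r < 0 for r in ratios) or sum(ratios) == 0:
        raise ValueError("need a non-negative layer count and positive ratios")
    total = sum(ratios)
    counts, assigned, cumulative = [], 0, 0.0
    for r in ratios:
        cumulative += r
        # Round the cumulative boundary so the per-GPU counts sum to n_layers.
        end = round(n_layers * cumulative / total)
        counts.append(end - assigned)
        assigned = end
    return counts


def llama_server_args(model_path: str, ratios: list[float]) -> list[str]:
    """Build a llama-server command line that offloads all layers and splits by layer."""
    return [
        "llama-server",
        "-m", model_path,
        "--n-gpu-layers", "999",  # more layers than the model has, so all go to GPUs
        "--split-mode", "layer",
        "--tensor-split", ",".join(f"{r:g}" for r in ratios),
    ]


if __name__ == "__main__":
    vram_gb = [16, 16]  # V100 and RTX 4080
    print(split_layers(60, vram_gb))
    print(" ".join(llama_server_args("model.gguf", vram_gb)))
```

With a layer split, a single request passes through the V100's layers and then the 4080's in sequence, which is one reason two cards trail a single card with the same total memory.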

The final build runs on NixOS with a legacy NVIDIA driver and CUDA 12.2, showing that the hardware works without exotic software. The result is practical, affordable LLM inference for enthusiasts.
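Once the driver is installed, one way to confirm that both cards are visible and that their memory adds up to about 32GB is to query `nvidia-smi`. The Python sketch below assumes `nvidia-smi` is on `PATH`; when it is not, it falls back to a `SAMPLE` string whose values are illustrative only, not output from the article's build.

```python
import csv
import io
import shutil
import subprocess

# With "nounits", memory.total is reported in MiB. SAMPLE is illustrative only.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]
SAMPLE = "Tesla V100-SXM2-16GB, 16384\nNVIDIA GeForce RTX 4080, 16376\n"


def parse_gpus(smi_csv: str) -> list[tuple[str, int]]:
    """Turn nvidia-smi CSV rows into (name, MiB) pairs, skipping blank lines."""
    reader = csv.reader(io.StringIO(smi_csv), skipinitialspace=True)
    return [(row[0], int(row[1])) for row in reader if row]


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    else:
        out = SAMPLE  # fall back so the sketch runs on machines without a GPU
    gpus = parse_gpus(out)
    for name, mib in gpus:
        print(f"{name}: {mib / 1024:.1f} GiB")
    print(f"{len(gpus)} GPU(s), {sum(m for _, m in gpus) / 1024:.1f} GiB total")
```

If only one card shows up, the adapter or driver setup is the first thing to check before blaming llama.cpp.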