HeadlinesBriefing.com

CERN's CASTOR storage system gives way to CTA tape archive

Hacker News

CASTOR (CERN Advanced STORage manager) handles hierarchical storage for petabyte‑scale physics data, mixing disk pools and tape archives. Users access files through command‑line tools or applications built on the CASTOR API, with XROOT as the preferred protocol and legacy RFIO support dropped in 2016. The system evolved from the 1990s SHIFT platform.

Internally, CASTOR uses a component architecture built around a central database that tracks state changes. The Stager manages disk allocation, the Name Server stores namespace metadata and tape copy details, and the Tape Infrastructure moves files to Oracle StorageTek T10000C and IBM TS1140 cartridges housed in four SL8500 and three TS3500 libraries. Current tape capacity sits near 100 PB.
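The division of labor between the Name Server and the Stager can be illustrated with a toy model. This is a minimal sketch, not CASTOR's actual code or API: the class names mirror the components described above, but every method, the file path, the tape label, and the oldest-first eviction policy are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Hypothetical namespace store: maps a file path to the tape holding its copy."""
    tape_copies: dict[str, str] = field(default_factory=dict)

    def register(self, path: str, tape_id: str) -> None:
        self.tape_copies[path] = tape_id

    def locate(self, path: str) -> str:
        return self.tape_copies[path]  # raises KeyError for unknown paths


@dataclass
class Stager:
    """Hypothetical disk pool with a fixed number of file slots."""
    capacity: int
    staged: list[str] = field(default_factory=list)

    def is_staged(self, path: str) -> bool:
        return path in self.staged

    def stage(self, path: str) -> None:
        if path in self.staged:
            return
        if len(self.staged) >= self.capacity:
            self.staged.pop(0)  # assumed policy: evict the file staged longest ago
        self.staged.append(path)


def read_file(path: str, ns: NameServer, stager: Stager) -> str:
    """Report whether a read is served from the disk pool or needs a tape recall."""
    if stager.is_staged(path):
        return "disk"
    tape_id = ns.locate(path)  # look up which tape holds the copy
    stager.stage(path)         # the recall places the file in the disk pool
    return f"tape:{tape_id}"


# Illustrative path and tape label, both made up.
ns = NameServer()
ns.register("/castor/example/run1.root", "T00001")
stager = Stager(capacity=1)
print(read_file("/castor/example/run1.root", ns, stager))  # tape:T00001
print(read_file("/castor/example/run1.root", ns, stager))  # disk
```

The first read goes to tape and stages the file; the second is served from disk. In this toy model, that is the whole point of putting a disk pool in front of the tape libraries.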

In June 2020, CERN launched CTA, the CERN Tape Archive, to replace CASTOR gradually. CTA inherits the same API surface but modernizes tape handling and integrates with the SRM protocol for grid‑wide data movement. CASTOR components remain operational for legacy workflows, but new LHC data pipelines now target CTA as the primary archival tier.

The combination of disk staging and automated tape libraries lets CERN keep costs low: tape costs far less per terabyte than spinning disk, and a cartridge consumes no power when idle. Access latency stretches to minutes, a trade‑off accepted for long‑term preservation of the LHC’s ever‑growing dataset, which now exceeds several exabytes across the archive.