HeadlinesBriefing.com

How a Unicode Bug Silently Destroyed User Data in a Collaborative Editor

Hacker News

A developer recounts a terrifying bug that caused silent data loss in a collaborative editor built with TipTap and Yjs. Users would keep typing normally, but their changes stopped syncing to the CRDT document without any error messages. When they reopened the page, everything after the failure point was gone. The team spent months unable to reproduce it.

Their product manager eventually cracked the issue by methodically narrowing down the trigger. The bug occurred when one emoji was inserted directly next to another, specifically emoji above U+FFFF, which JavaScript strings store as a surrogate pair of two UTF-16 code units. In that case the lib0 splice method split a surrogate pair down the middle, leaving an orphaned high or low surrogate.
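To make the failure mode concrete, here is a minimal sketch of how an index-based splice can cut a surrogate pair in half. The helpers `naiveSplice` and `hasLoneSurrogate` are hypothetical names written for this illustration; they are not lib0's actual code, and the exact offsets involved in the real bug are not given in the account.

```typescript
// Hypothetical helper, NOT lib0's API: a splice that works on UTF-16
// code-unit indices, which is how JavaScript string indices behave.
function naiveSplice(text: string, index: number, remove: number, insert = ""): string {
  return text.slice(0, index) + insert + text.slice(index + remove);
}

// Returns true if the string contains a surrogate that is not part of a valid pair.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip its low surrogate
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) return true; // low surrogate on its own
  }
  return false;
}

const grin = "\u{1F600}"; // one emoji above U+FFFF, stored as two code units
const text = grin + grin; // length is 4, not 2

// Index 1 falls between the high and low surrogate of the first emoji,
// so inserting there separates the two halves.
const broken = naiveSplice(text, 1, 0, "\u{1F680}");
```

The intact string passes the check, while `broken` starts with a high surrogate that is no longer followed by its low half.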

When Yjs tried to sync, it passed the malformed string to encodeURIComponent, which throws a URIError when it meets a lone surrogate. Nothing in the error handling caught the exception, so sync simply stopped while the editor kept working locally. The team shipped two workarounds: adding offline support as a hedge, and attaching a global error listener that detected the URIError and prompted users to reload.
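The sketch below shows the exception itself and one way a global listener workaround might look. `installReloadPrompt`, `ErrorTarget`, and `ErrorEventLike` are hypothetical names for this illustration, not the team's code; in a browser the target would typically be `window`, and the callback would show whatever reload prompt the product uses.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface ErrorEventLike {
  error?: unknown;
}
interface ErrorTarget {
  addEventListener(type: "error", listener: (event: ErrorEventLike) => void): void;
}

// Assumption: any URIError reaching the global handler is treated as the
// sync failure. A real implementation might inspect the stack as well.
function isSyncEncodingFailure(err: unknown): boolean {
  return err instanceof URIError;
}

// Hypothetical workaround: prompt for a reload when the failure is detected.
function installReloadPrompt(target: ErrorTarget, promptReload: () => void): void {
  target.addEventListener("error", (event) => {
    if (isSyncEncodingFailure(event.error)) promptReload();
  });
}

// Reproducing the exception: encodeURIComponent rejects a lone surrogate.
const loneHigh = "\uD83D"; // first half of an emoji, with no low surrogate
let caught: unknown = null;
try {
  encodeURIComponent(loneHigh);
} catch (err) {
  caught = err; // a URIError
}
```

A listener like this does not repair the document. It only turns a silent stall into a visible prompt, which matches the account's description of it as a workaround.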

The real fix came from two directions. lib0 got patched to detect orphaned surrogates and replace them with U+FFFD (the Unicode replacement character). The team also made emoji an atomic node type in ProseMirror, treating each one as an indivisible unit so cursor movements and edits couldn't split them in half.
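As a rough illustration of the first fix, the function below replaces any orphaned surrogate with U+FFFD so that the result can be encoded safely. `replaceLoneSurrogates` is a hypothetical name and this is a sketch of the general technique, not the actual lib0 patch.

```typescript
// Hypothetical sanitizer: keeps valid surrogate pairs and swaps any
// orphaned high or low surrogate for U+FFFD, the replacement character.
function replaceLoneSurrogates(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = text.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += text.slice(i, i + 2); // valid pair: copy both code units
        i++;
      } else {
        out += "\uFFFD"; // high surrogate without its partner
      }
    } else if (isLow) {
      out += "\uFFFD"; // low surrogate without a preceding high surrogate
    } else {
      out += text.charAt(i); // ordinary code unit
    }
  }
  return out;
}

const damaged = "a\uD83Db"; // lone high surrogate between two letters
const repaired = replaceLoneSurrogates(damaged);
const encoded = encodeURIComponent(repaired); // no longer throws
```

The user still loses the damaged emoji, but the string stays encodable and sync keeps running. Making emoji atomic nodes in the editor addresses the other side by preventing the split from happening in the first place.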