HeadlinesBriefing.com

JS Character Counting Pitfalls Beyond Surrogate Pairs

DEV Community

Developers often fix emoji counting with `[...str].length`, but a Laravel textarea project revealed deeper issues. The core problem is newline normalization: a textarea's JavaScript value uses a bare `\n` for each line break, while browsers send `\r\n` when the form is submitted, so the server sees one more character per line break than the client counted. For text with many line breaks, the gap can exceed 20 characters. The discrepancy persists through round-trip edits: the database stores CRLF, but JavaScript reads the value back with LF, so client-side and server-side validation keep disagreeing.
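The mismatch can be reproduced without a browser. The following is a minimal sketch, assuming a hypothetical three-line sample string and simulating form submission with a plain string replacement; it is not the original project's code.

```javascript
// Hypothetical sample: three lines separated by LF, as textarea.value reports them.
const clientValue = "line one\nline two\nline three";

// Simulate what form submission sends: every line break becomes CRLF.
const submittedValue = clientValue.replace(/\r\n|\r|\n/g, "\r\n");

// Count code points on both sides, plus the number of line breaks.
const clientCount = [...clientValue].length;
const serverCount = [...submittedValue].length;
const lineBreaks = (clientValue.match(/\n/g) || []).length;

console.log(clientCount, serverCount, lineBreaks); // 28 30 2
```

The server-side count exceeds the client-side count by exactly the number of line breaks, which is why the gap grows with the number of lines rather than with the overall length of the text.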

Beyond newlines, other pitfalls include lone surrogates from copy-paste corruption, which MySQL may reject, and characters in NFD (decomposed) form from older macOS systems. Spread syntax alone does not solve these. Comparing the available methods shows that `Intl.Segmenter` counts user-perceived graphemes, which does not match MySQL's `CHAR_LENGTH()`, while `TextEncoder` measures UTF-8 bytes, which is what matters for index size limits. Each tool serves a different purpose.
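To make the comparison concrete, here is a small sketch that measures one string four ways. The helper name `measure` and the sample strings are illustrative assumptions, and `Intl.Segmenter` requires a reasonably recent runtime (for example Node.js 16 or later).

```javascript
// Hypothetical helper: report four different "lengths" of the same string.
function measure(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf16Units: str.length,                            // what str.length reports
    codePoints: [...str].length,                       // spread syntax iterates by code point
    graphemes: [...segmenter.segment(str)].length,     // user-perceived characters
    utf8Bytes: new TextEncoder().encode(str).length,   // encoded size in UTF-8
  };
}

// Family emoji: man + ZWJ + woman + ZWJ + girl (five code points, one grapheme).
const family = measure("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}");

// "e" followed by a combining acute accent, i.e. NFD form of "é".
const decomposed = measure("e\u0301");
const composed = measure("e\u0301".normalize("NFC"));

console.log(family, decomposed, composed);
```

The four numbers disagree for the emoji sequence, and the NFD string counts as two code points until it is normalized to NFC, so the right measure depends on which limit is being enforced.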

The fix is to normalize newlines to CRLF in JavaScript before counting, so the count matches what MySQL stores after submission. For VARCHAR limits, `[...str].length` remains the best match, but developers must filter out lone surrogates first. Frontend and backend systems handle Unicode differently, which makes naive character counting a subtle source of validation bugs in full-stack applications.
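Put together, the approach might look like the sketch below. The function name `countForVarchar` is hypothetical, and dropping lone surrogates (rather than replacing them with U+FFFD) is an assumption; the server-side handling should be made to match whichever choice is used.

```javascript
// Hypothetical helper: count characters the way the stored value is expected to be counted.
function countForVarchar(value) {
  // 1. Normalize every line break (CRLF, lone CR, lone LF) to CRLF,
  //    mirroring what the browser sends on form submission.
  const crlf = value.replace(/\r\n|\r|\n/g, "\r\n");

  // 2. Drop lone surrogates. With the `u` flag the regex works on code points,
  //    so valid surrogate pairs are left untouched.
  const wellFormed = crlf.replace(/[\uD800-\uDFFF]/gu, "");

  // 3. Count code points, not UTF-16 code units.
  return [...wellFormed].length;
}

console.log(countForVarchar("a\nb"));        // 4: "a", CR, LF, "b"
console.log(countForVarchar("\u{1F600}"));   // 1: a valid surrogate pair is one code point
console.log(countForVarchar("x\uD83Dy"));    // 2: the lone high surrogate is removed
```

Matching the line-break alternation `\r\n|\r|\n` in that order keeps input that is already CRLF from being doubled, so the function is safe to call on values read back from the database as well as on fresh textarea input.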