HeadlinesBriefing.com

Coding Assistant Language Drift: Why Chinese Prompts Trigger Korean Responses

Towards Data Science

A developer's coding assistant began replying in Korean to prompts written in Chinese, exposing unexpected behavior in language-model embedding spaces. The problem appeared when the developer mixed Chinese with English technical terms such as 'run.py' and 'early stopping' while doing GPU training work on a shared service.

The researcher hypothesized that embedding spaces are organized by task register rather than by language boundaries. Because Chinese is underrepresented in engineering corpora, the argument goes, Chinese text that contains technical tokens drifts toward an 'engineering attractor field.' To test this, the experiments replaced Chinese terms with their English equivalents step by step and measured the cosine similarity between sentence embeddings at each stage.

The results showed that similarity to Korean initially increased before similarity to English overtook it, with a delta of 0.1972. PCA projections revealed a sharp directional jump between stages, suggesting a non-linear phase transition rather than gradual drift. The embedding of the Korean response also remained closer to the English clusters than to the Chinese ones.
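The article reports the directional jump from PCA plots but does not say how it was quantified. As an assumption rather than the author's method, one simple measure is to project the per-stage embeddings onto their top two principal components and compute the turning angle between consecutive steps: gradual drift keeps every angle small, while an abrupt transition shows up as one large angle. The helper names and toy vectors are hypothetical, and PCA is done with plain power iteration so the sketch needs no dependencies.

```python
import math


def _matvec(m, v):
    """Multiply square matrix m by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def _top_eigenpair(m, iters=500):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix via power iteration."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix annihilates v; nothing left to find
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, _matvec(m, v)))
    return eigval, v


def pca_2d(points):
    """Project points onto their top two principal components."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    cov = [[sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    components = []
    for _ in range(2):
        eigval, vec = _top_eigenpair(cov)
        components.append(vec)
        # Deflate so the next power iteration finds the next component.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(p[j] * c[j] for j in range(dim)) for c in components] for p in centered]


def turning_angles(points_2d):
    """Angle in degrees between each pair of consecutive step vectors."""
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points_2d, points_2d[1:])]
    angles = []
    for s, t in zip(steps, steps[1:]):
        cos = (s[0] * t[0] + s[1] * t[1]) / (math.hypot(*s) * math.hypot(*t))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos)))))
    return angles


# Hypothetical per-stage embeddings (toy 3-D vectors, not real model output):
# two steps in one direction, then an abrupt change of direction at stage 2.
stages = [
    [0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],
    [2.0, 0.2, 0.0],
    [2.2, 1.5, 0.1],
    [2.4, 2.8, 0.2],
]
angles = turning_angles(pca_2d(stages))
jump = max(range(len(angles)), key=angles.__getitem__)
print("turning angles (deg):", [round(a, 1) for a in angles])
print("sharpest turn at stage", jump + 1)
```

With real embeddings, `sklearn.decomposition.PCA(n_components=2)` would normally replace the hand-rolled `pca_2d`; the turning-angle check on the projected points stays the same. A 2-D projection can hide or exaggerate turns that happen in the dimensions it discards, so the angles are best read alongside the full-dimensional similarities.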

According to the author, this shows that once text has entered an engineering domain, translating it cannot restore its original position in embedding space. When engineering English dominates the embedding space, the language of the output can shift unpredictably, so an assistant may reply in a language the user does not even speak.