artificial

Embedding Drift silently broke our RAG

/u/coolandy00

December 4, 2025

Our RAG stack degraded slowly over months.

Text-shape differences (for example whitespace or Unicode form) produced different embedding vectors for the same content
Hidden characters slipped in from OCR
Partial updates mixed old and new embeddings
Incremental index rebuilds drifted from ground truth
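
To make the hidden-character point concrete, here is a minimal sketch in Python using only the standard library. It is not our production code; the function name hidden_characters and the sample strings are made up for illustration. It shows that two strings that look identical on screen are different inputs to an embedder, and how to find the invisible character responsible.

```python
import unicodedata

def hidden_characters(text):
    """Return (index, Unicode name) for invisible format/control characters.

    Hypothetical helper for illustration only.
    """
    found = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cf = format characters (zero-width space, soft hyphen, BOM).
        # Cc = control characters; ordinary newlines and tabs are allowed.
        if category in ("Cf", "Cc") and ch not in "\n\t\r":
            found.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found

clean = "refund policy"
from_ocr = "refund\u200b policy"  # same text with a zero-width space after "refund"

print(clean == from_ocr)            # False: the embedder sees two different inputs
print(hidden_characters(from_ocr))  # [(6, 'ZERO WIDTH SPACE')]
```

Running a check like this over a sample of stored chunks is a cheap way to see whether OCR output is leaking invisible characters into the index.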

Retrieval looked random at times, but the retriever wasn’t the problem.

We enforced a consistent embedding pipeline:

Canonical preprocessing that never changes silently
Full re-embeddings instead of patching
Version-pinned embedding model
Stable index rebuild rules tied to segmentation changes
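
A rough sketch of how these rules could fit together, in Python with only the standard library. Everything named here is an assumption for illustration: the PIPELINE dict, its keys and placeholder values, the choice of NFKC normalization, and the helper names are not what we actually run. The idea is that preprocessing is one versioned function, and the whole pipeline config hashes to a fingerprint stored with the index, so any change forces a full re-embed instead of a patch.

```python
import hashlib
import json
import re
import unicodedata

# Hypothetical pipeline description; all values are placeholders.
PIPELINE = {
    "preprocess_version": 3,            # bump whenever canonicalize() changes
    "embedding_model": "example-embedder",
    "embedding_model_revision": "2025-01-01",
    "chunk_size": 512,                  # segmentation settings
    "chunk_overlap": 64,
}

def canonicalize(text: str) -> str:
    """Single preprocessing path used at both index time and query time."""
    # NFKC folds compatibility forms (ligatures, non-breaking spaces); an assumption here.
    text = unicodedata.normalize("NFKC", text)
    # Drop invisible format characters such as zero-width spaces.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Collapse every run of whitespace to a single space.
    return re.sub(r"\s+", " ", text).strip()

def pipeline_fingerprint(config: dict) -> str:
    """Stable short hash of the full pipeline config."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def needs_full_reembed(index_fingerprint: str, config: dict) -> bool:
    """True if the index was built with a different pipeline than the current one."""
    return index_fingerprint != pipeline_fingerprint(config)

stored = pipeline_fingerprint(PIPELINE)                       # saved next to the index
print(needs_full_reembed(stored, PIPELINE))                   # False
print(needs_full_reembed(stored, {**PIPELINE, "chunk_size": 256}))  # True
```

Storing the fingerprint alongside every vector (or at least per index) also makes mixed old and new embeddings detectable: any vector whose fingerprint differs from the current one is stale.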

Impact:

Retrieval reliability improved immediately
Embedding clusters became predictable
Fewer “mysterious RAG failures”
Debug time dropped dramatically

Have you seen embedding drift show up in long-running systems?
