Embedding Drift silently broke our RAG

Our RAG stack degraded slowly over months, and every cause traced back to the embedding pipeline:

  • The same text in different shapes (whitespace, formatting) produced different embedding vectors
  • Hidden characters slipped in from OCR
  • Partial updates mixed old and new embeddings
  • Incremental index rebuilds drifted from ground truth
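A minimal audit sketch in Python for the symptoms above, assuming each indexed chunk is stored with a hypothetical `text` field and a hypothetical `pipeline_version` field recording which model and preprocessing produced its vector. Neither the record shape nor the function names come from the post. The sketch flags chunks containing invisible format or control characters (typical OCR leftovers) and reports when one index holds vectors from more than one pipeline version, which is what a partial update leaves behind.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical record shape; adapt to whatever your vector store keeps.
    chunk_id: str
    text: str
    pipeline_version: str  # model + preprocessing version that embedded this chunk


def hidden_chars(text: str) -> list:
    """Return names of invisible format/control characters (ordinary whitespace excluded)."""
    found = []
    for ch in text:
        cat = unicodedata.category(ch)
        # Cf = format chars (zero-width space, soft hyphen, BOM); Cc = control chars.
        if cat == "Cf" or (cat == "Cc" and ch not in "\n\t\r"):
            found.append(unicodedata.name(ch, f"U+{ord(ch):04X}"))
    return found


def audit(chunks: list) -> dict:
    """Summarize drift symptoms across a snapshot of the index."""
    flagged = {}
    for c in chunks:
        names = hidden_chars(c.text)
        if names:
            flagged[c.chunk_id] = names
    versions = Counter(c.pipeline_version for c in chunks)
    return {
        "chunks_with_hidden_chars": flagged,
        "pipeline_versions": dict(versions),
        # More than one version means old and new embeddings are mixed in one index.
        "mixed_versions": len(versions) > 1,
    }


if __name__ == "__main__":
    index = [
        Chunk("a", "refund policy", "v1"),
        Chunk("b", "refund\u200b policy", "v1"),  # looks identical, contains a zero-width space
        Chunk("c", "refund policy", "v2"),        # embedded after a partial update
    ]
    report = audit(index)
    print(report["mixed_versions"])            # True
    print(report["chunks_with_hidden_chars"])  # {'b': ['ZERO WIDTH SPACE']}
```

Chunks "a" and "b" render the same on screen, which is why this kind of drift stays invisible until something inspects the characters directly.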

Retrieval looked random at times, but the retriever wasn’t the problem; the embeddings it searched were.

We enforced a consistent embedding pipeline:

  • Canonical preprocessing that never changes silently
  • Full re-embeddings instead of patching
  • Version-pinned embedding model
  • Stable rules that trigger an index rebuild whenever segmentation changes
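The pipeline rules above can be sketched in Python as one pinned pipeline description whose hash decides between reusing the existing vectors and a full re-embed. This is a minimal sketch under assumptions: the `canonicalize` rules (NFKC normalization, dropping format characters, collapsing whitespace), the model name and revision, and the chunker settings are hypothetical placeholders, not the author's configuration. Any change to preprocessing, model, or segmentation changes the fingerprint, and a changed fingerprint means rebuilding everything rather than patching.

```python
import hashlib
import json
import re
import unicodedata
from typing import Optional

PREPROCESS_VERSION = "canon-1"  # hypothetical; bump by hand whenever canonicalize() changes

# Hypothetical pinned pipeline description: everything that determines the vectors.
PIPELINE = {
    "model": "example-embedding-model",     # placeholder model name
    "model_revision": "pinned-revision-id", # pin an exact revision, never "latest"
    "preprocess": PREPROCESS_VERSION,
    "chunker": {"max_tokens": 512, "overlap": 64},  # placeholder segmentation settings
}


def canonicalize(text: str) -> str:
    """Deterministic canonicalization applied to every chunk before embedding."""
    text = unicodedata.normalize("NFKC", text)  # e.g. no-break space -> space, ligatures split
    # Drop format characters such as zero-width spaces and soft hyphens.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Collapse runs of spaces, tabs, and newlines into a single space.
    return re.sub(r"\s+", " ", text).strip()


def pipeline_fingerprint(config: dict) -> str:
    """Stable hash of the pipeline config; key order does not matter."""
    blob = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def plan_rebuild(stored_fingerprint: Optional[str], config: dict) -> str:
    """Any pipeline change forces a full re-embed; never patch a mixed index."""
    if stored_fingerprint != pipeline_fingerprint(config):
        return "full_rebuild"
    return "reuse_existing"


if __name__ == "__main__":
    print(canonicalize("Refund\u00a0policy\u200b\n\n applies"))  # Refund policy applies
    stored = pipeline_fingerprint(PIPELINE)
    resegmented = {**PIPELINE, "chunker": {"max_tokens": 256, "overlap": 64}}
    print(plan_rebuild(stored, PIPELINE))     # reuse_existing
    print(plan_rebuild(stored, resegmented))  # full_rebuild
```

Storing the fingerprint next to the index turns "tied to segmentation changes" into a mechanical check: a new chunk size produces a new hash, and the only allowed response is a full rebuild.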

Impact:

  • Retrieval reliability improved immediately
  • Embedding clusters became predictable
  • Fewer “mysterious RAG failures”
  • Debug time dropped dramatically

Have you seen embedding drift show up in long-running systems?

submitted by /u/coolandy00