Most multi-hop RAG goes stale the moment your data changes, what about a training-free approach that skips the graph rebuild?
Most multi-hop RAG goes stale the moment your data changes, what about a training-free approach that skips the graph rebuild?

Most multi-hop RAG goes stale the moment your data changes, what about a training-free approach that skips the graph rebuild?

Most multi-hop RAG goes stale the moment your data changes, what about a training-free approach that skips the graph rebuild?

Most methods that get strong multi-hop answers (GraphRAG, HippoRAG, RAPTOR, trained retrievers) build a knowledge graph or fine-tune a retriever over the corpus. That's fine until the data changes — then you re-extract / rebuild / retrain before the new facts are usable. For a corpus that updates daily, that's a real cost.

MOTHRAG does the multi-hop reasoning at query time over a plain dense index instead. An update is just embed + append (one embedding call) — no graph reconstruction, no retraining — so it stays current as the corpus changes.

And dropping the graph doesn't cost accuracy. F1, Llama-3.3-70B reader, n=1000 each:

System HotpotQA 2Wiki MuSiQue Avg Hardware
MOTHRAG 78.1 76.3 50.5 68.3 commodity API, no GPU
HippoRAG2 75.5 71.0 48.6 65.0
GraphRAG 68.6 58.6 38.5 55.2
RAPTOR 69.5 52.1 28.9 50.2

Competitor rows reproduced from HippoRAG2 (ICML 2025), Table 2. MOTHRAG is within ~0.7 avg F1 of the GPU-bound research frontier (a fine-tuned, GPU-served stack — not shown). (Fair note: graph-RAG systems like GraphRAG shine on small curated / sensemaking corpora — this is multi-hop factoid QA over changing data, a different regime.)

Deterministic by design: instead of a free-form agent loop it runs a small ensemble of reasoning arms (direct read, decomposition, an iterative grounding-driven arm) under a deterministic arbitrator, over a bridge retrieval substrate with multi-hop chain filtering. Every answer is proof-tree-structured, so you can audit why it answered. Measured ≈$0.018/query, ~44% cheaper at matched accuracy.

Open source, ~1 week old — genuinely after feedback and failure cases:

submitted by /u/ObjectiveEntrance740
[link] [comments]