Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs

I’ve been experimenting with a problem I kept hitting when using LLMs on real codebases:

Even with good prompts, large repos don’t fit into context, so models:

  • miss important files
  • reason over incomplete information
  • require multiple retries


Approach I explored

Instead of embeddings or RAG, I tried something simpler (a rough code sketch follows the list):

  1. Extract only structural signals:

    • functions
    • classes
    • routes
  2. Build a lightweight index (no external dependencies)

  3. Rank files per query using:

    • token overlap
    • structural signals
    • basic heuristics (recency, dependencies)
  4. Emit a small “context layer” (~2K tokens instead of ~80K)
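
To make the steps concrete, here is a minimal sketch of what I mean, restricted to Python sources and the standard library. The names (extract_signals, build_index, rank_files, emit_context) and the scoring weights are illustrative assumptions for this post, not sigmap's actual API:

```python
# Minimal sketch of the four steps above, assuming Python sources, stdlib only.
# All names and weights here are illustrative, not sigmap's actual API.
import ast
import re
import time
from pathlib import Path

IDENT = re.compile(r"[A-Za-z]+")

def tokenize(text: str) -> set[str]:
    # Split camelCase before extracting word tokens so "getUser" matches "user".
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return {t.lower() for t in IDENT.findall(spaced)}

def extract_signals(path: Path) -> set[str]:
    # Step 1: structural signals only -- function/class names, no bodies.
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        return set()
    names = {n.name for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))}
    return tokenize(" ".join(names) + " " + path.stem)

def build_index(root: str) -> dict:
    # Step 2: lightweight in-memory index, no embeddings or external services.
    return {p: {"tokens": extract_signals(p), "mtime": p.stat().st_mtime}
            for p in Path(root).rglob("*.py")}

def rank_files(index: dict, query: str, top_k: int = 5) -> list[str]:
    # Step 3: token overlap plus a small recency heuristic (decays over days).
    q = tokenize(query)
    now = time.time()
    scored = []
    for path, meta in index.items():
        overlap = len(q & meta["tokens"])
        recency = 1.0 / (1.0 + (now - meta["mtime"]) / 86400.0)
        scored.append((overlap + 0.1 * recency, str(path)))
    # score > 0.1 drops files with zero token overlap, however recent.
    return [p for score, p in sorted(scored, reverse=True)[:top_k] if score > 0.1]

def emit_context(paths: list[str], budget_tokens: int = 2000) -> str:
    # Step 4: concatenate ranked files into a small context layer,
    # using a rough ~4 chars/token budget.
    budget, parts, used = budget_tokens * 4, [], 0
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")[: budget - used]
        parts.append(f"# file: {p}\n{text}")
        used += len(parts[-1])
        if used >= budget:
            break
    return "\n\n".join(parts)
```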


Observations

Across multiple repos:

  • context size dropped ~97%
  • relevant files appeared in top-5 ~70–80% of the time
  • number of retries per task dropped noticeably

The biggest takeaway:

Structured context mattered more than model size in many cases.


Interesting constraint

I deliberately avoided:

  • embeddings
  • vector DBs
  • external services

Everything runs locally with simple parsing + ranking.
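
Wiring the hypothetical sketch from earlier together end to end (again, illustrative names only, not sigmap's CLI):

```python
# Build the index once, then rank + emit per query -- all local, stdlib only.
index = build_index("path/to/repo")
top = rank_files(index, "where is the login route registered?")
context = emit_context(top, budget_tokens=2000)
prompt = f"Using only the context below, answer the question.\n\n{context}"
```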


Open questions

  • How far can heuristic ranking go before embeddings become necessary?
  • Has anyone tried hybrid approaches (structure + embeddings)?
  • What’s the best way to verify that answers are grounded in provided context? (a naive overlap check is sketched below)
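
On that last question, the naive baseline I would start from (an unvalidated assumption on my part, not something I've measured): flag any answer sentence that shares no token n-gram with the emitted context.

```python
import re

def ungrounded_sentences(answer: str, context: str, n: int = 3) -> list[str]:
    # Flag answer sentences that share no token n-gram with the context.
    # Sentences shorter than n tokens get flagged by construction; a real
    # check would need entailment or citation spans, not just overlap.
    def ngrams(text: str) -> set:
        toks = re.findall(r"[a-z0-9]+", text.lower())
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    ctx = ngrams(context)
    sentences = re.split(r"(?<=[.!?])\s+", answer)
    return [s for s in sentences if s.strip() and not (ngrams(s) & ctx)]
```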

Docs: https://manojmallick.github.io/sigmap/
GitHub: https://github.com/manojmallick/sigmap

