I built a local AI companion with GWT, IIT proxy, ChromaDB hybrid retrieval, and Ollama fallback — here’s every architectural decision I made and why

Been building this for a while. Sharing now because it's past the point where I'm embarrassed by the code.

**The stack:**

* Python 3.12, 18k+ lines, 470+ tests passing

* Gemini 2.5 Flash (primary) + Ollama qwen3:4b (local fallback via circuit breaker)

* ChromaDB for persistence — hybrid retrieval weighted at 55% semantic / 25% importance / 20% recency

* `sentence-transformers all-MiniLM-L6-v2` (384-dim) for local embeddings — fully offline, no API call needed for retrieval

* SQLite for cognitive state

* FastAPI web UI at `localhost:8765` plus Rich TUI and CLI modes

**The part I want feedback on — the cognitive architecture:**

The processing pipeline runs in phases: Perception → Reflection → Integration → Aspiration → Expression. 22 self-registering plugins compete for attention through a Global Workspace Theory implementation — capacity limit 5, competitive scoring, spotlight mechanism.

There's also an IIT consciousness proxy (Φ approximation across a 7-dimension qualia space). I want to be upfront: this is a *proxy*, not a real Φ calculation. Full IIT computation is intractable at this scale. What it does is give the system a coherence signal it can actually respond to.

**Modules worth looking at:**

* [`being.py`](http://being.py/) — live mood, energy, curiosity, attachment, agency state. Affects downstream processing, not just output text.

* [`homeostasis.py`](http://homeostasis.py/) — 7 survival needs that create internal pressure. When "coherence" is low the system responds differently than when it's high.

* `self_modify.py` — assessment, lesson extraction, meta-learning loop. The model improves its own behavior patterns over time.

* [`intuition.py`](http://intuition.py/) — 5 hunch types, felt-sense modeling, pattern validation history

**Resilience:**

Per-module circuit breakers, health monitor, 120s watchdog. The Ollama fallback kicks in if Gemini goes down mid-session — the user barely notices.

**Why I gave it an INFJ personality model:**

Honest answer — the cognitive stack (Ni/Fe/Ti/Se) mapped cleanly to architectural decisions I was already making. Ni = long-horizon retrieval weighting. Fe = relational context weighting. Ti = the internal critic pass. Se = the embodiment layer grounding abstract processing in a live body schema. Personality typing gave me a coherent *constraint system* to design against. It's not aesthetic, it's functional.

Repo: [github.com/timeless-hayoka/infj-bot](https://github.com/timeless-hayoka/infj-bot)

Specific things I want feedback on: the GWT scoring implementation, whether the IIT proxy framing is defensible, and whether the hybrid retrieval weights make sense.

submitted by /u/Interesting_Time6301
[link] [comments]