I built a 100% local, CPU-only voice loop for any LLM — no GPU, no cloud, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)

Every voice interface I found either needed a GPU, depended on a cloud API, or was locked to one OS. So I built one that needs none of that, and benchmarked it so the numbers are measured rather than guessed.

The stack — all ONNX, all CPU:

  • Silero VAD: neural voice activity detection, ~0.09 ms/frame. It detects when you stop talking, so there's no push-to-talk.
  • Parakeet TDT 0.6B v3: INT8 transcription, 25 languages, served through an OpenAI-compatible API on port 5093. A 2.4 s clip transcribes in 307 ms on an i7 (~8× realtime).
  • Supertonic TTS 3: FP16 synthesis. Short replies are synthesized in ~1.4 s.
  • On the Apple Silicon M5 Neural Engine: 33× realtime for STT, 16× for TTS.
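
To make the "no push-to-talk" part concrete, here is a minimal sketch of end-of-speech detection on top of per-frame speech probabilities, which is the kind of output a VAD model such as Silero produces. The function name, the 0.5 threshold, and the 15-frame silence window are illustrative assumptions, not values taken from the repo.

```python
from typing import Iterable, Optional


def find_end_of_speech(
    probs: Iterable[float],
    threshold: float = 0.5,        # assumed cut-off between speech and silence
    min_silence_frames: int = 15,  # assumed trailing silence needed to end a turn
) -> Optional[int]:
    """Return the frame index at which the utterance is judged finished.

    Returns None if speech never started or has not ended yet.
    """
    speaking = False
    silent_run = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            # Speech frame: mark the turn as started and reset the silence counter.
            speaking = True
            silent_run = 0
        elif speaking:
            # Silence after speech: count it toward the end-of-turn window.
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None
```

Leading silence is ignored until the first speech frame, and a short pause inside a sentence resets nothing permanently: the turn only ends once the silence run is long enough. If a frame covers roughly 30 ms, 15 frames is a bit under half a second of trailing silence.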

Data flow:

you → Silero VAD → Parakeet STT → your LLM (Ollama / LM Studio / vLLM / any OpenAI-compatible) → Supertonic TTS → speakers 
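
As a rough sketch of that flow for a single turn, the three stages after the VAD can be treated as pluggable functions. Everything here (the `voice_turn` name and the callable signatures) is a hypothetical illustration, not the repo's actual code; in a real setup `transcribe` and `chat` would be HTTP calls to the local STT and LLM servers.

```python
from typing import Callable


def voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # STT stage, e.g. a call to the local Parakeet server
    chat: Callable[[str], str],           # LLM stage, any OpenAI-compatible backend
    synthesize: Callable[[str], bytes],   # TTS stage, returns audio to play
) -> bytes:
    """Run one turn: captured audio -> text -> LLM reply -> reply audio."""
    text = transcribe(audio).strip()
    if not text:
        # Nothing intelligible was said, so skip the LLM and TTS calls.
        return b""
    reply = chat(text)
    return synthesize(reply)


if __name__ == "__main__":
    # Stub stages so the control flow can be exercised without any models.
    out = voice_turn(
        b"\x00\x01",
        transcribe=lambda a: " hello ",
        chat=lambda t: t.upper(),
        synthesize=lambda r: r.encode("utf-8"),
    )
    print(out)
```

Keeping the stages as plain callables is what makes the LLM swappable: only the `chat` function needs to change to move between backends.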

Zero cloud. Zero API keys. Nothing routes outside the machine.
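
If you want to check that claim against your own configuration, one simple approach is to confirm that every endpoint URL points at a loopback address. This is a small sketch using only the standard library; the `is_loopback_url` helper and the example URLs (including the `/v1` paths) are assumptions for illustration, apart from port 5093, which is the STT port mentioned above.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False  # no scheme/host could be parsed, so nothing can be confirmed
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as non-local without resolving it.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint list; adjust to whatever your setup actually uses.
    endpoints = [
        "http://127.0.0.1:5093/v1",   # STT server
        "http://localhost:11434/v1",  # LLM server (Ollama's default port)
    ]
    for url in endpoints:
        print(url, "local" if is_loopback_url(url) else "NOT local")
```

This only inspects configuration strings; it does not monitor actual network traffic.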

Works with Claude Code, OpenCode CLI, OpenClaw, Hermes Agent, and Codex. One install wires voice into your agent and starts the services (systemd/launchd/Task Scheduler).

Install (macOS / Linux):

git clone https://github.com/groxaxo/Local-VoiceMode-LLM
cd Local-VoiceMode-LLM && ./setup.sh

Windows: .\setup.ps1

Ollama one-liner (standalone, no clone):

bash <(curl -fsSL https://raw.githubusercontent.com/groxaxo/Local-VoiceMode-LLM/main/integrations/ollama/install-ollama-voice.sh) 

Benchmarks are reproducible via python benchmarks/run_benchmark.py in the repo. MIT-licensed, free.
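
For anyone comparing their own runs, the "× realtime" figures in this post are simply audio duration divided by processing time. A tiny helper (the name `realtime_factor` is mine, not from the benchmark script) makes the arithmetic explicit:

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # The i7 STT figure from the post: a 2.4 s clip transcribed in 307 ms.
    print(f"{realtime_factor(2.4, 0.307):.1f}x realtime")  # prints 7.8x realtime
```

Anything above 1.0 means the stage keeps up with live speech; the 2.4 s / 0.307 s case works out to about 7.8, which is where the "~8×" comes from.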

GitHub: https://github.com/groxaxo/Local-VoiceMode-LLM

submitted by /u/blackstoreonline