I built the world’s first Chrome extension that runs LLMs entirely in-browser—WebGPU, Transformers.js, and Chrome’s Prompt API

There are plenty of WebGPU demos out there, but I wanted to ship something people could actually use day-to-day.

It runs Llama 3.2, DeepSeek-R1, Qwen3, Mistral, Gemma, Phi, SmolLM2—all locally in Chrome. Three inference backends:

  • WebLLM (MLC/WebGPU)
  • Transformers.js (ONNX)
  • Chrome's built-in Prompt API (Gemini Nano—zero download)
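As a rough illustration of the first backend, here is what a single chat turn through WebLLM looks like. This is a minimal sketch, assuming the `@mlc-ai/web-llm` package and a WebGPU-capable browser; the model ID shown is just an example from MLC's prebuilt catalog, not necessarily what the extension ships.

```javascript
// Minimal WebLLM sketch: load a prebuilt MLC model and run one chat turn.
// Requires a WebGPU-capable browser. Weights are downloaded on first use
// and cached, so later loads skip the network.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function ask(question) {
  // Model ID is an example from MLC's prebuilt list.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f32_1-MLC", {
    // Reports download/compile progress while the model initializes.
    initProgressCallback: (p) => console.log(p.text),
  });

  // WebLLM exposes an OpenAI-style chat completions interface.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: question }],
  });
  return reply.choices[0].message.content;
}
```

The Prompt API path is shorter still, since Gemini Nano ships with the browser and needs no separate download, though its JavaScript surface has changed across Chrome versions, so check the current docs before relying on a particular shape.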

No Ollama, no servers, no subscriptions. Models are cached in IndexedDB, so everything works offline. Conversations are stored locally—export or delete them anytime.

Free: https://noaibills.app/?utm_source=reddit&utm_medium=social&utm_campaign=launch_artificial

I'm not claiming it replaces GPT-4. But for the 80% of everyday tasks—drafts, summaries, quick coding questions—a 3B-parameter model running locally is plenty.

It's not positioned as a cloud-LLM replacement. It's for local inference on basic text tasks (writing, communication, drafts) with zero internet dependency, no API costs, and complete privacy.

Core fit: organizations whose data restrictions block cloud AI and who can't install desktop tools like Ollama or LM Studio. It covers quick drafts, grammar checks, and basic reasoning without budget or setup barriers.

Need real-time knowledge or complex reasoning? Use cloud models. This serves a different niche—**not every problem needs a sledgehammer** 😄.

Would love feedback from this community 🙌.
