I built the world’s first Chrome extension that runs LLMs entirely in-browser, using WebGPU, Transformers.js, and Chrome’s Prompt API.
There are plenty of WebGPU demos out there, but I wanted to ship something people could actually use day to day. It runs Llama 3.2, DeepSeek-R1, Qwen3, Mistral, Gemma, Phi, and SmolLM2, all locally in Chrome. Three inference backends: WebLLM (MLC/WebGPU) T…