Google’s Gemma 4 12B just dropped – here’s how to run it locally on your Mac

Google released Gemma 4 12B today. It’s a solid open-source model (Apache 2.0) that’s multimodal and runs well on Macs with 16GB or more of unified memory. It’s good at reasoning, coding, and agentic tasks.

Quick Mac-friendly info
• 12B parameters, fits comfortably on M2/M3/M4 Macs (especially with Q4 or Q5 quantization)
• 256K context
• Text + vision + audio support
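
For anyone wondering why Q4/Q5 matters on a 16GB machine, here is a rough back-of-the-envelope sketch in Python. The bits-per-weight figures are my own approximate assumptions for typical quant levels, not measured values for this model, and the estimate covers the weights only (no KV cache, no context, no OS overhead).

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumed average bits per weight for each quant level (rough guesses,
# not official numbers for Gemma 4).
ASSUMED_BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5, "F16": 16.0}

for name, bits in ASSUMED_BITS_PER_WEIGHT.items():
    # Weights only; actual memory use while running will be higher.
    print(f"{name}: ~{estimate_weight_gb(12, bits):.1f} GB")
```

Under those assumptions a 12B model is about 6.75 GB at Q4 and about 24 GB at F16, which is why the quantized builds are the ones that fit on a 16GB Mac.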

Easiest way to run it: Ollama
1. Download and install Ollama from ollama.com (the Mac app is super simple). Or install it with Homebrew if you prefer: brew install ollama
2. Open Terminal and pull the model: ollama pull gemma4:12b
3. Run it: ollama run gemma4:12b
That’s it. You can start chatting right away.
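
If you would rather script it than chat in the terminal, Ollama also serves a local HTTP API. Below is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the model tag is gemma4:12b as pulled above; the helper names build_request and ask are just made up for this example.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes you have not changed the port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma4:12b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "gemma4:12b", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running, ask("Explain unified memory in one sentence.") should return the model's reply as a string. Setting "stream" to False keeps the example simple by returning one JSON object instead of a stream of chunks.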

Mac tips:
• Ollama uses Metal automatically, so inference runs on the Apple Silicon GPU with no extra setup.
• 16GB Macs handle the 12B model fine. 32GB leaves more headroom for long contexts and other apps.
• Great for pairing with Continue.dev in VS Code if you code a lot.

Other options if Ollama isn’t your thing: LM Studio (nice GUI), or llama.cpp for more control.

Has anyone tried the image or audio features locally yet?
How fast is it on your machine?
Drop your specs and results if you test it.
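
To make results comparable, it helps to report tokens per second. As far as I know, running with ollama run gemma4:12b --verbose prints timing stats after each reply, and the API's final response includes eval_count (tokens generated) and eval_duration (nanoseconds). A small sketch for turning those two fields into a speed number; the function name is just for illustration:

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama generate response.

    Expects the 'eval_count' (tokens generated) and 'eval_duration'
    (nanoseconds spent generating) fields from the final response.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)
```

For example, 200 tokens generated in 8 seconds works out to 25 tokens per second. This only measures generation, not prompt processing or model load time.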

submitted by /u/nullvector88