Wow! Qwen 3.6:35b-a3b on a 3090… pretty amazing.

I've been using Anthropic and OpenAI for a year. I tried Ollama once, found it painfully slow, and totally wrote off running models locally. But I guess things have changed.

I picked up a used gaming rig with a 3090 last weekend, and yesterday I set up Qwen 3.6:35b-a3b. I grabbed a version that had been quantized down to 20GB (batiai/qwen3.6-35b:iq4) so the whole thing fits in the 3090's 24GB of VRAM.
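For anyone wondering why a ~20GB file is the magic size: quantized weight size is roughly parameters × bits per weight ÷ 8. Here's a quick sketch of that arithmetic. It assumes about 4.5 bits per weight for an IQ4-style quant (an assumption; the real figure depends on the exact quant type), and `overhead_gb` is a hypothetical allowance for KV cache and runtime buffers, not a measured number.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of quantized weights in decimal GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, which also use VRAM.
    """
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params: float, bits_per_weight: float,
                 vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    # vram_gb defaults to the 3090's 24GB; overhead_gb is a hypothetical
    # placeholder for KV cache and CUDA context, not a measured value.
    return estimate_weights_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


# 35B params at an assumed ~4.5 bits/weight is about 19.7GB of weights.
print(estimate_weights_gb(35e9, 4.5))
print(fits_in_vram(35e9, 4.5))   # quantized: fits
print(fits_in_vram(35e9, 16))    # unquantized fp16: ~70GB, does not fit
```

The same math shows why the unquantized model is a non-starter on a single 24GB card.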

When it was in system RAM, it was doing a respectable 15 tps on output, but once I got it all stuffed into VRAM, its output jumped to 160 tps. Then I fed it a picture.
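If you want to check tokens-per-second numbers like these yourself, a non-streaming call to Ollama's `/api/generate` endpoint returns `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds), so output speed is just their ratio. This is a minimal sketch assuming a local server on the default port 11434; the helper names are hypothetical.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Output speed from a non-streaming Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the time
    spent generating them, in nanoseconds.
    """
    count = response["eval_count"]
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return count / (duration_ns / 1e9)


def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to a local Ollama server (assumed host)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage, with the model already pulled: `tokens_per_second(generate("batiai/qwen3.6-35b:iq4", "Describe a 3090 in one paragraph."))`. Splitting the math from the HTTP call lets you test the rate calculation without a running server.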

https://preview.redd.it/cmpali41ev4h1.png?width=1882&format=png&auto=webp&s=a4c7732b9820730cc3f38b604ee04d465d7cc86e

The image processing took 75 seconds, but... wow. Just wow. That's pretty damn good for running locally on a 5-year-old video card! I guess you guys are used to this, but it sure surprised me!

And we watched a transcoded movie via Plex at the same time! I can see why you guys love the 3090 so much. Hell of a card.

submitted by /u/LankyGuitar6528