| I've been using Anthropic and OpenAI for a year, and after I tried ollama once (so slow) I totally wrote off local. But I guess things have changed. I picked up a used gaming rig with a 3090 last weekend. Yesterday I set up qwen 3.6:35b-a3b. I got the build that had been squeezed down to 20GB (batiai/qwen3.6-35b:iq4) so it would all fit on the 3090. When it was in system RAM it was doing a respectable 15 tps on output, but once I got it all stuffed into VRAM its output was up to 160 tps. Then I fed it a picture. Processing it took 75 seconds but... wow. Just. Wow. That's pretty damn good for running local on a 5-year-old video card! I guess you guys are used to this, but it sure surprised me! And we watched a transcoded movie via Plex at the same time! I can see why you guys love the 3090 so much. Hell of a card. |
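For anyone who wants to put rough numbers on that, here's a quick back-of-the-envelope sketch. The 20GB model size and the 15 and 160 tps figures come from the post; the 24 GB figure is the 3090's VRAM; the 480-token reply length and the function names are just made up for illustration. It ignores the gap between GB and GiB and whatever the runtime itself reserves, so treat the headroom as approximate.

```python
# Rough arithmetic on the numbers in the post (a sketch, not a benchmark).
VRAM_GB = 24.0          # RTX 3090 VRAM
MODEL_GB = 20.0         # quantized model size quoted in the post
TPS_SYSTEM_RAM = 15.0   # output speed with the model in system RAM
TPS_VRAM = 160.0        # output speed with the model fully in VRAM


def headroom_gb(vram_gb: float, model_gb: float) -> float:
    """VRAM left over for context (KV cache) and anything else using the GPU."""
    return vram_gb - model_gb


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the fast configuration generates tokens."""
    return fast_tps / slow_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time to generate a reply of the given length at a steady rate."""
    return tokens / tps


if __name__ == "__main__":
    reply_tokens = 480  # hypothetical reply length, chosen for illustration
    print(f"headroom: {headroom_gb(VRAM_GB, MODEL_GB):.1f} GB")
    print(f"speedup:  {speedup(TPS_VRAM, TPS_SYSTEM_RAM):.1f}x")
    print(f"in RAM:   {seconds_for(reply_tokens, TPS_SYSTEM_RAM):.0f} s")
    print(f"in VRAM:  {seconds_for(reply_tokens, TPS_VRAM):.0f} s")
```

So a reply that would take about half a minute from system RAM comes back in a few seconds from VRAM, with roughly 4 GB to spare on the card.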