artificial

Inference at 16k tokens/second

February 23, 2026

This is the most insane thing I have seen so far: 17k tokens/second. I just tried the chatbot on taalas.com and asked it to compare Nvidia, Cerebras, Groq, and Taalas. The response came back in 0.058s, and the reported token output was 15k.
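For a rough sanity check on how these figures relate, here is a minimal sketch. The function name is made up for illustration, and it assumes a constant generation rate and that the 0.058s covers the whole response, neither of which I have confirmed.

```python
def tokens_generated(tokens_per_second: float, seconds: float) -> float:
    """Tokens produced at a constant rate over a given wall-clock time (assumed model)."""
    return tokens_per_second * seconds


# Try the three rates mentioned in this post against the 0.058 s response time.
for rate in (15_000, 16_000, 17_000):
    count = tokens_generated(rate, 0.058)
    print(f"{rate} tok/s for 0.058 s -> about {count:.0f} tokens")
```

Under those assumptions, a 0.058s response works out to somewhere under a thousand tokens, so if both figures are accurate the 15k is more likely a tokens/second rate than a token count.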

That is some godly speed for an 8B-parameter Llama 3 model.

If they launch a developer kit, I will surely buy it.

What do you guys think about this?

submitted by /u/awscloudengineer