Llama 3.1 8B CPU inference on any PC with a browser

In May 2024, a team at Yandex Research, in collaboration with ISTA and KAUST, published PV-Tuning, a new state-of-the-art quantization method.

This project from one of the authors runs models like Llama 3.1 8B inside any modern browser using PV-tuning compression.

Demo

Code

submitted by /u/azalio