/u/azalio

We built a data-free method for compressing heavy LLMs

Hey folks! I’ve been working with the team at Yandex Research on a way to make LLMs easier to run locally, without calibration data, GPU farms, or cloud setups. We just published a paper on HIGGS, a data-free quantization method that skips calibration …
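For readers unfamiliar with the term: "data-free" means the weights are quantized directly, with no calibration dataset or forward passes needed. Below is a minimal, generic sketch of that idea — plain round-to-nearest, per-group quantization in NumPy. This is only an illustration of the data-free setting, not the HIGGS algorithm itself (HIGGS is described in the linked paper); all function names here are made up for the example.

```python
import numpy as np

def quantize_rtn(w, bits=4, group_size=64):
    """Round-to-nearest symmetric quantization per group of weights.
    Needs only the weight matrix itself -- no calibration data."""
    groups = w.reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                         # guard all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Reconstruct an approximate float matrix from codes and scales."""
    return (q * scale).reshape(shape).astype(np.float32)

# Toy weight matrix standing in for one LLM layer
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, s = quantize_rtn(w, bits=4, group_size=64)
w_hat = dequantize(q, s, w.shape)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
```

The whole point of methods like HIGGS is to get much better accuracy than this naive baseline at the same bit width, while keeping the same data-free property.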

Llama 3.1 8B CPU inference on any PC with a browser

In May of this year, a team at Yandex Research, in collaboration with ISTA and KAUST, published a new SOTA quantization method called PV-tuning. This project, from one of the paper's authors, runs models like Llama 3.1 8B inside any modern browser using PV-tunin…