We built a data-free method for compressing heavy LLMs
Hey folks! I’ve been working with the team at Yandex Research on a way to make LLMs easier to run locally, without calibration data, GPU farms, or cloud setups. We just published a paper on HIGGS, a data-free quantization method that skips calibration …