/u/transdimensionalmeme

How does "bits per weight" (bpw) work on a practical level? And how can you have a fractional value like 4.25 bpw?

The idea of 4.25 bits is weird to me; I don't know of any unit of information smaller than 1 bit. Is it 4.25 bpw on average? Like, some weights are 4 bits and some are 5 bits? If so, how are the weights chosen to have more bits than others? Are variable…
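A fractional figure only makes sense as an average over many weights. Two common ways such an average can come out non-integer are sketched below: shared per-block metadata (such as a scale factor) amortized over the block, and mixing groups stored at different bit widths. The block size of 64, the 16-bit scale, and the 75/25 split are illustrative assumptions, not the parameters of any particular quantization format.

```python
def effective_bpw(value_bits: int, block_size: int, overhead_bits_per_block: int) -> float:
    """Average bits per weight when every block of `block_size` weights stores
    `value_bits` per weight plus `overhead_bits_per_block` of shared metadata
    (for example, one 16-bit scale per block). Parameters are illustrative."""
    return value_bits + overhead_bits_per_block / block_size


def mixed_bpw(groups: list[tuple[int, int]]) -> float:
    """Average bits per weight across groups quantized at different widths.
    Each group is (num_weights, bits_per_weight); the split is hypothetical."""
    total_bits = sum(n * b for n, b in groups)
    total_weights = sum(n for n, _ in groups)
    return total_bits / total_weights


# Assumed layout: 4-bit values plus one 16-bit scale per block of 64 weights.
print(effective_bpw(4, 64, 16))  # 4.25
# Assumed mix: 75% of weights stored at 4 bits, 25% at 5 bits.
print(mixed_bpw([(750, 4), (250, 5)]))  # 4.25
```

Both routes land on 4.25 here, which is why the bpw number alone doesn't tell you which scheme a given format uses.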

Does PCIe bandwidth matter for running inference in general?

It's difficult to find a motherboard with more than two PCIe x16 slots. What if I connect GPUs through a PCIe x1 slot? Would that only affect loading the model once per boot and then have no impact on performance? Does the model need to be reloaded many tim…
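For the "only affects loading" part, a rough upper-bound estimate of one-time transfer cost is just size divided by link bandwidth. This sketch uses the theoretical per-lane rates for PCIe 3.0 (8 GT/s) and 4.0 (16 GT/s) after 128b/130b encoding; the 20 GB model size is a hypothetical example, and real transfers will be slower than these figures. It does not model any per-token traffic that may cross the bus after loading.

```python
# Theoretical one-direction throughput per lane in GB/s:
# transfer rate (GT/s) * 128/130 encoding efficiency / 8 bits per byte.
PER_LANE_GBPS = {
    "3.0": 8 * 128 / 130 / 8,   # ~0.985 GB/s
    "4.0": 16 * 128 / 130 / 8,  # ~1.969 GB/s
}


def transfer_seconds(size_gb: float, gen: str, lanes: int) -> float:
    """Lower bound on time to push `size_gb` gigabytes over a PCIe link
    of the given generation and lane count (ignores protocol overhead)."""
    return size_gb / (PER_LANE_GBPS[gen] * lanes)


# Hypothetical 20 GB of weights loaded onto one GPU.
print(f"{transfer_seconds(20, '3.0', 1):.1f} s")   # 20.3 s on PCIe 3.0 x1
print(f"{transfer_seconds(20, '3.0', 16):.1f} s")  # 1.3 s on PCIe 3.0 x16
```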