How does "bits per weight" work on a practical level? And how can you have a fractional value like 4.25 bpw?
The idea of 4.25 bits is weird to me; I don't know of any unit of information smaller than 1 bit. Is it 4.25 bpw on average? Like, some weights are 4 bits and some are 5 bits? If so, how are the weights chosen to have more bits than others? Are variable…
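
To make the "on average" guess concrete, here is a rough sketch of the arithmetic I have in mind. The 75/25 split between 4-bit and 5-bit weights is a made-up number chosen only so the average comes out to 4.25, and `average_bpw` is a hypothetical helper, not something from any real quantization library.

```python
def average_bpw(counts_by_bits):
    """Return the mean number of bits per weight.

    counts_by_bits maps a bit width to how many weights use that width.
    """
    total_bits = sum(bits * count for bits, count in counts_by_bits.items())
    total_weights = sum(counts_by_bits.values())
    return total_bits / total_weights


# Hypothetical mix: 750 weights stored in 4 bits, 250 weights stored in 5 bits.
mix = {4: 750, 5: 250}
print(average_bpw(mix))  # 4.25
```

So no single weight would ever occupy 4.25 bits; the fraction would only appear once the total bit count is divided by the number of weights.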