How do "bits per weight" work on a practical level? And how can you have fractional values like 4.25 bpw?

The idea of 4.25 bits is weird to me; I'm not aware of any unit of information smaller than 1 bit.
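One way a non-integer figure can arise without any "fraction of a bit" is metadata overhead. Many quantizers store weights in small blocks that share a scale factor. That scale costs extra bits, and when those bits are spread across every weight in the block, the per-weight average becomes fractional. The sketch below assumes a hypothetical layout: 4-bit codes in blocks of 32, plus one shared scale per block. It illustrates the arithmetic and is not the exact layout of any particular file format.

```python
# Hypothetical block format, assumed for illustration only (not a specific
# real file format): 4-bit integer codes in blocks of 32 weights, plus one
# shared scale per block.

def bits_per_weight(weight_bits: int, block_size: int, scale_bits: int) -> float:
    """Average storage bits per weight, counting the per-block scale."""
    total_bits = weight_bits * block_size + scale_bits
    return total_bits / block_size


def quantize_block(weights: list[float], bits: int = 4) -> tuple[float, list[int]]:
    """Symmetric absmax quantization of one block to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Every weight in the block shares this one scale; only the codes are per-weight.
    # (Here the scale stays a Python float; a real format would store it in
    # a fixed number of bits, which is what `scale_bits` above accounts for.)
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Approximate reconstruction: each code times the shared scale."""
    return [c * scale for c in codes]


print(bits_per_weight(4, 32, 8))   # 4.25: 4 bits + 8/32 bits of scale per weight
print(bits_per_weight(4, 32, 16))  # 4.5: same block with a 16-bit scale
```

Under these assumptions every weight really is stored as 4 bits. The extra 0.25 is the scale's 8 bits amortized over 32 weights, which is why fractional bpw figures are averages rather than literal fractional bits.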

Is it 4.25 bpw on average? Like, some weights are 4 bits and some are 5 bits? If so, how are the weights chosen to get more bits than others? Are variable "bit rate" weights a thing? Could some high-importance weights keep the full 16 bits? Could weight sizes be scaled up and down on the fly to trade accuracy for speed on a per-prompt basis? Is it possible to create an "importance map" or "topic/word cloud map" of the weights, scaling the bit depth of weight groups on the fly per prompt and per topic?
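The "average" reading in the question can also be made concrete. Suppose groups of weights get either 4 or 5 bits. The reported bpw is then the size-weighted mean, so 4.25 bpw means a quarter of the weights use 5 bits. The sketch below assumes equal-sized groups and uses made-up importance scores with a simple greedy rule: give 5 bits to the most important groups until the bit budget runs out. The scores, the group size, and the `allocate_bits` helper are all hypothetical. Real tools use their own selection criteria, which this does not reproduce.

```python
# Hypothetical mixed-precision allocation: each equal-sized group of weights
# gets either `low` or `high` bits. Importance scores are made-up placeholders.

def average_bpw(groups: list[tuple[int, int]]) -> float:
    """groups: list of (num_weights, bits). Returns the size-weighted mean bits."""
    total_weights = sum(n for n, _ in groups)
    total_bits = sum(n * b for n, b in groups)
    return total_bits / total_weights


def allocate_bits(importance: list[float], group_size: int, target_bpw: float,
                  low: int = 4, high: int = 5) -> list[int]:
    """Greedily give `high` bits to the most important groups within the budget."""
    n = len(importance)
    budget = target_bpw * group_size * n  # total bits allowed for all groups
    bits = [low] * n
    spent = low * group_size * n
    extra = (high - low) * group_size  # cost of upgrading one group
    # Visit groups from most to least important; stop once the budget is used up.
    for idx in sorted(range(n), key=lambda k: importance[k], reverse=True):
        if spent + extra > budget:
            break
        bits[idx] = high
        spent += extra
    return bits


importance = [0.9, 0.1, 0.4, 0.2, 0.8, 0.3, 0.05, 0.6]  # hypothetical scores
bits = allocate_bits(importance, group_size=1024, target_bpw=4.25)
print(bits)                                       # [5, 4, 4, 4, 5, 4, 4, 4]
print(average_bpw([(1024, b) for b in bits]))     # 4.25
```

The same averaging covers a few weights kept at 16 bits: they simply add more bits to the numerator. This sketch fixes the allocation once, ahead of time. It says nothing about whether changing bit depth per prompt would be practical.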

submitted by /u/transdimensionalmeme