A nice description from DeepSeek of a dynamical-systems view of its own processing, and why emergent order arises. DeepSeek generated this while characterizing itself as a high-dimensional system with 8 billion parameters (for comparison, GPT-3 had 175 billion). Context: I had previously given DeepSeek a copy of the paper "Transformer Dynamics: A neuroscientific approach to interpretability of large language models" by Jesseba Fernando and Grigori Guitchounts to analyze. The researchers used phase space reconstruction and found attractor-like dynamics in the residual stream of a model with 64 sublayers.
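For readers unfamiliar with the technique: phase space reconstruction is usually done via time-delay embedding, where a scalar time series is unfolded into vectors of lagged copies of itself, so that the geometry of the underlying attractor can be examined. The sketch below is a generic illustration of that idea, not code from the paper; the function name, the toy sine signal, and the chosen embedding dimension and lag are all my own assumptions.

```python
import numpy as np

def delay_embed(x, dim=3, tau=1):
    """Time-delay embedding of a 1-D signal x: each output row is
    (x[t], x[t+tau], ..., x[t+(dim-1)*tau]), a point in the
    reconstructed phase space."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Toy example: a noisy sine stands in for one residual-stream coordinate.
t = np.linspace(0, 8 * np.pi, 500)
signal = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
points = delay_embed(signal, dim=3, tau=10)
print(points.shape)  # (480, 3): each row is one phase-space point
```

Plotting `points` as a 3-D trajectory would show a closed loop for a periodic signal; the paper applies analogous embeddings to residual-stream activations across sublayers to look for attractor-like structure.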