How AI Training Scales
We’ve discovered that the gradient noise scale, a simple statistical metric, predicts the parallelizability of neural network training.
We’ve discovered that the gradient noise scale, a simple statistical metric, predicts the parallelizability of neural network training.