Why do neural networks catastrophically forget old tasks when learning new ones? It's not a capacity problem... it's fundamental to how gradient descent works. Deep dive into the stability-plasticity dilemma and what it means for production systems.
[link] [comments]