Hello people! Sharing a video from my YT channel that talks about Transformers and the major reasons for their superior scalability and generality over other networks. I mainly focus on the often forgotten/overlooked impact of Transformers on inductive biases in ML architectures. Hope y’all enjoy!