Transformers using plain NumPy

I built a multi-layer NN using NumPy, following Andrew Ng's Deep Learning course on Coursera. I grok most of it except for the calculus. I have some tutorials on building transformers with PyTorch, which I'll be working through. How hard would it be to build a transformer using plain NumPy? Anyone know how to do this? We're running a local LLM user group for learning how to run Llama 2 and Mistral, but I want to investigate whether it would be possible to build a tiny LLM using PyTorch, and, if at all possible, using NumPy. How hard would building a transformer be in NumPy, and how hard would it be to build a tiny LLM using NumPy and Wikipedia data?
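For a sense of what the core of a transformer looks like in plain NumPy, here's a minimal sketch of causal scaled dot-product self-attention (forward pass only; the function names, shapes, and init values are my own assumptions, not from any particular tutorial):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (seq_len, seq_len)
    # causal mask: each position may attend only to itself and the past
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -1e9
    return softmax(scores) @ V                      # (seq_len, d_k)

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 64, 64, 10
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) * 0.02 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (10, 64)
```

The hard part isn't this forward pass, it's writing the backward pass by hand, same as with the course NNs, just with more moving pieces (multi-head attention, LayerNorm, the feed-forward blocks, and the embedding/unembedding layers).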

One possible goal is to build a barebones LLM that is small, around 100M parameters, so that people with everyday graphics cards can run it using the open LLM toolsets and even fine-tune it. And one major goal is to understand the guts of a transformer by building one from scratch. I've learned so much about deep learning models by building them from scratch with NumPy; with PyTorch, there was so much I didn't learn about them.
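To give a feel for where a ~100M parameter budget goes, here's a rough back-of-the-envelope count for a GPT-style decoder-only model (the sizes below are hypothetical choices of mine, roughly GPT-2 small, ignoring biases and LayerNorm):

```python
def gpt_params(vocab=32000, d_model=768, n_layer=12, d_ff=None, ctx=1024):
    # approximate parameter count for a decoder-only transformer
    d_ff = d_ff or 4 * d_model
    embed = vocab * d_model + ctx * d_model   # token + position embeddings
    attn = 4 * d_model * d_model              # Wq, Wk, Wv, Wo per layer
    mlp = 2 * d_model * d_ff                  # the two feed-forward matrices
    return embed + n_layer * (attn + mlp)

print(f"{gpt_params():,}")  # about 110 million with these settings
```

Most of the budget sits in the per-layer attention and feed-forward matrices, so depth and d_model are the main knobs for hitting a target size.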

submitted by /u/xyz_TrashMan_zyx