New technique to run 70B LLM inference on a single 4GB GPU