DeepMind Blog

An empirical analysis of compute-optimal large language model training

We ask the question: “What is the optimal model size and number of training tokens for a given compute budget?” To answer this question, we train models of various sizes and with various numbers of tokens, and estimate this trade-off empirically. Our m…
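As a rough illustration of the trade-off the post estimates (a sketch, not the paper's method): assuming the common approximation that training a model of N parameters on D tokens costs about C ≈ 6ND FLOPs, and using the paper's headline finding that model size and token count should scale in equal proportion (roughly 20 tokens per parameter), one can back out a compute-optimal split from a FLOP budget:

```python
import math

def compute_optimal_split(flop_budget: float, tokens_per_param: float = 20.0):
    """Split a training FLOP budget C into a parameter count N and a
    token count D, assuming C ~= 6 * N * D and a fixed tokens-per-parameter
    ratio (the paper found roughly 20:1 at the optimum)."""
    # C = 6 * N * D with D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = math.sqrt(flop_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: the Chinchilla budget of ~5.76e23 FLOPs comes out to roughly
# a 70B-parameter model trained on ~1.4T tokens.
n, d = compute_optimal_split(5.76e23)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Running this recovers the paper's well-known result: at Gopher's compute budget, the compute-optimal model (Chinchilla) is about 70B parameters trained on about 1.4T tokens, i.e. a much smaller model trained on far more data.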


GopherCite: Teaching language models to support answers with verified quotes

Language models like Gopher can “hallucinate” facts that appear plausible but are actually fake. Those who are familiar with this problem know to do their own fact-checking, rather than trusting what language models say. Those who are not may end up b…
