Multimodal Neurons in Artificial Neural Networks
We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually.
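To make the idea concrete, below is a minimal sketch of how one might probe a single unit in CLIP's image encoder across "literal", "symbolic", and "conceptual" renditions of one concept. It assumes the open-source openai/clip package and its ResNet-based RN50x4 model; the image filenames and the unit index are illustrative placeholders, not the specific units analyzed in this work.

```python
# Sketch: compare one unit's activation across renditions of a single concept.
# Assumes the openai/clip package; file names and the unit index are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)  # ResNet-based CLIP image encoder

activations = {}

def hook(_module, _inputs, output):
    # Record the final residual stage, spatially pooled to one value per channel.
    activations["feats"] = output.mean(dim=(2, 3)).squeeze(0)

# layer4 is the last residual stage of CLIP's modified-ResNet vision tower.
handle = model.visual.layer4.register_forward_hook(hook)

renditions = {
    "literal": "spider_photo.png",        # a photograph of the concept
    "symbolic": "spider_text.png",        # the word rendered as an image
    "conceptual": "spiderman_drawing.png",  # an associated drawing
}

unit = 550  # placeholder index of the unit being probed

with torch.no_grad():
    for name, path in renditions.items():
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        model.encode_image(image)  # the hook captures layer4's output
        print(f"{name:>10}: unit {unit} activation = {activations['feats'][unit].item():.3f}")

handle.remove()
```

Spatially pooling the last residual stage reduces each channel to a single scalar, so a unit whose pooled activation stays high across the photograph, the rendered text, and the drawing is, in this simplified sense, responding to the concept rather than to any one visual presentation of it.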