Multimodal Neurons in Artificial Neural Networks
We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually.
We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.
We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision.
We find that, just as a large transformer model trained on language can generate coherent text, the same model trained on pixel sequences can generate coherent image completions and samples.