Generating 3D objects based solely on text descriptions has proven extremely challenging for AI. Current state-of-the-art methods require optimizing a full 3D model from scratch for each new prompt, which is computationally demanding.
A new technique called HyperFields demonstrates promising progress toward generating detailed 3D models directly from text prompts. Instead of optimizing each model from scratch, HyperFields learns a generalized mapping from language to 3D geometry representations, so tailored 3D models can be produced for new text prompts in a single feedforward pass, without slow per-prompt optimization.
HyperFields combines two key techniques:
- A dynamic hypernetwork that takes in text and progressively predicts the weights of a separate 3D generation network. Each layer's weight prediction is conditioned on the previous layer's activations, enabling specialization (see the sketch after this list).
- Distilling individually optimized 3D networks into the hypernetwork, which provides dense supervision for learning the complex text-to-3D mapping (sketched in the second snippet below).
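For intuition, here is a minimal PyTorch sketch of the dynamic-hypernetwork idea: a small predictor takes the text embedding together with the previous layer's activations and emits the weights of the next layer of the 3D network. All names, dimensions, and the pooling scheme (`DynamicHypernet`, `text_dim=512`, mean-pooled activations) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a dynamic hypernetwork (illustrative, not the paper's code).
# Each layer's weights are predicted from the text embedding AND the previous
# layer's activations, so weight generation can specialize per prompt.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicHypernet(nn.Module):
    def __init__(self, text_dim=512, hidden=64, n_layers=4):
        super().__init__()
        self.hidden = hidden
        # One small weight predictor per target layer. Input: text embedding
        # concatenated with mean-pooled previous-layer activations (assumed).
        self.predictors = nn.ModuleList([
            nn.Sequential(
                nn.Linear(text_dim + hidden, 256),
                nn.ReLU(),
                nn.Linear(256, hidden * hidden + hidden),  # flat weight + bias
            )
            for _ in range(n_layers)
        ])
        self.in_proj = nn.Linear(3, hidden)    # xyz coords -> hidden (assumed)
        self.out_proj = nn.Linear(hidden, 4)   # hidden -> RGB + density (assumed)

    def forward(self, text_emb, xyz):
        # text_emb: (text_dim,), xyz: (n_points, 3)
        h = F.relu(self.in_proj(xyz))
        for predictor in self.predictors:
            # Condition the weight prediction on previous-layer activations.
            ctx = torch.cat([text_emb, h.mean(dim=0)], dim=-1)
            params = predictor(ctx)
            w = params[: self.hidden * self.hidden].view(self.hidden, self.hidden)
            b = params[self.hidden * self.hidden :]
            h = F.relu(h @ w.t() + b)  # apply the freshly predicted layer
        return self.out_proj(h)       # per-point RGB + density

# Hypothetical usage: one feedforward pass per prompt, no per-prompt
# optimization loop.
net = DynamicHypernet()
pred = net(torch.randn(512), torch.randn(1024, 3))  # -> (1024, 4)
```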
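The distillation step can likewise be sketched as plain supervised regression: frozen, individually optimized per-prompt 3D networks act as teachers, and the hypernetwork-conditioned network is trained to match their outputs at sampled 3D points. The training-loop structure, the `teachers` dict, and uniform point sampling here are assumptions for illustration.

```python
# Sketch of the distillation objective (assumed setup, not the paper's code):
# pre-optimized per-prompt 3D networks serve as teachers; matching their
# outputs gives dense 3D supervision for the text-to-3D mapping.
import torch

def distill_step(hypernet, teachers, text_embs, optimizer, n_points=4096):
    """teachers: dict prompt -> frozen, pre-optimized 3D network (callable on xyz).
    text_embs: dict prompt -> text embedding tensor."""
    optimizer.zero_grad()
    loss = 0.0
    for prompt, teacher in teachers.items():
        xyz = torch.rand(n_points, 3) * 2 - 1        # sample points in [-1, 1]^3
        with torch.no_grad():
            target = teacher(xyz)                    # teacher RGB + density
        pred = hypernet(text_embs[prompt], xyz)      # student prediction
        loss = loss + torch.mean((pred - target) ** 2)
    loss.backward()
    optimizer.step()
    return loss.item()
```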
In experiments, HyperFields improved on previous state-of-the-art methods by 5-10x in sample efficiency and wall-clock convergence time. It demonstrated the ability to:
- Encode over 100 distinct objects like "yellow vase" in a single model
- Generalize to new attribute-object combinations without having seen those exact prompts during training
- Rapidly adapt to generate completely novel objects with minimal fine-tuning
However, limitations remain around flexibility, fine-grained details, and reliance on existing 2D guidance systems.
TL;DR: HyperFields uses a dynamic hypernetwork to predict weights for a 3D generation network. The method is 5-10x faster than existing techniques and can quickly adapt to new text prompts, but has limitations in fine details.
Full summary is here. Paper here.