“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis,” posted to arXiv in March 2020 by Ben Mildenhall, Pratul Srinivasan, Matthew Tancik, Jonathan Barron, Ravi Ramamoorthi, and Ren Ng of UC Berkeley, UC San Diego, and Google, revived an old goal of computer vision - reconstructing 3D scenes from photographs - using a surprising representation. Instead of building an explicit mesh or point cloud, NeRF stores the entire scene inside the weights of a small fully connected neural network.
The network takes a 5D input - a 3D point in space plus a 2D viewing direction - and outputs the color and volume density at that point. To render an image from a new viewpoint, NeRF casts a ray through each pixel into the scene, samples points along each ray, queries the network at every sample, and combines the results into a single pixel color using classic volume rendering. Because that rendering process is differentiable, the network can be optimized by gradient descent to reproduce a set of input photographs with known camera poses. Once trained, it can synthesize photorealistic views from angles never actually photographed, including fine geometry and view-dependent effects such as reflections.
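The rendering step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `toy_field` is a hypothetical stand-in for the trained network (a solid red sphere that ignores the viewing direction), and the names `render_ray`, `near`, `far`, and `n_samples` are assumptions chosen for this example. The compositing follows the standard volume rendering quadrature, in which each sample's color is weighted by its opacity and by the fraction of light that survives all earlier samples.

```python
import numpy as np

def toy_field(points, view_dir):
    """Hypothetical stand-in for the trained network: a red sphere of radius 1.

    Returns an RGB color and a density for every 3D point. A real NeRF
    would also use view_dir to predict view-dependent color.
    """
    inside = np.linalg.norm(points, axis=-1) < 1.0
    density = np.where(inside, 50.0, 0.0)               # opaque inside, empty outside
    color = np.tile([1.0, 0.0, 0.0], (len(points), 1))  # constant red, ignores view_dir
    return color, density

def render_ray(origin, direction, near=2.0, far=6.0, n_samples=64, field=toy_field):
    """Composite the samples along one ray into a single pixel color."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)

    # Sample points along the ray between the near and far bounds.
    t_vals = np.linspace(near, far, n_samples)
    points = origin + t_vals[:, None] * direction

    # Query the field at every sample.
    colors, densities = field(points, direction)

    # Distance between adjacent samples; the last interval is treated as very long.
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)

    # Opacity contributed by each segment.
    alphas = 1.0 - np.exp(-densities * deltas)

    # Transmittance: fraction of light that reaches sample i unabsorbed.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))

    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# A ray through the sphere's center and a ray that misses it entirely.
hit = render_ray(origin=[0.0, 0.0, -4.0], direction=[0.0, 0.0, 1.0])
miss = render_ray(origin=[3.0, 0.0, -4.0], direction=[0.0, 0.0, 1.0])
```

Under these assumptions the ray through the sphere comes out essentially pure red, and the ray that misses it comes out black, since no sample along it has any density. Every operation here is a smooth function of the predicted colors and densities, which is what allows the error between rendered and photographed pixels to be backpropagated into the network's weights.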
NeRF set off an explosion of research in neural rendering and 3D reconstruction, with hundreds of follow-up papers making it faster, sharper, and able to handle moving scenes. It pointed toward a future where capturing a real place or object is as simple as taking a few photos and letting a network fill in the rest - relevant to virtual reality, visual effects, mapping, robotics, and e-commerce product views. Where Larry Roberts in 1963 recovered crude block shapes from line drawings, NeRF recovers photorealistic 3D scenes from a set of ordinary photographs.