We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. Compared with existing methods mainly developed for single objects with masked backgrounds, we propose key improvements to address challenges introduced by in-the-wild scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. As the data mixture presents various issues such as depth-scale ambiguity, we present a novel camera parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360° scenes, and propose SDS anchoring to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art in LPIPS on DTU in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance.
All view synthesis results are from NeRF distillation using the same diffusion model.
Input | Novel Views | Input | Novel Views |
In addition to object-centric CO3D scenes, ZeroNVS also handles more complex scenes without centered objects, such as RealEstate10K scenes.
Input | Novel Views | Input | Novel Views |
For Mip-NeRF 360, we show results from our base model without anchoring (left) and with our proposed SDS anchoring (right). SDS anchoring is mainly intended to improve background quality; without anchoring, SDS tends to converge to very monotonous backgrounds.
Input | Novel Views (no anchoring) | Input | Novel Views (SDS anchoring) |
Input | Novel Views | Input | Novel Views |
Though 3D reconstruction is not the focus of our work, our method distills NeRFs with high-quality normals and depth.
Input | Novel views, normals, and depth |
@article{zeronvs,
author = {
Sargent, Kyle and Li, Zizhang and Shah, Tanmay and Herrmann, Charles and Yu, Hong-Xing and Zhang, Yunzhi
and Chan, Eric Ryan and Lagun, Dmitry and Fei-Fei, Li and Sun, Deqing and Wu, Jiajun},
title = {{ZeroNVS}: Zero-Shot 360-Degree View Synthesis from a Single Real Image},
journal={CVPR, 2024},
year={2023}
}