ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu


Overview

We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. Compared with existing methods mainly developed for single objects with masked backgrounds, we propose key improvements to address challenges introduced by in-the-wild scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. As the data mixture presents various issues such as depth-scale ambiguity, we present a novel camera parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360° scenes, and propose SDS anchoring to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art in LPIPS on DTU in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance.
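To make the depth-scale ambiguity mentioned above concrete, here is a minimal sketch using a pinhole camera with identity rotation. It shows that scaling the scene and the camera translation by the same factor leaves the projected image unchanged, so the translation magnitude cannot be recovered from images alone. The `normalize` function, which divides by the median depth, is an assumption chosen for illustration and is not necessarily the normalization scheme used by ZeroNVS; all function and variable names are hypothetical.

```python
# Illustration of depth-scale ambiguity with a pinhole camera.
# All names are hypothetical and not taken from the ZeroNVS code base.

def project(point, cam_t, focal=1.0):
    """Project a 3D point into a camera located at cam_t (identity rotation)."""
    x, y, z = (p - t for p, t in zip(point, cam_t))
    return (focal * x / z, focal * y / z)

def scale_scene(points, cam_t, s):
    """Scale scene geometry and camera translation by the same factor s."""
    scaled_points = [tuple(s * c for c in p) for p in points]
    scaled_t = tuple(s * c for c in cam_t)
    return scaled_points, scaled_t

def normalize(points, cam_t):
    """Assumed normalization: divide everything by the median depth
    measured from the reference camera at the origin."""
    depths = sorted(p[2] for p in points)
    median_depth = depths[len(depths) // 2]
    return scale_scene(points, cam_t, 1.0 / median_depth)

points = [(0.5, 0.2, 2.0), (-1.0, 0.3, 4.0), (0.1, -0.7, 3.0)]
cam_t = (0.3, 0.0, -0.5)  # novel-view camera position relative to the reference view

# The same scene expressed in units that are 10x larger.
big_points, big_t = scale_scene(points, cam_t, 10.0)

pixels = [project(p, cam_t) for p in points]
pixels_scaled = [project(p, big_t) for p in big_points]

# After normalization, both versions of the scene share one camera translation.
_, t_norm = normalize(points, cam_t)
_, big_t_norm = normalize(big_points, big_t)

print(pixels)
print(pixels_scaled)
print(t_norm, big_t_norm)
```

Both projections agree up to floating-point error even though the raw translations differ by a factor of ten, which is why training on mixed data sources needs some scale convention before camera poses become comparable.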

All view synthesis results are from NeRF distillation using the same diffusion model.

View Synthesis Results: CO3D





Input	Novel Views	Input	Novel Views

View Synthesis Results: RealEstate10K

In addition to object-centric CO3D scenes, ZeroNVS also handles more complex scenes without centered objects, such as RealEstate10K scenes.





Input	Novel Views	Input	Novel Views

View Synthesis Results: Mip-NeRF 360

For Mip-NeRF 360, we show results for our base model without anchoring (left) and with our proposed SDS anchoring (right). SDS anchoring is mainly intended to improve background quality; without anchoring, SDS tends to converge to very monotone backgrounds.
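This page does not spell out how anchoring works. One plausible reading, stated here as an assumption, is that a few novel "anchor" views are sampled from the diffusion model before distillation, and each SDS step is then conditioned on the closest available view rather than always on the input image. The sketch below covers only the view-selection step under that assumption; the function names, the azimuth-only distance, and the four evenly spaced views are all hypothetical.

```python
# Hypothetical sketch: pick the conditioning view for an SDS step by
# choosing the anchor closest in azimuth to the sampled camera.
# This is an assumed reading of "SDS anchoring", not the authors' code.

def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def nearest_anchor(query_azimuth_deg, anchor_azimuths_deg):
    """Return the index of the anchor view nearest to the query azimuth."""
    return min(
        range(len(anchor_azimuths_deg)),
        key=lambda i: angular_distance(query_azimuth_deg, anchor_azimuths_deg[i]),
    )

# Index 0 stands for the input view; the others stand for pre-sampled anchors.
anchors = [0.0, 90.0, 180.0, 270.0]

for azimuth in (350.0, 100.0, 200.0):
    print(azimuth, "->", nearest_anchor(azimuth, anchors))
```

Under this reading, views behind the input camera would be supervised against a fixed sampled image, which is one way diverse backgrounds could survive distillation.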

Input	Novel Views (no anchoring)	Input	Novel Views (SDS anchoring)








View Synthesis Results: DTU



Input	Novel Views	Input	Novel Views

Qualitative visualization of depth and normals

Though 3D reconstruction is not the focus of our work, our method distills NeRFs with high-quality normals and depth.




Input	Novel views, normals, and depth

Citation


@article{zeronvs,
  author = {
    Sargent, Kyle and Li, Zizhang and Shah, Tanmay and Herrmann, Charles and Yu, Hong-Xing and Zhang, Yunzhi
    and Chan, Eric Ryan and Lagun, Dmitry and Fei-Fei, Li and Sun, Deqing and Wu, Jiajun},
  title = {{ZeroNVS}: Zero-Shot 360-Degree View Synthesis from a Single Real Image},
  journal = {CVPR},
  year = {2024}
}