Kyle Sargent

I am a PhD student in the Stanford AI Lab, where I started in fall 2022, advised by Jiajun Wu and Fei-Fei Li. I have also worked as a student researcher at Google Research, mentored by Deqing Sun and Charles Herrmann.

Previously, I was an AI resident at Google Research. Before joining Google, I received my undergraduate degree from Harvard, where I studied CS and math.

Email  /  GitHub  /  Google Scholar  /  LinkedIn  /  Misc

Research

I work in computer vision. My main areas of focus are 3D reconstruction, novel view synthesis, and 3D generative models.

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu
CVPR, 2024.
arXiv / project page

We train a 3D-aware diffusion model, ZeroNVS, on a mixture of data sources covering object-centric, indoor, and outdoor scenes. This enables zero-shot SDS distillation of 360-degree NeRF scenes from a single image. Our model sets a new state-of-the-art LPIPS score on the DTU dataset in the zero-shot setting, and we also adopt the MipNeRF-360 dataset as a benchmark for single-image novel view synthesis.

NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Varun Jampani*, Kevis-Kokitsi Maninis*, Andreas Engelhardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, Andre Araujo, Ricardo Martin-Brualla, Kaushal Patel, Daniel Vlasic, Vittorio Ferrari, Ameesh Makadia, Ce Liu, Yuanzhen Li, Howard Zhou
NeurIPS, 2023.
arXiv / project page

We propose “NAVI,” a new dataset of category-agnostic image collections of objects with high-quality 3D scans, along with per-image 2D-3D alignments that provide near-perfect ground-truth camera parameters.

VQ3D: Learning a 3D Generative Model on ImageNet
Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun
ICCV, 2023. Oral presentation. Best paper finalist.
arXiv / project page

VQ3D adds a 3D-aware, NeRF-based decoder to the two-stage ViT-VQGAN. Stage 1 then enables novel view synthesis from input images, while Stage 2 enables generation of entirely new 3D-aware images. We achieve an ImageNet FID of 16.8, compared to 69.8 for the best baseline.

Self-supervised AutoFlow
Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun
CVPR, 2023.
arXiv / project page

We introduce Self-supervised AutoFlow, which handles real-world videos without ground-truth labels by using a self-supervised loss as the search metric.
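To give a flavor of what a self-supervised search metric can look like in practice, here is a minimal sketch of a photometric warping loss, one common choice for unlabeled video; the paper's exact metric may differ, and the tensor names and scoring details below are illustrative assumptions only.

```python
# Minimal sketch (PyTorch) of a photometric self-supervision signal for optical flow.
# Assumes frames are (B, 3, H, W) tensors in [0, 1] and `flow` is a (B, 2, H, W)
# forward-flow estimate in pixels; these names are illustrative, not the paper's API.
import torch
import torch.nn.functional as F

def photometric_loss(frame1, frame2, flow):
    """Warp frame2 toward frame1 with the estimated flow and compare photometrically."""
    b, _, h, w = frame1.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1)                    # (H, W, 2) pixel coordinates
    coords = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)   # displaced sample locations
    # grid_sample expects coordinates normalized to [-1, 1].
    norm_x = 2.0 * coords[..., 0] / (w - 1) - 1.0
    norm_y = 2.0 * coords[..., 1] / (h - 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)
    warped = F.grid_sample(frame2, grid, align_corners=True)
    return (warped - frame1).abs().mean()                   # L1 photometric error
```

In a hyperparameter-search loop, a score like this can stand in for ground-truth end-point error when no labels are available; occlusion handling and robust penalties, which matter in practice, are omitted for brevity.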

Pyramid Adversarial Training Improves ViT Performance
Kyle Sargent*, Charles Herrmann*, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun (*equal contribution)
CVPR, 2022. Oral presentation. Best paper finalist.
arXiv

Aggressive data augmentation is a key component of the strong generalization capabilities of the Vision Transformer (ViT). We propose “Pyramid Adversarial Training,” a strong adversarial augmentation that perturbs images at multiple spatial scales during training. Using only our augmentation and the standard ViT-B/16 backbone, we achieve a new state of the art on ImageNet-C, ImageNet-Rendition, and ImageNet-Sketch.
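As a rough illustration of the idea of a multi-scale adversarial perturbation, here is a minimal PyTorch sketch; the model, scales, weights, step sizes, and loop structure are assumptions for illustration and do not reproduce the paper's exact recipe.

```python
# Minimal sketch (PyTorch) of a multi-scale ("pyramid") adversarial perturbation.
# `model`, `images`, `labels`, and all hyperparameters are illustrative assumptions;
# projection/clipping of the perturbation is omitted for brevity.
import torch
import torch.nn.functional as F

def pyramid_attack(model, images, labels, scales=(1, 4, 16),
                   weights=(20/255, 10/255, 5/255), step_size=1/255, steps=5):
    b, c, h, w = images.shape
    # One perturbation tensor per pyramid level; coarser levels have lower resolution.
    deltas = [torch.zeros(b, c, h // s, w // s, device=images.device, requires_grad=True)
              for s in scales]

    def compose(ds):
        # Upsample every level to full resolution and sum into a single perturbation.
        return sum(wgt * F.interpolate(d, size=(h, w), mode="bilinear", align_corners=False)
                   for d, wgt in zip(ds, weights))

    for _ in range(steps):
        loss = F.cross_entropy(model(images + compose(deltas)), labels)
        grads = torch.autograd.grad(loss, deltas)
        with torch.no_grad():
            for d, g in zip(deltas, grads):
                d += step_size * g.sign()      # gradient ascent: maximize training loss
    with torch.no_grad():
        return images + compose(deltas)        # adversarially augmented batch
```

During training, the loss on perturbed images would typically be combined with the loss on the clean batch, which is what makes the perturbation an augmentation rather than a pure attack.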

SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
Varun Jampani*, Huiwen Chang*, Kyle Sargent, Abhishek Kar, Richard Tucker, Michael Krainin, Dominik Kaeser, William T. Freeman, David Salesin, Brian Curless, Ce Liu (*equal contribution)
ICCV, 2021. Oral presentation.
arXiv / project page / visual results

We design a unified system for novel view synthesis that leverages soft layering and depth-aware inpainting to achieve state-of-the-art results on multiple view-synthesis datasets. The soft layering also enables matting, which preserves intricate details in the synthesized views.





Built with Leonid Keselman's Jekyll fork of Jon Barron's website.