Kyle Sargent
I am a PhD student at Stanford, beginning in fall 2022, working in the Stanford AI Lab. I am advised by Jiajun Wu and Fei-Fei Li.
I have also worked at Google Research as a student researcher, mentored by Deqing Sun and Charles Herrmann.
Previously, I was an AI resident at Google Research. Before joining Google, I received my undergraduate degree from Harvard, where I studied CS and math.
Email /
GitHub /
Google Scholar /
LinkedIn /
Misc
Research
I work in computer vision. My main areas of focus are 3D reconstruction, novel view synthesis, and 3D generative models.
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu
CVPR, 2024.
arxiv /
project page /
We train a 3D-aware diffusion model, ZeroNVS, on a mixture of scene data sources covering object-centric, indoor, and outdoor scenes. This enables zero-shot SDS distillation of 360-degree NeRF scenes from a single image. Our model sets a new state-of-the-art LPIPS result on the DTU dataset in the zero-shot setting. We also propose the MipNeRF-360 dataset as a benchmark for single-image novel view synthesis.
NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Varun Jampani*, Kevis-Kokitsi Maninis*, Andreas Engelhardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, Andre Araujo, Ricardo Martin-Brualla, Kaushal Patel, Daniel Vlasic, Vittorio Ferrari, Ameesh Makadia, Ce Liu, Yuanzhen Li, Howard Zhou
NeurIPS, 2023.
arxiv /
project page /
We propose “NAVI”: a new dataset of category-agnostic image collections of objects with high-quality 3D scans, along with per-image 2D-3D alignments that provide near-perfect ground-truth camera parameters.
VQ3D: Learning a 3D Generative Model on ImageNet
Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun
ICCV, 2023. Oral presentation. Best paper finalist.
arxiv /
project page /
VQ3D introduces a 3D-aware NeRF-based decoder to the two-stage ViT-VQGAN. Stage 1 then enables novel view synthesis from input images, and Stage 2 enables generation of entirely new 3D-aware images. We achieve an ImageNet FID of 16.8, compared to 69.8 for the best baseline.
Self-supervised AutoFlow
Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun
CVPR, 2023.
arxiv /
project page /
We introduce self-supervised AutoFlow, which handles real-world videos without ground-truth labels by using a self-supervised loss as the search metric.
Pyramid Adversarial Training Improves ViT Performance
Kyle Sargent*, Charles Herrmann*, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun (*equal contribution)
CVPR, 2022. Oral presentation. Best paper finalist.
arxiv /
Aggressive data augmentation is a key component of the strong generalization capabilities of Vision Transformers (ViTs). We propose “Pyramid Adversarial Training,” a strong adversarial augmentation that perturbs images at multiple scales during training. We achieve a new state of the art on ImageNet-C, ImageNet-Rendition, and ImageNet-Sketch using only our augmentation and the standard ViT-B/16 backbone.
SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
Varun Jampani*, Huiwen Chang*, Kyle Sargent, Abhishek Kar, Richard Tucker, Michael Krainin, Dominik Kaeser, William T. Freeman, David Salesin, Brian Curless, Ce Liu (*equal contribution)
ICCV, 2021. Oral presentation.
arxiv /
project page /
visual results /
We design a unified system for novel view synthesis that uses soft layering and depth-aware inpainting to achieve state-of-the-art results on multiple view synthesis datasets. The soft layering also supports matting, which preserves intricate details in the synthesized views.