Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Chengfenfg Xu to discuss his paper "NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection" which was published at ICCV2023.
In recent times, NeRF has gained widespread prominence, and the field of 3D detection has encountered well-recognized challenges. The principal contribution of this study lies in its ability to address the detection task while simultaneously training a NeRF model and enabling it to generalize to previously unobserved scenes. Although the computer vision community has been actively addressing various tasks related to images and point clouds for an extended period, it is particularly invigorating to witness the application of NeRF representation in tackling this specific challenge.
Chenfeng is currently a Ph.D. candidate at UC Berkeley, collaborating with Prof. Masayoshi Tomizuka and Prof. Kurt Keutzer. His affiliations include Berkeley DeepDrive (BDD) and Berkeley AI Research (BAIR), along with the MSC lab and PALLAS. His research endeavors revolve around enhancing computational and data efficiency in machine perception, with a primary focus on temporal-3D scenes and their downstream applications. He brings together traditionally separate approaches from geometric computing and deep learning to establish both theoretical frameworks and practical algorithms for temporal-3D representations. His work spans a wide range of applications, including autonomous driving, robotics, AR/VR, and consistently demonstrates remarkable efficiency through extensive experimentation. I am eagerly looking forward to see his upcoming research papers.
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka
NeRF-Det is a novel method for 3D detection with posed RGB images as input. Our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. We subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without per-scene optimization.
All links and resources are available on the blog post:
🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP