Talking Papers Podcast

MagicPony - Tomas Jakab

Itzik Ben-Shabat Season 1 Episode 25

Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Tomas Jakab to discuss his paper "MagicPony: Learning Articulated 3D Animals in the Wild" which was published at CVPR 2023.

The motivation behind MagicPony stems from the scarcity of labeled data in real-world scenarios involving freely moving, articulated 3D animals. The method takes a single RGB image as input and predicts a full 3D model of the animal, including its shape, articulation, texture, and lighting. Crucially, it learns purely from diverse, in-the-wild single-view images: by reasoning about what the images have in common and how they differ, it recovers a shared template shape for the category together with instance-specific deformations.

At its core is a hybrid representation that combines the strengths of neural fields and meshes, and the model additionally distils features from a self-supervised vision transformer (DINO-ViT) to help it understand shape and pose. To overcome common local optima in viewpoint estimation, the authors introduce a new viewpoint sampling scheme that adds no training cost. While the results may not be ready for practical applications just yet, the work marks a substantial advance on this challenging task, with clear quantitative and qualitative improvements over prior work.
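To make the viewpoint idea concrete, here is a minimal PyTorch sketch of a multi-hypothesis viewpoint predictor in that spirit. This is an illustrative toy, not the authors' code: the `ViewpointSampler` name, the axis-angle parameterisation, and the feature dimensions are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class ViewpointSampler(nn.Module):
    """Toy multi-hypothesis viewpoint predictor (illustrative, not MagicPony's code).

    Predicts k candidate rotations plus a score for each; during training a
    single hypothesis is sampled from the softmax over the scores, so only one
    rendering pass is needed per iteration (hence "no added training cost").
    """

    def __init__(self, feat_dim: int, k: int = 4):
        super().__init__()
        self.k = k
        # Each hypothesis gets an axis-angle rotation (3 values) and a score (1).
        self.head = nn.Linear(feat_dim, k * 4)

    def forward(self, feats: torch.Tensor):
        out = self.head(feats).view(-1, self.k, 4)
        rotations, scores = out[..., :3], out[..., 3]
        if self.training:
            # Sampling (rather than always taking the argmax) keeps the
            # optimiser exploring, so it can escape bad viewpoint minima.
            idx = torch.multinomial(scores.softmax(dim=-1), 1).squeeze(-1)
        else:
            idx = scores.argmax(dim=-1)
        picked = rotations[torch.arange(rotations.size(0)), idx]
        return picked, scores

# Usage: features would come from an image encoder (e.g. a ViT backbone).
sampler = ViewpointSampler(feat_dim=128, k=4)
rotation, scores = sampler(torch.randn(2, 128))  # batch of 2 images
```

Sampling during training is the key design choice here: always committing to the current best guess would let an early wrong viewpoint reinforce itself, while at test time the highest-scoring hypothesis is simply picked.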


AUTHORS
Shangzhe Wu*, Ruining Li*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi (* equal contribution)

ABSTRACT
We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.
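For a concrete picture of the implicit-explicit representation mentioned in the abstract, below is a minimal PyTorch sketch of a neural field for the canonical shape. The `CanonicalSDF` name, the network sizes, and the random grid query are assumptions made for illustration; in a full pipeline, a differentiable surface-extraction layer in the spirit of Deep Marching Tetrahedra (listed below) would convert the SDF values into an explicit mesh for rendering.

```python
import torch
import torch.nn as nn

class CanonicalSDF(nn.Module):
    """Toy neural field: maps 3D points to a signed-distance value plus a
    feature vector that a later stage could use for appearance."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1 + hidden),
        )

    def forward(self, points: torch.Tensor):
        out = self.mlp(points)
        return out[..., :1], out[..., 1:]  # SDF value, appearance feature

# Query the field at the vertices of a (here random) grid in [-1, 1]^3;
# a DMTet-style layer would then extract a mesh from these SDF values
# differentiably, so mesh-based rendering losses can train the field.
field = CanonicalSDF()
grid_vertices = torch.rand(1000, 3) * 2 - 1
sdf_values, features = field(grid_vertices)
```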

RELATED PAPERS
📚CMR
📚Deep Marching Tetrahedra
📚DINO-ViT

LINKS AND RESOURCES

📚 Paper
💻 Project page
💻 Code


CONTACT

If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com


All links are available in the blog post: https://www.itzikbs.com/magicpony

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

📧Subscribe to our mailing list: http://eepurl.com/hRznqb

🐦Follow us on Twitter: https://twitter.com/talking_papers

🎥YouTube Channel: https://bit.ly/3eQOgwP
