Talking Papers Podcast

CLIPasso - Yael Vinker

March 13, 2023 Yael Vinker Season 1 Episode 19
Talking Papers Podcast
CLIPasso - Yael Vinker
Show Notes Chapter Markers

In this episode of the Talking Papers Podcast, I hosted Yael Vinker. We had a great chat about her paper "CLIPasso: SEmantically-Aware Object Sketching”, SIGGRAPH 2022 best paper award winner. 

In this paper, they convert images into sketches with different levels of abstraction. They avoid the need for sketch datasets by using the well-known CLIP model to distil the semantic concepts from sketches and images. There is no network training here, just optimizing the control points of Bezier curves to model the sketch strokes (initialized by a saliency map). How is this differentiable? They use a differentiable rasterizer. The degree of abstraction is controlled by the number of strokes. Don't miss the amazing demo they created.

Yael is currently a PhD student at Tel Aviv University. Her research focus is on computer vision, machine learning, and computer graphics with a unique twist of combining art and technology. This work was done as part of her internship at EPFL


Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir


 Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distil semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.


📚CLIP: Connecting Text and Images

📚Differentiable Vector Graphics Rasterization for Editing and Learning


📚 Paper

💻Project page


This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun’s 100.
Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.



If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email:

🎧Subscribe on your favourite podcast app:

📧Subscribe to our mailing list:

🐦Follow us on Twitter:

🎥YouTube Channel:

CLIPasso with Yael Vinker
Related work
What did reviewer 2 say?