"DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video"
Authors: Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould
This paper studies the task of temporal moment localization in a long untrimmed video using a natural language query. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization, which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial subgraph that contextualizes the scene representation using detected object and human features. Moreover, a temporal subgraph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCook II as a new benchmark for this task. Experiments show that our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach.
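To make the idea of language-conditioned message passing more concrete, here is a minimal, hypothetical sketch of one round of it. It is not DORi's actual formulation. The gating rule (a softmax over dot products between neighbour features and the query embedding), the residual update, and the names `message_passing_step`, `nodes`, `edges` and `query` are all illustrative assumptions. The same function could serve as a spatial step (scene, human and object nodes within a frame) or a temporal step (edges linking nodes across consecutive frames).

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_passing_step(nodes, edges, query):
    """One round of query-conditioned message passing (illustrative only).

    nodes: dict mapping node name -> feature vector (list of floats)
    edges: dict mapping node name -> list of neighbour names
    query: sentence embedding with the same dimensionality as node features
    """
    updated = {}
    for name, feat in nodes.items():
        neighbours = edges.get(name, [])
        if not neighbours:
            # Isolated node: nothing to aggregate, keep its feature unchanged.
            updated[name] = list(feat)
            continue
        # Assumed gating: neighbours more similar to the query get more weight.
        weights = softmax([dot(nodes[n], query) for n in neighbours])
        # Aggregated message: query-weighted sum of neighbour features.
        message = [
            sum(w * nodes[n][i] for w, n in zip(weights, neighbours))
            for i in range(len(feat))
        ]
        # Residual update: the node keeps its own feature and adds the message.
        updated[name] = [f + m for f, m in zip(feat, message)]
    return updated
```

For example, a toy spatial subgraph with a scene node connected to a human and a detected knife:

```python
nodes = {"scene": [1.0, 0.0], "human": [0.0, 1.0], "knife": [1.0, 1.0]}
edges = {"scene": ["human", "knife"], "human": ["knife"], "knife": ["human"]}
print(message_passing_step(nodes, edges, query=[0.0, 1.0]))
```

Because both neighbours score equally against this query, the scene node receives an even mix of the human and knife features.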
Cristian's page: https://crodriguezo.github.io/
"Proposal free temporal moment localization" : https://bit.ly/3EX1qCM
"Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs" : https://bit.ly/3zt4aXA
Subscribe to the podcast: https://talking.papers.podcast.itzikbs.com
Subscribe to our mailing list: http://eepurl.com/hRznqb
Follow us on Twitter: https://twitter.com/talking_papers
YouTube Channel: https://bit.ly/3eQOgwP
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: email@example.com
Recorded on March 26th, 2021.