CVPR 2019 Tutorial on Unifying Human Activity Understanding

Introduction

In recent years, the potential for groundbreaking work in the video understanding community has grown dramatically. As in other AI fields, data has become abundant: video now comes from surveillance, social media, and even robotics. However, compared with image understanding, the video community has seen fewer breakthroughs. Why is that?
In this tutorial, we aim both to introduce and organize the many research directions in the video analysis field and to provide a single open framework, PyVideoResearch, for comparing and sharing video algorithms. We encourage new and veteran video researchers alike to use the framework, and we hope it helps unify the various silos that develop video algorithms independently on their own datasets.

We provide a systematic analysis of the various areas within video analysis, identifying a taxonomy of tasks and benchmarks in video research. We examine the fundamental algorithms that have become dominant in recent years and analyze the strengths and drawbacks of those algorithms and datasets.
We present PyVideoResearch, a new framework written in PyTorch that contains a variety of state-of-the-art video analysis tools, datasets, and tasks. We give an overview of its features, its design choices, and the ecosystem we hope to create. The framework makes it easy to try different algorithms on different datasets and ensures that all of them benefit from the same engineering tricks.
Prominent researchers in the video community will give talks and join a discussion on what they see as the most important current directions and how to move forward.

Invited Speakers

Juan Carlos Niebles received an Engineering degree in Electronics from Universidad del Norte (Colombia) in 2002, an M.Sc. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, and a Ph.D. degree in Electrical Engineering from Princeton University in 2011. Since 2015, he has been a Senior Research Scientist at the Stanford AI Lab and Associate Director of Research at the Stanford-Toyota Center for AI Research. Since 2011, he has also been an Associate Professor of Electrical and Electronic Engineering at Universidad del Norte (Colombia). His research interests are in computer vision and machine learning, with a focus on visual recognition and understanding of human actions and activities, objects, scenes, and events. He is a recipient of a Google Faculty Research Award (2015), the Microsoft Research Faculty Fellowship (2012), a Google Research Award (2011), and a Fulbright Fellowship (2005).

Cordelia Schmid holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis, "Local Greyvalue Invariants for Image Matching and Retrieval", received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled "From Image Matching to Learning Visual Models". Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996-1997. Since 1997 she has held a permanent research position at INRIA Rhône-Alpes, where she is a research director and directs an INRIA team. Dr. Schmid is the author of over a hundred technical publications. She has been an Associate Editor for IEEE PAMI (2001-2005) and for IJCV (2004-2012), editor-in-chief of IJCV (since 2013), a program chair of IEEE CVPR 2005 and ECCV 2012, and a general chair of IEEE CVPR 2015. In 2006, 2014, and 2016, she was awarded the Longuet-Higgins Prize for fundamental contributions in computer vision that have withstood the test of time. She is an IEEE Fellow. She was awarded an ERC Advanced Grant in 2013, the Humboldt Research Award in 2015, and the Inria & French Academy of Science Grand Prix in 2016. She was elected to the German National Academy of Sciences, Leopoldina, in 2017.

Abhinav Gupta is an Associate Professor at the Robotics Institute, Carnegie Mellon University, and a Research Manager at Facebook AI Research (FAIR). Abhinav’s research focuses on scaling up learning by building self-supervised, lifelong, and interactive learning systems. Specifically, he is interested in how self-supervised systems can effectively use data to learn visual representations, common sense, and representations for actions in robots. Abhinav is a recipient of several awards, including the ONR Young Investigator Award, PAMI Young Researcher Award, Sloan Research Fellowship, Okawa Foundation Grant, Bosch Young Faculty Fellowship, YPO Fellowship, IJCAI Early Career Spotlight, ICRA Best Student Paper Award, and the ECCV Best Paper Runner-up Award. His research has also been featured in Newsweek, BBC, the Wall Street Journal, Wired, and Slashdot.

Schedule

1:00pm	Welcome by Gunnar Sigurdsson & Michael Ryoo
1:10pm	Gunnar Sigurdsson: PyVideoResearch, A New Unifying Framework for Reproducing Video Algorithms
1:40pm	Michael Ryoo: Neural Architecture Search for Video CNNs
2:10pm	Juan Carlos Niebles: Invited Talk - Human Event Understanding: From Actions to Tasks
2:40pm	Cordelia Schmid: Invited Talk
3:15pm	Coffee Break
3:40pm	Abhinav Gupta: Invited Talk - What is Missing in Video Understanding: Datasets, Representation, and Tasks
4:10pm	Panel Discussion: Emerging Topics and Future Challenges in Videos
	Gunnar Sigurdsson, Michael Ryoo, Carl Vondrick, Federico Perazzi, Georgia Gkioxari, Joao Carreira