Speaker: Prof. Qiang Ji, Department of Electrical, Computer, and Systems Engineering
at Rensselaer Polytechnic Institute (RPI).
Title: Complex Activity Modeling and Recognition with Interval Temporal
Bayesian Networks.
Complex activities typically consist of multiple primitive events happening in parallel or
sequentially over a period of time. Understanding such activities requires not only
recognizing each individual event but also, more importantly, capturing their spatiotemporal
dependencies over different time intervals. The current graphical model-based
approaches are mostly based on time points and hence can capture only three
temporal relations: precedes, follows, and equals. Existing syntactic and description-based methods, while rich in modeling temporal relationships, lack the expressive
power to capture uncertainties. To address these issues, we introduce the Interval
Temporal Bayesian Network (ITBN), a novel graphical model that combines the
Bayesian Network with the Interval Algebra to explicitly model a large variety of
temporal dependencies, while remaining fully probabilistic and expressive of uncertainty.
Advanced machine learning methods are introduced to learn the ITBN model structure
and parameters. Experimental results on benchmark real videos show that by reasoning
with spatiotemporal dependencies, ITBN can significantly outperform state-of-the-art
dynamic models in recognizing complex activities.
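For background: the interval algebra referenced above is Allen's interval algebra, which distinguishes thirteen qualitative relations between time intervals (before, meets, overlaps, starts, during, finishes, their six inverses, and equals) rather than the three point-based relations. A minimal Python sketch, purely illustrative and not code from the talk, classifying the relation between two event intervals:

```python
# Illustrative sketch of Allen's interval algebra (not the speaker's code):
# classify the temporal relation between two event intervals.
# Assumes proper intervals, i.e. start < end.

def allen_relation(a_start, a_end, b_start, b_end):
    """Return one of Allen's 13 relations between intervals A and B."""
    if a_end < b_start:
        return "before"    # A entirely precedes B
    if a_end == b_start:
        return "meets"     # A ends exactly where B starts
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if b_start < a_start and a_end < b_end:
        return "during"
    if b_start < a_start and a_end == b_end:
        return "finishes"
    if a_start == b_start and a_end == b_end:
        return "equals"
    # The remaining six relations are the inverses of the first six.
    return allen_relation(b_start, b_end, a_start, a_end) + "_inverse"

# Example: a "stretch arms" event overlapping a "bend torso" event.
print(allen_relation(0.0, 2.5, 1.0, 4.0))  # -> "overlaps"
```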
Bio:
Qiang Ji received his Ph.D. degree in Electrical Engineering from the University of
Washington. He is currently a Professor with the Department of Electrical, Computer,
and Systems Engineering at Rensselaer Polytechnic Institute (RPI). He recently served as
a program director at the National Science Foundation (NSF), where he managed NSF’s
computer vision and machine learning programs. He also held teaching and research
positions with the Beckman Institute at the University of Illinois at Urbana-Champaign, the
Robotics Institute at Carnegie Mellon University, the Department of Computer Science at
the University of Nevada, Reno, and the US Air Force Research Laboratory. Prof. Ji
currently serves as the director of the Intelligent Systems Laboratory (ISL) at RPI.
Prof. Ji’s research interests are in computer vision, probabilistic graphical models,
information fusion, and their applications in various fields. He has published over 190
papers in peer-reviewed journals and conferences. His research has been supported by
major governmental agencies including NSF, NIH, DARPA, ONR, ARO, and AFOSR as
well as by major companies including Honda and Boeing. Prof. Ji is an editor of several
related IEEE and international journals, and he has served as a general chair, program
chair, technical area chair, and program committee member for numerous international
conferences and workshops. Prof. Ji is a Fellow of the IAPR.
Speaker: Prof. Jason Corso, Department of Computer Science and Engineering at
SUNY Buffalo.
Title: Can Language Play a Role in Large Scale Video Search?
Large-scale video search and mining is dominated by low-level features and classifiers.
Although these methods have demonstrated strong promise in various problems like
video search based on event type, they have limited ability to facilitate rich, semantic
queries. In contrast, if the underlying video representation is at a higher level, say
attributes or language, such rich queries may be more plausible. To that end, I will
discuss my recent work in jointly modeling video and language and converting video into
language, in an effort to motivate semantically rich, large-scale video search.
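To make the contrast concrete: with low-level features, a query must itself be expressed as a feature vector, whereas a higher-level representation of attributes or language can be matched against query words directly. A toy Python sketch, purely illustrative and not the speaker's system (the video IDs and attributes are made up):

```python
# Toy illustration (not the speaker's system): a higher-level, attribute-based
# video representation supports rich semantic queries directly.
videos = {
    "vid_001": {"dog", "ball", "chasing", "park", "daytime"},
    "vid_002": {"cat", "sleeping", "indoor"},
    "vid_003": {"dog", "swimming", "lake"},
}

def search(query_terms, index):
    """Rank videos by how many query attributes they satisfy."""
    scored = ((vid, len(query_terms & attrs)) for vid, attrs in index.items())
    return sorted((s for s in scored if s[1] > 0),
                  key=lambda s: s[1], reverse=True)

print(search({"dog", "chasing"}, videos))  # -> [('vid_001', 2), ('vid_003', 1)]
```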
Bio:
Corso is an associate professor in the Computer Science and Engineering Department of
SUNY at Buffalo. He received his Ph.D. in Computer Science at The Johns Hopkins
University in 2005. From 2005 to 2007, Corso was a postdoctoral research fellow in
neuroimaging and statistics at the University of California, Los Angeles. He received
the Army Research Office Young Investigator Award (2010), the NSF CAREER Award (2009),
the SUNY Buffalo Young Investigator Award (2011), and the Link Foundation Fellowship in
Advanced Simulation and Training (2004), and was a member of the 2009 DARPA Computer
Science Study Group. He has served as an Associate Editor of Computer Methods and
Programs in Biomedicine since 2009. Corso has authored more than eighty papers on topics
including computer vision, robot perception, data mining, and medical imaging. He is PI on more than $5 million in
research funding from major federal agencies, including NSF, NIH, DARPA, ARO, and
IARPA.
Speaker: Dr. Josef Sivic, INRIA / École Normale Supérieure, Paris, France.
Title: Towards Mid-level Representations of Video.
In this talk I will describe our recent work towards developing mid-level representations
of video. First, I will discuss a joint model of actors and actions in movies that can
localize individual actors in video and recognize their actions. The model is learnt from
only weak textual supervision provided by the movie shooting script. We validate the
model in the challenging setting of localizing and recognizing characters and their actions
in the feature-length movies Casablanca and American Beauty. Second, motivated by the
increasing availability of 3D films, we develop a mid-level representation of stereoscopic
video that combines person detection, pose estimation and pixel-wise segmentation of
multiple people in video. We formulate the problem as an energy minimization that
explicitly models depth ordering and occlusion of people. We demonstrate results on
challenging indoor and outdoor scenes from the 3D feature-length movies Street Dance and
Pina. Finally, we investigate a transfer learning approach based on convolutional neural
networks (CNNs). We demonstrate that a mid-level image representation learnt using a
CNN on a task with a large amount of fully labelled image data (ImageNet) can
significantly improve visual recognition performance on related tasks where supervision
is limited. The proposed method achieves state-of-the-art results on the Pascal VOC
object classification and (still image) action recognition challenge. Applying the model to
video seems within reach. Joint work with: K. Alahari, F. Bach, P. Bojanowski, L.
Bottou, J. Ponce, I. Laptev, M. Oquab, G. Seguin and C. Schmid.
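For background on the transfer-learning idea in the last part of the abstract: convolutional layers trained on a large, fully labelled dataset are reused as a mid-level representation, and only a small task-specific classifier is trained on the target data. A minimal PyTorch sketch, illustrative only and not the authors' implementation (the backbone choice and hyperparameters are assumptions):

```python
# Minimal transfer-learning sketch (illustrative; the talk's actual model is
# described in the associated publications).
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pretrained on ImageNet and freeze its layers, treating them as a
# fixed mid-level image representation.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new head for the target task
# (e.g., 20 Pascal VOC object classes); only this layer is trained.
num_target_classes = 20
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.BCEWithLogitsLoss()  # multi-label classification, as in VOC

# One training step on a hypothetical batch of images and label vectors.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8, num_target_classes)).float()
loss = loss_fn(backbone(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```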
Bio:
Josef Sivic received a degree from the Czech Technical University, Prague, in 2002 and
the PhD degree from the University of Oxford in 2006. His thesis on efficient
visual search of images and videos was awarded the British Machine Vision Association
2007 Sullivan Thesis Prize and was shortlisted for the British Computer Society 2007
Distinguished Dissertation Award. His research interests include visual search and object
recognition applied to large image and video collections. After spending six months as a
postdoctoral researcher in the Computer Science and Artificial Intelligence Laboratory at
the Massachusetts Institute of Technology, he took up a permanent position as an
INRIA researcher at the Département d'Informatique, École Normale Supérieure, Paris.
He has published over 40 scientific publications and serves as an Associate Editor for the
International Journal of Computer Vision. He was awarded an ERC Starting Grant in
2013.
Speaker: Prof. Yu-Gang Jiang, Fudan University, China.
Title: Recognizing Actions and Complex Events in Unconstrained Videos.
Nowadays people produce a huge number of videos, many of which are uploaded to
social media sites such as YouTube and Vimeo. There is a strong need to develop
automatic solutions for recognizing the contents of these videos. Potential applications of
such techniques include effective video content management and retrieval, and open-source
intelligence analysis. In this talk, I will introduce our recent work on human action
and event recognition. I will start by introducing a recently constructed Internet consumer
video dataset, on which we measure human recognition performance of video events and
compare this with popular automatic machine recognition solutions. After that, I will
introduce an approach to constructing effective features for human action recognition.
Finally, I will discuss the speed and efficiency of popular techniques and suggest component-level options for "real-time" recognition.
Bio:
Yu-Gang Jiang is an associate professor in the School of Computer Science at Fudan
University, Shanghai. He directs the Lab for Big Visual Data Analytics, working on
problems in large-scale image and video data analysis, sponsored by both
government agencies and industrial partners. He is an active participant in several
international benchmark evaluations and is one of the task organizers of the annual
European MediaEval evaluations. At the U.S. NIST TREC video retrieval evaluation,
systems designed by him achieved top performance in the 2008 video concept detection task
and the 2010 multimedia event detection task. His work has led to a best demo award from
ACM Hong Kong (2009), the second prize of ACM Multimedia Grand Challenge (2011),
a recognition by IBM Watson Research as an "emerging leader in multimedia" (2009),
and an award from Intel for outstanding young CS faculty in China (2013). He is a guest
editor for IEEE Transactions on Multimedia's special issue on Socio-Mobile Media
Analysis and Retrieval, and Machine Vision and Applications' special issue on
Multimedia Event Detection. He is program co-chair of the ICCV 2013 workshop on Action
Recognition with a Large Number of Classes. He will serve as a Program Chair for ACM
ICMR 2015. He graduated from City University of Hong Kong with a PhD in Computer
Science. Before joining Fudan, he was a postdoctoral research scientist at Columbia University,
New York.
Speaker: Prof. Junsong Yuan, Nanyang Technological University, Singapore.
Title: Discovering Visual Patterns in Video Data.
Motivated by previous successes in mining structured data (e.g., transaction data) and
semi-structured data (e.g., text), it is natural to ask whether meaningful patterns can be
discovered in more complex data such as images and videos. However, unlike transaction and
text data, which are composed of discrete elements with little ambiguity (i.e., predefined
items and vocabularies), visual patterns generally exhibit large variability in their visual
appearance, challenging existing data mining and pattern discovery algorithms. This
talk will discuss my recent work on discovering visual patterns in videos, as well as its
applications in video scene understanding, summarization, and anomaly detection.
Bio:
Junsong Yuan received his Ph.D. from Northwestern University. He is currently a
Nanyang Assistant Professor at Nanyang Technological University (NTU), Singapore,
leading the video analytics program at the School of EEE. His PhD thesis “Mining Image
and Video Data” received the Outstanding EECS Ph.D. Thesis award from Northwestern
University. He also received the Best Doctoral Spotlight Award from the IEEE Conf.
on Computer Vision and Pattern Recognition (CVPR'09). He has co-chaired
workshops at CVPR'12, CVPR'13, and ICCV'13, and serves as Area Chair for the IEEE Winter Conf.
on Computer Vision (WACV'14) and the IEEE Conf. on Multimedia and Expo (ICME'14). He is
Organizing Chair and Area Chair for the Asian Conf. on Computer Vision (ACCV'14). He
has recently given tutorials at IEEE ICIP'13, FG'13, ICME'12, SIGGRAPH VRCAI'12, and
PCM'12.