Speaker:
Speaker Link:
Institution:
Time:
Location:
One of the great open challenges in machine vision is to train a
computer to "see people." A reliable solution would open up tremendous
possibilities, from automated persistent surveillance and
next-generation image search to more intuitive computer interfaces.
It is difficult to analyze people, and objects in general, because
their appearance varies with a number of "nuisance" factors
(including viewpoint, body pose, and clothing) and because real-world
images contain clutter. I will describe machine learning algorithms
that tackle this problem by encoding image statistics of the visual
world learned from large-scale training data. I will focus on
predictive models that produce rich, structured descriptions of images
and videos (How many people are present? What are they doing?) and
models that compensate for nuisance factors through the use of latent
variables. I will illustrate these approaches on the tasks of object
detection, people tracking, and activity recognition, where they yield
state-of-the-art systems, as evidenced by recent benchmark
competitions.
