Tracking is hard, because it is important to track relatively small structures that can move relatively fast, for example,lower arms. I will describe research into kinematic tracking, tracking that reports the kinematic configuration of the body, that has resulted in a fairly accurate, fully automatic tracker, that can keep track of multiple people.
Once one has tracked the body, one must interpret the results. One way to do so is to have a motion synthesis system that takes the track, and produces a motion that is (a) like a human motion and (b) close to the track. Our work has produced a high-quality motion synthesis system that can produce motions that look very much like human activities. I will describe work that couples that system with a tracker to produce a description of the activities, entirely automatically.
It is often very difficult to obtain a rich set of examples of activities, because activities appear to compose in complex ways. I will describe methods to extract a representation of activity that can be searched with a text-like query.
I will speculate on some of the many open problems. What should one report? how do nearby objects affect one's interpretation of activities? how can one interpret patterns of behaviour?
We describe methods to determine what people are doing in a video sequence.