Seeing, Saying, Doing, and Learning: Integrating Computer Vision, Natural Language Processing, Robotics, and Machine Learning Through Multidirectional Inference

[[{"fid":"614","view_mode":"landscape_left","fields":{"format":"landscape_left","field_file_image_alt_text[und][0][value]":"Jeff Siskind","field_file_image_title_text[und][0][value]":"","field_file_caption_credit[und][0][value]":"","field_file_caption_credit[und][0][format]":"full_html"},"type":"media","attributes":{"alt":"Jeff Siskind","height":293,"width":400,"class":"media-element file-landscape-left"},"link_text":null}]]The semantics of natural language can be grounded in perception and motor control with a unified cost function that supports multidirectional inference. I will present several instances of this approach.  The first is a cost function relating sentences, video, and a lexicon.  Performing inference from video and a lexicon to sentences allows it to generate sentential descriptions of video.  Performing inference from sentences and a lexicon to video allows it to search a video database for clips that match a sentential query. Performing inference from sentences and video to a lexicon allows it to learn a lexicon.  The second is the functional inverse of video captioning.  Instead of mapping video and object detections to sentences, one can map video and sentences to object detections.  This allows one to use sentential constraint on a video object codetection process to find objects without pretrained object detectors.  The third is a cost function relating sentences, robotic navigation paths, and a lexicon.  Performing inference from sentences and navigation paths to a lexicon allows it to learn a lexicon.  Performing inference from navigation paths and a learned lexicon to sentences allows it to generate sentential descriptions of paths driven by a mobile robot. Performing inference from sentences and a learned lexicon to navigation paths allows it to plan and drive navigation paths that satisfy a sentential navigation request.  Finally, one can perform object codetection on the video stream from a robot-mounted camera during navigation to satisfy sentential requests and use the collection of constraints from vision, language, and robotics to detect, localize, and label objects in the environment without any pretrained object detectors.

Joint work with Andrei Barbu, Daniel Paul Barrett, Scott Alan Bronikowski, N. Siddharth, and Haonan Yu.

Jeffrey M. Siskind received the B.A. degree in computer science from the Technion, Israel Institute of Technology, Haifa, in 1979, the S.M. degree in computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1989, and the Ph.D. degree in computer science from M.I.T. in 1992. He did a postdoctoral fellowship at the University of Pennsylvania Institute for Research in Cognitive Science from 1992 to 1993. He was an assistant professor at the University of Toronto Department of Computer Science from 1993 to 1995, a senior lecturer at the Technion Department of Electrical Engineering in 1996, a visiting assistant professor at the University of Vermont Department of Computer Science and Electrical Engineering from 1996 to 1997, and a research scientist at NEC Research Institute, Inc. from 1997 to 2001. He joined the Purdue University School of Electrical and Computer Engineering in 2002 where he is currently an associate professor. His research interests include computer vision, robotics, artificial intelligence, neuroscience, cognitive science, computational linguistics, child language acquisition, automatic differentiation, and programming languages and compilers.

Date and Time
Thursday, October 15, 2015, 12:30pm - 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Host
Christiane Fellbaum
