Similarity Search with Multimodal Data

Report ID: TR-916-11
Author: Wang, Zhe
Date: 2011-11
Pages: 132
Download Formats: PDF
Abstract:

Similarity search systems are designed to help people organize non-text multimedia data and find valuable information. Multimedia data intrinsically has multiple modalities (e.g., visual and audio features from video clips), which can be exploited to construct better search systems. Traditionally, various integration techniques have been used to aggregate multiple modalities. However, such algorithms do not scale well to large datasets. As multimedia data grows, it becomes a challenge to build a search system that handles large-scale multimodal data efficiently and provides users with the information they need.

The goal of this dissertation is to study how to effectively combine multiple modalities to implement similarity search systems for large datasets. I have carried out my study through three similarity search systems, each designed for a different application. Each system combines multiple modalities to help users find desired information quickly. With the VFerret system, I studied how to combine visual features with audio features for effective personal video search. With the Image Spam Detection System, I explored several aggregation methods for integrating multiple image spam filters to detect image spam. With my Product Navigation System, I studied how to combine text search with image similarity search to help users find desired products. This thesis also studies a rank-based model that helps system designers construct more efficient large-scale multimodal similarity search systems.

Although a general solution to using multimodal data in a similarity search system is still unknown, this dissertation shows that it is possible to substantially improve search accuracy and efficiency by leveraging domain-specific knowledge of multimodal data. The VFerret system improves search accuracy from an average precision of 0.66 to 0.79 by combining visual and audio features. The Image Spam Detection System lowers the false positive rate from a previous result of 1% to 0.001%, a thousandfold reduction, while maintaining comparable detection rates by combining multiple image filters intelligently. My Product Navigation System reduces the number of user clicks by 60% compared to traditional systems through a new method of combining text search with image similarity search. These results support further adoption and study of multimodal data in the design of similarity search systems.