03-03
Building Systems that Query on Compressed Data

Web services today want to support sophisticated queries, with stringent interactivity constraints. Many recent studies have argued that in-memory query execution is one of the keys to achieving query interactivity. However, as web services scale to larger data sizes, executing queries in memory becomes increasingly challenging. As a result, existing systems fall short of supporting sophisticated interactive queries at scale.

In this talk, we present Succinct, a distributed data store that supports functionality comparable to state-of-the-art NoSQL stores, yet enables query interactivity for data sizes an order of magnitude larger than what is possible today (or, alternatively, up to two orders of magnitude faster queries at scale). Succinct accomplishes this by executing a wide range of queries (e.g., search, range, and even regular expression queries) directly on compressed data. Succinct achieves scale by storing the input data in a compressed form, and interactivity by avoiding data scans and data decompression. We will also discuss how Succinct’s approach of executing queries on compressed data provides a new “lens” for exploring several classical systems problems (e.g., failure recovery, load spikes during transient failures, and skewed workloads), and leads to previously unachievable operating points in the system design space. Succinct is open source and is already being adopted in production clusters of several large-scale web services.

Rachit Agarwal is a postdoc in AMPLab at UC Berkeley, where he leads the Succinct project along with Ion Stoica. His research focuses on the core problems in distributed data-intensive systems, with the goal of building systems that not only aim for practical impact but also have a strong theoretical foundation. He completed his PhD at UIUC, working with Brighten Godfrey and Matthew Caesar, and his undergraduate degree at IIT Kanpur. During his PhD, he received the 2012 UIUC Rambus Research Award and the 2010 Wang-Chung Research Award for outstanding performance in computer engineering research, and was named to the 2010 UIUC List of Teachers Ranked as Excellent.

Date and Time
Thursday, March 3, 2016, 12:30pm - 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Speaker
Rachit Agarwal
Host
Kyle Jamieson

