Enabling Large-Scale Data Intensive Computations

This talk describes a set of distributed services developed at Microsoft Research Silicon Valley to enable efficient parallel programming on very large datasets. Parallel programs arise naturally within scientific, data mining, and business applications. Central to our philosophy is the notion that parallel programs do not have to be difficult to write, and that the same program must run seamlessly on a laptop, a desktop, a small cluster, or a large data center without the author having to worry about the details of parallelization, synchronization, or fault tolerance. We have built several services (Dryad, DryadLINQ, TidyFS, and Nectar) that embody this belief. Our goal is to enable users, particularly scientists of all disciplines, to treat a computer cluster as a forensic, diagnostic, or analytic tool. The talk will describe the details of our infrastructure and the characteristics of some of the applications that have been run on it.
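The abstract's central claim is that a program should not change as it moves from a single machine to a cluster. As a loose illustration only (this is not the Dryad or DryadLINQ API, which the abstract does not show), the Python sketch below writes a word count once against the standard `concurrent.futures.Executor` interface. The `SerialExecutor` class and the `word_count` and `count_words` functions are hypothetical names introduced here: `SerialExecutor` stands in for a laptop, and a thread or process pool stands in for a cluster. Data placement, scheduling across machines, and fault tolerance, which the abstract says the services handle for the author, are not modeled.

```python
from collections import Counter
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from functools import reduce


class SerialExecutor(Executor):
    """Hypothetical executor that runs every task immediately in the caller's thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except BaseException as exc:
            future.set_exception(exc)
        return future


def count_words(chunk: list[str]) -> Counter:
    """Per-partition work: count the words in one slice of the input."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def word_count(lines: list[str], executor: Executor, n_parts: int = 4) -> Counter:
    """The 'program': identical logic no matter which executor runs it."""
    # Split the input into n_parts strided partitions.
    parts = [lines[i::n_parts] for i in range(n_parts)]
    # Run the per-partition step through whatever executor was supplied.
    partials = executor.map(count_words, parts)
    # Merge the partial results into one Counter.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    data = ["to be or not to be", "that is the question"]
    print(word_count(data, SerialExecutor()))          # single-machine stand-in
    with ThreadPoolExecutor(max_workers=4) as pool:    # parallel stand-in
        print(word_count(data, pool))
```

Only the executor argument changes between the two runs; the query logic stays the same. Swapping in `ProcessPoolExecutor` would also work for top-level functions like these, but reaching a real data center requires a system that also moves data and recovers from failures.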

Chandu Thekkath is a researcher at Microsoft Research Silicon Valley. He received his Ph.D. in Computer Science from the University of Washington in 1994. Since then, except for a sabbatical year at Stanford in 2000, he has worked in industrial research labs at DEC, Compaq, and Microsoft. He is a Fellow of the ACM.

Date and Time
Friday, October 22, 2010, 1:30pm - 3:00pm
Location
Computer Science Small Auditorium (Room 105)
Host
Edward Felten

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
