Data systems are evolving from single-box databases into multi-tenant compute and storage platforms that host not only structured data analytics but also AI workloads and AI-enhanced system components. The result of this evolution, which I call an “intelligent” data system, creates new opportunities and challenges for research and production at the intersection of machine learning and systems.
Key considerations in these systems include efficiency and cost, ML support, and a flexible runtime for heterogeneous jobs. I will describe our work on query optimizers both for AI and aided by AI. For ML inference workloads over unstructured data, our optimizer injects proxy models for queries with complex predicates, leading to a many-fold improvement in processing time; for query optimization in classic data analytics, our pre-trained models summarize structured datasets, answer cardinality estimation calls, and avoid the high training cost of recent instance-optimized database components. I will also describe our query processor and optimizer that enable and accelerate ML inference workflows on hybrid/IoT cloud. These efforts, combined with a few missing pieces that I will outline, contribute to better data systems where users can build, deploy, and optimize data analytics and AI applications with ease.
Bio: Yao Lu is a researcher in the Data Systems group at Microsoft Research Redmond. He works at the intersection of machine learning and data systems, both building improved data and compute platforms for cloud machine learning and using machine learning to improve current data platforms. He received his Ph.D. from the University of Washington in 2018.
To request accommodations for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.