Abstract:
Many autonomous systems, such as self-driving cars, unmanned aerial vehicles, and personalized robotic assistants, are inherently complex. To manage this complexity, practitioners are increasingly turning to data-driven learning techniques such as reinforcement learning (RL) for designing sophisticated control policies. However, two fundamental issues currently limit the widespread deployment of RL: sample inefficiency and the lack of formal safety guarantees. In this talk, I will propose solutions to both of these issues in the context of continuous control tasks. In particular, I will show that in the widely applicable setting where the dynamics are linear, model-based algorithms which exploit this structure are substantially more sample efficient than model-free algorithms, such as the widely used policy gradient method. Furthermore, I will describe a new model-based algorithm which comes with provable safety guarantees and is computationally efficient, relying only on convex programming. I will conclude the talk by discussing the next steps towards safe and reliable deployment of reinforcement learning.
Bio:
Stephen Tu is a PhD student in Electrical Engineering and Computer Sciences at the University of California, Berkeley, advised by Benjamin Recht. His research interests are in machine learning, control theory, optimization, and statistics. Recently, he has focused on providing safety and performance guarantees for reinforcement learning algorithms in continuous settings. He is supported by a Google PhD Fellowship in Machine Learning.
Lunch for talk attendees will be available at 12:00pm.
To request accommodations for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, 609-258-4624 at least one week prior to the event.