DataX effort jumpstarts demonstration data science project at Princeton

March 7, 2019

The Office of Communications

Princeton University researchers will push the limits of data science by leveraging artificial intelligence and machine learning across the research spectrum in an interdisciplinary pilot project made possible through a major gift from Schmidt Futures.

The Schmidt DataX Fund will help advance the breadth and depth of data science impact on campus, accelerating discovery in three large, interdisciplinary research efforts and creating a suite of opportunities to educate, train, convene and support a broad data science community at the University. 

“The Schmidt DataX Fund will accelerate Princeton researchers’ use of artificial intelligence and machine learning to explore questions at the frontiers of human knowledge. These techniques are transforming the scholarly landscape, and I expect their importance will grow rapidly in the years ahead,” Princeton University President Christopher L. Eisgruber said. “I am deeply grateful to Eric Schmidt ’76, his wife, Wendy Schmidt, and Schmidt Futures, whose generosity will enable Princeton faculty and students to pursue innovative applications of data science to urgent questions of science, engineering and public policy.”

“This is a time when visionary leaders who apply new computing technologies and work together across departments and fields can make a huge difference,” said Eric Schmidt. “This new gift aims to build upon Princeton’s long record of research excellence by accelerating the application of the most modern computing, machine learning and artificial intelligence techniques to problems of greatest social and intellectual importance. DataX should multiply research outputs as well as produce many educated scientists and engineers.”

Broadening impact across campus

The Schmidt DataX Fund will be used to extend the reach of data science in discovery across campus and to infuse machine learning and artificial intelligence into a range of disciplines. Many researchers and educators are eager to bring data science to their fields but lack the expertise, experience and tools.

The funds will support a range of campus-wide data science initiatives led by the Center for Statistics and Machine Learning, including: development of graduate-level courses in data science and machine learning; creation of mini-courses and workshops to train researchers in the latest software tools, cloud platforms and public data sets; and innovation funds to jump-start new research projects through funding for postdoctoral fellows, graduate or undergraduate students, and access to data sets and cloud resources.

Supporting innovative research

The funds will also support six Schmidt Data Scientists who will create and improve data-analysis software to operate at large scale, leading to faster discovery, wider impact and greater continuity. The Schmidt Data Scientists will be part of larger research teams comprising faculty, postdoctoral researchers, graduate students and other researchers in three areas: the Princeton Catalysis Initiative (PCI), the biomedical data science initiative and the Center for Information Technology Policy (CITP). The Schmidt Data Scientists will meet regularly as part of the Center for Statistics and Machine Learning to form a close-knit community, broaden their reach across campus and pursue professional development opportunities.

Catalysis
Catalysis will be the key technological driver for solutions to many problems of increasing social concern, including the development of alternative energy technologies, environmental remediation strategies, access to non-fossil-fuel-based and inexpensive pharmaceuticals and antibiotics, sustainable agriculture, and renewable soft materials. Working with Princeton’s chemists and other researchers, Schmidt Data Scientists will leverage machine learning in the discovery, optimization and application of catalytic reactions. They will build tools to make the enormous volumes of unpublished data more available and useful, and they will create user-friendly software to speed the adoption of data science to increase the pace of research in this field.

The Princeton Catalysis Initiative is led by a faculty committee including Abigail Doyle, the A. Barton Hepburn Professor of Chemistry, and David MacMillan, the James S. McDonnell Distinguished University Professor of Chemistry. One of the goals for the Schmidt Data Scientists working in PCI is to develop web-based user interfaces for the predictive software that Doyle’s lab recently created, which transforms the fundamental question of synthetic chemistry from “How do I make this?” to “What should be made and why?”

Biomedical data science
Genome-reading technologies have revolutionized the biomedical sciences, generating enormous quantities of genetic data, far too much to be easily useful. Schmidt Data Scientists will take steps to manage these data sets and navigate the protections that govern human data, providing an efficient, shared infrastructure to accelerate research in biomedical data science. Their contributions will range from building analysis pipelines to standardizing and normalizing large data sets, to developing interfaces and virtual machines that access and analyze data in the cloud, to optimizing software from Princeton research groups so that it is more scalable, robust and usable for external biomedical researchers.

The biomedical data science initiative is spearheaded by the Department of Computer Science, with strong connections to the Lewis-Sigler Institute for Integrative Genomics, Princeton Neuroscience Institute and several engineering departments. Faculty members who apply machine learning to biomedicine include Ben Raphael, a professor of computer science who has used the approach to identify which pancreatic cancer cells could respond to targeted gene therapies, and Olga Troyanskaya, a professor of computer science and the Lewis-Sigler Institute for Integrative Genomics who developed software to characterize how genes are expressed in the tissues of C. elegans worms.

Information technology policy
Princeton research on information technology policy, driven by CITP and involving faculty from across the University, covers a broad range of data science topics. Areas of focus include security and privacy, bias and fairness, manipulation of data for social or political aims, and long-term workforce implications of advances in artificial intelligence. CITP is led by Ed Felten, the Robert E. Kahn Professor of Computer Science and Public Affairs, and it has 17 affiliated faculty members.

Schmidt Data Scientists working with these researchers will leverage and extend machine learning methods for many projects in the service of society, such as looking for deliberate manipulation campaigns on social media platforms, working with large employers to eliminate bias and increase fairness in hiring, and extending BlockSci — a leading tool for blockchain analytics — to apply to more systems.

Image: Line graphic in the shape of a pyramid. Image by Kyle McKernan, Office of Communications