By Julia Schwarz
Computer scientists at Princeton have found a new technique for making deep neural networks faster, more energy efficient and cheaper to run.
The research, by Karthik Narasimhan, assistant professor of computer science, and doctoral students Vishvak Murahari, Carlos E. Jimenez and Runzhe Yang, makes it possible to run a deep neural network at least 10 times more efficiently while sacrificing only 2 percent in accuracy. The new approach, called DataMUX, recently won second place and a $50,000 award in the 2022 Bell Labs Prize.
In creating a deep neural network, there is a mismatch between the size of the network required for training and the size required for a specific task. “Once you train this large network,” Narasimhan said, “it turns out that for any specific task, like sentiment analysis or question answering or image classification, you don't actually require such a large model.” And these large models consume a great deal of energy and computing power — a state-of-the-art AI language model could produce around 14,000 pounds of greenhouse gas emissions in just three months, according to the researchers.
Greater efficiency will make AI more sustainable, but there is also a broader goal to “make AI cheaper and democratize it,” according to Narasimhan. “Only a few people now can have access to the best models because they're so huge and they require a lot of computing power to run, making them expensive to use. This could change that.”
Current solutions to making large networks more efficient after training focus on creating sparser or smaller neural networks. The DataMUX technique takes the opposite approach: it allows a neural network to process multiple text or image inputs simultaneously, packing the computation more densely rather than trimming it down. This allows the model to perform complex tasks with less computing power.
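At a high level, the technique multiplexes several inputs into one combined representation, runs that through the shared network in a single pass, and then demultiplexes the results back into separate outputs. The sketch below, written in PyTorch, illustrates that idea in miniature; the class and layer choices (MultiplexedModel, simple linear multiplexing and demultiplexing heads, a small Transformer backbone) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of data multiplexing, assuming simple linear mux/demux heads.
# This is an illustration of the idea, not the DataMUX authors' code.
import torch
import torch.nn as nn

class MultiplexedModel(nn.Module):
    """Processes N inputs in a single forward pass of a shared backbone."""
    def __init__(self, dim, num_inputs, backbone):
        super().__init__()
        self.num_inputs = num_inputs
        self.backbone = backbone  # e.g., a Transformer encoder
        # Fixed, distinct projections keep the N inputs distinguishable after mixing.
        self.mux = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(num_inputs)])
        # One small head per slot recovers that input's representation.
        self.demux = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_inputs)])

    def forward(self, inputs):  # inputs: list of N tensors, each (batch, seq, dim)
        # Multiplex: project each input differently, then average into one tensor.
        mixed = torch.stack([proj(x) for proj, x in zip(self.mux, inputs)]).mean(dim=0)
        shared = self.backbone(mixed)  # one backbone pass does the work of N
        # Demultiplex: recover a separate representation for each original input.
        return [head(shared) for head in self.demux]

# Usage: multiplex 10 inputs through one pass of a small Transformer encoder.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
model = MultiplexedModel(dim=64, num_inputs=10, backbone=encoder)
batch = [torch.randn(8, 16, 64) for _ in range(10)]  # 10 inputs of (batch=8, seq=16, dim=64)
outputs = model(batch)                               # 10 outputs from a single backbone pass
```

In a setup like this, one backbone pass serves ten inputs at once; the cost of separating them back out afterward is the source of the small accuracy trade-off the researchers report.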
The researchers see broad applications for DataMUX. The model has been tested with inputs of images and text, but it could be used with any type of data — video, speech, genes — and in a variety of AI applications. “The most critical applications would be places where you want to reduce energy consumption or increase throughput, things like cloud APIs,” said Narasimhan.
How much more efficient DataMUX might make these models is the subject of future research. There is, according to the researchers, potential for exponential improvement. So far, the research has been done in models that are a tiny fraction of the size of a current state-of-the-art AI model. “As the model grows 100 times or 1000 times larger, we could see a corresponding speed up in efficiency,” Narasimhan said.
The method can also respond flexibly to changes in user traffic, potentially allowing an application to use a denser network when traffic is high and then ramp back down when traffic is low. “Dynamic load balancing is something current models that use pruning methods don't offer that easily,” said Murahari, first author on the paper. “You could potentially do some fancy engineering to make that happen, but this method naturally lends itself to load balancing.”
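As a rough illustration of that load-balancing idea (not code from the paper), a serving layer could choose how many requests to multiplex based on the current queue; the function name and width values below are hypothetical.

```python
# Hypothetical sketch of traffic-aware multiplexing; names and widths are illustrative.
def choose_multiplex_width(pending_requests: int, widths=(1, 2, 5, 10, 20, 40)) -> int:
    """Return the largest supported width the current queue can fill,
    so busy periods use denser batches and quiet periods fall back to width 1."""
    for width in sorted(widths, reverse=True):
        if pending_requests >= width:
            return width
    return min(widths)
```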
The paper, “DataMUX: Data Multiplexing for Neural Networks,” was supported by Princeton University. The researchers will present their work at the Conference and Workshop on Neural Information Processing Systems (NeurIPS) in New Orleans, November 28 to December 3, 2022.