With Moore's Law coming to an end, architects must find ways to sustain performance growth without technology scaling. The most promising path is to build highly parallel systems that harness thousands of simple and efficient cores. But this approach will require new techniques to make massive parallelism practical, as current multicores fall short of this goal: they squander most of the parallelism available in applications and are too hard to program.
I will present Swarm, a new architecture that successfully parallelizes algorithms often considered sequential and is much easier to program than conventional multicores. Swarm programs consist of tiny tasks, as small as tens of instructions each. Parallelism is implicit: all tasks follow a programmer-specified total or partial order, eliminating the correctness pitfalls of explicit synchronization (e.g., deadlocks and data races). To scale, Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover enough parallelism.
Swarm builds on decades of work on speculative architectures and contributes new techniques to scale to large core counts, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered task commits. Swarm also incorporates new techniques to exploit locality and to harness nested parallelism, making parallel algorithms easy to compose and uncovering abundant parallelism in large applications.
Swarm accelerates challenging irregular applications from a broad set of domains, including graph analytics, machine learning, simulation, and databases. At 256 cores, Swarm is 53-561x faster than a single-core system, and outperforms state-of-the-art software-only parallel algorithms by one to two orders of magnitude. Beyond achieving near-linear scalability, Swarm keeps programs almost as simple as their sequential counterparts, since they need no explicit synchronization.
Bio:
Daniel Sanchez is an Associate Professor of Electrical Engineering and Computer Science at MIT. His research interests include parallel computer systems, scalable and efficient memory hierarchies, architectural support for parallelization, and architectures with quality-of-service guarantees. He earned a Ph.D. in Electrical Engineering from Stanford University in 2012 and received the NSF CAREER award in 2015.