Network Interface Support for Shared Virtual Memory on Clusters

Report ID: TR-579-98
Author: Liao, Cheng / Bilas, Angelos / Singh, Jaswinder Pal
Date: 1998-03-00
Pages: 11
Download Formats: Postscript
Abstract:

Clusters of symmetric multiprocessors (SMPs) are important platforms for high-performance computing. Considerable research has gone into building network interconnects and communication layers that deliver low-latency, high-bandwidth communication to the user. Following the success of hardware cache-coherent distributed shared memory (DSM), much effort has also gone into supporting the coherent shared address space programming model in software on clusters. However, performance is still far from that achieved on hardware DSM systems.

In this work we investigate the use of mechanisms in the software communication layer and the underlying network interface to substantially enhance the performance of shared virtual memory (SVM) on clusters of SMPs. We use a real implementation with a programmable network interface as our prototype, but our extensions are general-purpose and can be provided by network interfaces that do not employ a programmable processor.

We examine how the protocol layer can take advantage of each mechanism in the communication layer and be restructured accordingly. The final protocol (SVM-NI) eliminates the need for interrupts and asynchronous protocol handling. For each mechanism, we evaluate the impact on the end performance of ten applications with widely varying characteristics. We demonstrate that substantial improvements in performance can indeed be achieved, and find that different applications benefit from different mechanisms among the ones we use. Application performance improves by up to 50% for applications that end up with reasonably good speedups; the individual components of execution time targeted by each mechanism are reduced by even higher percentages. Finally, we use a firmware performance monitor, integrated with the communication layer, to understand the drawbacks of the system, to identify interesting tradeoffs in the protocol layer for future exploration, and to identify the remaining bottlenecks in SVM performance that should be addressed next.