Estimating Application Hierarchical Bandwidth Requirements using BSP Family Models

Report ID: TR-875-10
Author: Singh, Jaswinder Pal / Soviani, Adrian
Date: 2010-05-00
Pages: 13
Download Formats: |PDF|
Abstract:

A vast amount of work has been done on developing programming models that provide good performance across machine architectures, are easy to use, and have predictable performance. Similarly, designing and optimizing architectures to achieve the best performance for an application class is a challenging task. Accurate cost modeling is essential to both application development and system design.

Many scientific computing codes are developed using libraries that provide custom-built collective communication primitives. The family of Bulk Synchronous Parallel (BSP) machine models are suitable tools for analyzing such problems.

However, modeling the effect of bandwidth limitations for globally unbalanced communication and estimating the hierarchical bandwidth used by applications remain key challenges.

We present a hierarchical bandwidth machine model (alphaDBSP) that naturally extends the Decomposable BSP (DBSP) model by associating a bandwidth growth factor $\alpha$ to each message pattern.

Algorithms executed on alphaDBSP have a runtime at least as good as DBSP. There are globally unbalanced problems for which alphaDBSP analysis is simpler or more accurate. E.g., the one element nearest-neighbor message exchange running on a pruned butterfly takes $O(log^3(p))$ on alphaDBSP and $O(\sqrt{p})$ on DBSP, while optimally modeling the one-to-all broadcast requires a single communication step on alphaDBSP vs. $O(log(p))$ steps on DBSP. We present three scientific computing kernels that illustrate differences between alphaDBSP and DBSP analysis.

Similar to BSP family models, alphaDBSP predicts collective communication execution time for a given machine. Additionally, alphaDBSP estimates the hierarchical bandwidth required by a given application. System architects may use this estimation to design machines that avoid bandwidth bottlenecks for their target application class.