Characterizing and Removing Branch Mispredictions (Thesis)

Report ID: TR-604-99
Author: Skadron, Kevin
Date: 1999-06-00
Pages: 229
Download Formats: |Postscript|
Abstract:

Control-flow mispredictions are a profound impediment to processor performance, because each misprediction introduces a pipeline bubble of many cycles' duration. For example, the minimum bubble in the recently released Alpha 21264 is at least seven cycles, and often as much as twenty cycles. With such long penalties, even small misprediction rates harm performance substantially.

Although a huge number of techniques have been proposed to combat this problem, most have focused on only one type of misprediction: conflicts in the predictor's state tables.

This thesis describes a taxonomy of misprediction types and presents data showing that several other types of mispredictions are just as important as conflicts. Techniques to attack three of these misprediction types are then described. Alloying makes both local and global history available in a single branch predictor structure, providing robust performance compared against both conventional two-level predictors and against hybrid predictors. Speculative update with fixup ensures that the predictor's state remains up-to-date, while protecting the state against corruption from mispredicted branches. Finally, multipath execution simultaneously executes both sides of a branch, eliminating mispredictions for those branches that are otherwise difficult to predict.

This work also explores tradeoffs among branch prediction, instruction window size, data cache size, and instruction cache size, and shows that branch prediction is the most powerful lever on processor performance.

This thesis makes one further contribution, demonstrating that long benchmarks can be simulated efficiently by simulating only a small but carefully chosen 50 million instruction segment of the program's overall execution. This technique avoids unrepresentative initial phases of a program and sets the simulation window to capture behavior that is representative in terms of the program's branch prediction accuracy, cache behavior, and overall execution speed.