Making the "Box" Transparent: System Call Performance as a First-class Result
Report ID: TR-670-03
Authors: Pai, Vivek / Ruan, Yaoping
Date: June 2003
Pages: 14
Abstract:
For applications that make heavy use of the operating system, the ability of designers to understand system call performance behavior may be essential to achieving high performance. Conventional approaches to performance analysis, such as monitoring tools and profilers, collect and present their information off-line or via out-of-band channels. We believe that making this information first-class and exposing it to running applications via in-band channels on a per-call basis presents opportunities for analysis and performance tuning not available via other mechanisms. Furthermore, our approach provides direct feedback to applications on time spent in the kernel, resource contention, and time spent blocked, allowing them to immediately observe how the application and workload affect kernel behavior. Not only does this approach provide greater transparency into the workings of the kernel, but it also allows applications to control how performance information is collected, filtered, and correlated with application-level events.
To demonstrate the power of this approach, we show that our implementation, DeBox, obtains precise information about OS behavior at low cost and that it can be used to debug and tune application performance on complex workloads. In particular, we focus on the industry-standard SpecWeb99 benchmark running on the Flash Web Server. Using DeBox, we are able to diagnose a series of problematic interactions between the server and the operating system. Addressing these issues, along with other optimization opportunities, yields an overall fourfold improvement in our SpecWeb99 score, as well as throughput gains on other benchmarks. Equally important, our measurements suggest that parallelism stemming from programmer convenience has a sharply negative impact on latency. We show how our optimizations reduce this impact, improving latency by factors of 4 to 47 under different conditions.