Caching in Large-Scale Distributed File Systems (Thesis)

Report ID: TR-397-92
Author: Blaze, Matt
Date: 1992-12-00
Pages: 89
Download Formats: PostScript
Abstract:

This thesis examines the problem of cache organization for very large-scale distributed file systems (DFSs). Conventional DFSs, based on the client-server model, suffer from bottlenecks when the total client load exceeds the server's capacity. Previous work has suggested that hierarchical client organizations can ameliorate the problem somewhat, but at the expense of a substantial increase in client latency. An analysis of existing DFS workloads reveals that there is considerable regularity in client file access patterns and that widely shared files lend themselves especially well to caching techniques. In particular, a large proportion of "cache miss" traffic is for files that are already copied in another client's cache. If clients can share these cached files, the server's load can be reduced by a potentially large margin, making larger-scale systems possible. We introduce the notion of dynamic hierarchical caching, in which adaptive client hierarchies are constructed on a file-by-file basis. Trace-driven simulation and workload-driven runs of a prototype file system suggest that dynamic hierarchies can reduce server load substantially without the client performance penalties associated with more static schemes.
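
The central idea, serving a cache miss from another client's copy rather than from the server, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the protocol developed in the thesis: the class names, the `holders` map (a hypothetical per-file record of which clients cache a file), and the file name are all invented for illustration, and the abstract does not say how clients actually locate one another.

```python
from collections import defaultdict


class Server:
    """Holds the authoritative files and counts how many reads it serves."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads_served = 0

    def read(self, name):
        self.reads_served += 1
        return self.files[name]


class Client:
    def __init__(self, server, holders, share=True):
        self.server = server
        self.holders = holders  # hypothetical per-file map: name -> clients caching it
        self.share = share      # False models a conventional client-server DFS
        self.cache = {}

    def read(self, name):
        if name in self.cache:               # local cache hit
            return self.cache[name]
        peers = self.holders[name] if self.share else []
        if peers:                            # miss, but a peer already holds a copy
            data = peers[0].cache[name]
        else:                                # no cached copy anywhere: ask the server
            data = self.server.read(name)
        self.cache[name] = data
        self.holders[name].append(self)      # this client can now serve others
        return data


def server_reads(n_clients, share):
    """Have every client read one widely shared file; return the server's read count."""
    server = Server({"/etc/motd": b"hello"})
    holders = defaultdict(list)
    clients = [Client(server, holders, share) for _ in range(n_clients)]
    for client in clients:
        client.read("/etc/motd")
    return server.reads_served
```

In this toy setting, `server_reads(10, share=False)` charges the server one read per client, while `server_reads(10, share=True)` charges it only for the first. Because the `holders` map is keyed by file name, the set of clients that serve one another differs from file to file, which loosely mirrors the file-by-file hierarchies the abstract describes.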