Towards Highly Reliable and Scalable Distributed Systems (thesis)
Report ID: TR-772-07
Author: Park, KyoungSoo
Date: 2007-01-00
Pages: 148
Abstract:
Over the past decades, we have observed the development of a number of
Internet-scale distributed systems such as the Domain Name System
(DNS) and Content Distribution Networks (CDNs), which support the
Internet access of millions of users. Though they have become
indispensable infrastructure in the Internet, their operating dynamics
have not been well studied so far. This dissertation focuses on the
reliability and scalability of such large-scale distributed systems
and explores the design principles for highly available and scalable
systems.

One key principle for highly reliable services is distributed
management of available resources based on autonomous monitoring. We
implement CoDeeN, a latency-sensitive public CDN service with purely
decentralized control, and demonstrate that its service reliability is
greatly improved using careful resource monitoring, even in a highly
unreliable environment. CoDeeN has been operating for over three years,
handling more than 5.8 billion successful HTTP requests and serving over 14
million users, and has been one of the most stable long-running
services on PlanetLab.

The second principle is that intelligent composition of temporarily
unreliable resources can provide better reliability than any of the
underlying resources. Using this principle, we build CoDNS, which has
failure rates that range from one-tenth to one-hundredth of those of
existing DNS services. By aggregating the unreliable services,
CoDNS improves availability by an extra '9', from 99% to over 99.9%,
and in some cases achieves over 99.99%, or less than 8 seconds of
downtime per day. The utility of this service has also been proven in
practice by providing more predictable and reliable name lookup
service to the CoDeeN CDN service.
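
The availability figures above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the independence assumption in the second function is an idealization rather than a description of how CoDNS actually combines resolvers. It converts an availability fraction into downtime per day, and shows why a lookup that fails only when every resolver fails can have a much lower failure rate than any single resolver.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def downtime_seconds_per_day(availability: float) -> float:
    """Unavailable seconds per day for an availability fraction (e.g. 0.999)."""
    return (1.0 - availability) * SECONDS_PER_DAY


def combined_failure_rate(failure_rates):
    """Failure rate if a lookup fails only when every resolver fails.

    Assumes failures are statistically independent (an idealization;
    real resolver failures may be correlated).
    """
    rate = 1.0
    for f in failure_rates:
        rate *= f
    return rate


if __name__ == "__main__":
    # Each extra '9' of availability cuts daily downtime by a factor of ten.
    for a in (0.99, 0.999, 0.9999):
        print(f"{a:.2%} available: {downtime_seconds_per_day(a):.2f} s/day down")

    # Two hypothetical resolvers, each failing 1% of the time.
    print(f"combined failure rate: {combined_failure_rate([0.01, 0.01]):.6f}")
```

At exactly 99.99% availability the downtime is 8.64 seconds per day, so staying under 8 seconds corresponds to availability slightly above 99.99%.
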

Finally, we find that the scalability of a large-scale distributed
system can be immensely improved by an independent and asynchronous node
peering strategy and effective request distribution. CoBlitz, a
scalable large-file transfer CDN atop CoDeeN and CoDNS, achieves
downloading performance 27-48% higher than BitTorrent, while reducing
the origin load by a factor of 7 more than the previously best known
research system.