Understanding and Improving Modern Web Traffic Caching
Report ID: TR-908-11
Author: Ihm, Sunghwan
Date: 2011-08-00
Pages: 127
Abstract:
The World Wide Web is one of the most popular and important Internet
applications, and our daily lives rely heavily on it. Despite its
importance, Web access today is still limited for two reasons:
(1) the Web has changed and grown significantly as social networking,
video streaming, and file hosting sites have become popular, requiring
more and more bandwidth, and (2) the need for Web access has also
grown, and many users in bandwidth-limited environments, such as
people in the developing world or mobile device users, still suffer
from poor Web access.
There was a burst of research a decade ago aimed at understanding the
nature of Web traffic and thus improving Web access, but
unfortunately, it has dropped off just as the Web has changed
significantly. As a result, we have little understanding of the
underlying nature of today’s Web traffic, and thus miss traffic
optimization opportunities for improving Web access. To help improve
Web access, this dissertation attempts to fill the gap between
previous research and today’s Web.
For a better understanding of today’s Web traffic, we first analyze
five years (2006-2010) of real Web traffic from a
globally-distributed proxy system, which captures the browsing
behavior of over 70,000 users from 187 countries. Using this data set,
we examine major changes in Web traffic characteristics that occurred
during this period. We also develop a new Web page analysis technique
that is better suited for modern Web page interactions. Using our
analysis technique, we analyze various aspects of page-level changes,
and present a simple Web traffic model that we develop based on our
findings. Finally, we investigate the redundancy of this traffic,
using both traditional object-level caching and content-based
approaches that cache at the sub-object or packet level. Among many
findings, we observe a large potential benefit from content-based
caching: its byte hit rate is almost twice that of traditional
object-level caching.
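To make the comparison concrete, the following is a minimal sketch of
how byte hit rate (bytes served from cache divided by total bytes
requested) might be computed for the two approaches. It is not the
dissertation's analysis code. The function names, the 8-byte window,
the boundary mask, and the use of adler32 in place of a true rolling
fingerprint are all assumptions made for illustration, and the
object-level model ignores cacheability headers and expiry.

```python
import hashlib
import zlib


def content_chunks(body, window=8, mask=0x3F):
    """Split body at content-defined boundaries (illustrative only).

    A boundary is declared after offset i when the checksum of the
    preceding `window` bytes has every bit of `mask` set. adler32 is a
    stand-in for a rolling fingerprint; it is recomputed per offset.
    """
    start = 0
    for i in range(window, len(body) + 1):
        if zlib.adler32(body[i - window:i]) & mask == mask:
            yield body[start:i]
            start = i
    if start < len(body):
        yield body[start:]


def object_byte_hit_rate(requests):
    """Hit only when the same URL returns a byte-identical body again."""
    seen = {}
    hit = total = 0
    for url, body in requests:
        total += len(body)
        digest = hashlib.sha1(body).digest()
        if seen.get(url) == digest:
            hit += len(body)
        seen[url] = digest
    return hit / total if total else 0.0


def chunk_byte_hit_rate(requests):
    """Hit whenever a chunk's content was seen before, under any URL."""
    seen = set()
    hit = total = 0
    for _url, body in requests:
        for chunk in content_chunks(body):
            total += len(chunk)
            digest = hashlib.sha1(chunk).digest()
            if digest in seen:
                hit += len(chunk)
            else:
                seen.add(digest)
    return hit / total if total else 0.0
```

In this sketch, the same body fetched from two different URLs yields
an object-level byte hit rate of 0, while the chunk-level rate is at
least 0.5, because every chunk of the second response was already
stored. Content served under different names, or changed only in
part, is one source of redundancy that object-level caching cannot
capture.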
Motivated by the possible benefits from content-based caching
approaches, we also develop Wanax, a scalable and flexible wide-area
network (WAN) accelerator that is designed for low-bandwidth and
resource-limited developing world environments. It uses a novel
multi-resolution chunking (MRC) scheme that provides high compression
rates and high disk performance for a variety of content, while using
much less memory than existing approaches. Wanax exploits the design
of MRC to perform intelligent load shedding to maximize throughput
even when running on resource-limited shared platforms. Finally, Wanax
exploits mesh network environments, instead of just the star
topologies common in enterprise branch offices. Equally important,
the design of Wanax can be applied to enterprise environments,
providing the same benefits.
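The abstract does not spell out how MRC works. As a rough
illustration of what chunking at multiple resolutions could look
like, the sketch below derives several chunk sizes from one boundary
test by requiring progressively more checksum bits to match, so that
every coarse boundary is also a fine boundary. This is an assumption
about the general idea, not the Wanax implementation; the function
names, masks, window size, and the adler32 checksum are hypothetical
choices.

```python
import zlib


def boundaries(data, mask, window=8):
    """Offsets where the checksum of the preceding window matches mask."""
    return [
        i
        for i in range(window, len(data))
        if zlib.adler32(data[i - window:i]) & mask == mask
    ]


def multi_resolution_chunks(data, masks=(0x0F, 0x3F, 0xFF)):
    """Return one chunk list per mask, from finest to coarsest.

    Each mask's bits include those of the previous mask, so a boundary
    at a coarser level is always a boundary at every finer level and
    large chunks are exact concatenations of smaller ones.
    """
    levels = []
    for mask in masks:
        cuts = [0] + boundaries(data, mask) + [len(data)]
        levels.append([data[a:b] for a, b in zip(cuts, cuts[1:])])
    return levels
```

With aligned boundaries of this kind, a system could in principle
match large chunks when content is unchanged and fall back to
smaller ones around modified regions.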