Scalable Management of Enterprise and Data-Center Networks
Report ID: TR-906-11
Author: Yu, Minlan
Date: 2011-08-00
Pages: 149
Abstract:
The networks in campuses, companies, and data centers are growing
larger and becoming more complicated to manage. Today, network
operators devote tremendous time and effort to three key management
tasks: routing, access control, and troubleshooting. Rather than trying
to make today's brittle networks easier to manage, we focus on new
network designs that are inherently easier to manage and that scale to
many hosts, switches, and applications.
We design and develop a new management system that scales routing,
access control, and performance diagnosis in enterprise and data-center
networks. The key challenges are the large number of hosts, switches,
and applications in these networks and the need for flexible policies,
all under strict memory and power constraints in the switches. To
address these challenges, we propose three key ideas:
(1) designing new data structures and algorithms that make effective
use of the limited memory in switches;
(2) redirecting traffic when simple switches do not have enough memory
to handle packets; and
(3) rethinking the division of labor among switches, hosts, and a
centralized management system to make the network both flexible and
scalable.
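Idea (2) can be illustrated with a minimal sketch. It is not the design described in the thesis; it only assumes a switch with a small rule cache that forwards any packet it cannot match to another node holding the full rule set. The class names `RuleStore` and `SimpleSwitch`, the exact-match flow keys, and the LRU eviction policy are all hypothetical simplifications (real policies typically use wildcard rules).

```python
from collections import OrderedDict


class RuleStore:
    """Hypothetical node that holds the complete policy (flow key -> action)."""

    def __init__(self, rules, default_action="drop"):
        self.rules = dict(rules)
        self.default_action = default_action

    def handle(self, flow):
        # Exact-match lookup; unknown flows get the default action.
        return self.rules.get(flow, self.default_action)


class SimpleSwitch:
    """Hypothetical switch whose memory fits only `capacity` rules."""

    def __init__(self, capacity, rule_store):
        self.capacity = capacity
        self.rule_store = rule_store
        self.cache = OrderedDict()  # flow key -> action, in LRU order
        self.redirected = 0         # number of packets sent to the rule store

    def handle(self, flow):
        if flow in self.cache:
            self.cache.move_to_end(flow)  # mark as most recently used
            return self.cache[flow]
        # Cache miss: redirect the packet instead of dropping or buffering it.
        self.redirected += 1
        action = self.rule_store.handle(flow)
        self.cache[flow] = action
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used rule
        return action


store = RuleStore({"a": "fwd:1", "b": "fwd:2"})
switch = SimpleSwitch(capacity=1, rule_store=store)
for flow in ["a", "a", "b", "a"]:
    print(flow, switch.handle(flow), "redirects so far:", switch.redirected)
```

In this sketch the packet is still handled correctly on a miss; the cost of limited memory shows up as extra redirections, not as incorrect forwarding.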
Based on these ideas, we propose a new management system that
addresses the scalability challenges of routing, flexible policy
support, and performance diagnosis with three key components:
(i) BUFFALO: A scalable packet forwarding architecture that reduces
the switch memory needed to store the forwarding table by using Bloom
filters, a compact way of representing a set of elements. To
gracefully handle the false positives caused by Bloom filters, BUFFALO
sends the affected packets through a slightly longer path.
(ii) DIFANE: A scalable way to enforce flexible management policies
from the centralized management system to the switches. DIFANE
rethinks the division of labor between the centralized management
system and the switches, pulling some rule-processing functions
back to the switches to achieve better scalability.
(iii) SNAP: A scalable network performance diagnosis architecture that
exposes the interactions between the network and applications in data
centers. SNAP passively logs traffic statistics in the end-host network
stack and pinpoints problems that occur in the network devices, the
network stack, or the application software.
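The Bloom-filter idea behind component (i) can be sketched as follows. This is a minimal illustration, not BUFFALO's actual implementation: it assumes one Bloom filter per next hop, and the filter size, the number of hash functions, the salted SHA-256 hashing, and the names `BloomFilter`, `build_filters`, and `candidate_next_hops` are all choices made for this example.

```python
import hashlib


class BloomFilter:
    """Compact set representation: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive the bit positions from salted SHA-256 digests (illustrative).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filters(forwarding_table, num_bits=1024, num_hashes=4):
    """Group destination addresses into one Bloom filter per next hop."""
    filters = {}
    for address, next_hop in forwarding_table.items():
        if next_hop not in filters:
            filters[next_hop] = BloomFilter(num_bits, num_hashes)
        filters[next_hop].add(address)
    return filters


def candidate_next_hops(filters, address):
    """Return every next hop whose filter claims to contain the address.

    The correct next hop is always included; extra entries are false
    positives and correspond to the cases that need graceful handling.
    """
    return [hop for hop, bf in filters.items() if address in bf]


table = {
    "00:00:00:00:00:0a": "port1",
    "00:00:00:00:00:0b": "port2",
    "00:00:00:00:00:0c": "port1",
}
filters = build_filters(table)
print(candidate_next_hops(filters, "00:00:00:00:00:0a"))
```

The memory used depends on `num_bits` per next hop and not on the length of the stored addresses, which is where the savings come from. The price is that a lookup can occasionally return more than one candidate.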
Our systems can be easily implemented with small modifications to
today's switches and end hosts, as demonstrated by our prototypes built
using OpenFlow switches and Microsoft Windows servers, our evaluation
using configuration data from AT&T networks, and a deployment in a
production data center.