Scalable Management of Enterprise and Data-Center Networks

Report ID: TR-906-11
Author: Yu, Minlan
Date: 2011-08-00
Pages: 149
Abstract:

The networks in campuses, companies, and data centers are growing larger and becoming more complicated to manage. Today, network operators devote tremendous time and effort to three key management tasks: routing, access control, and troubleshooting. Rather than trying to make today's brittle networks easier to manage, we focus on new network designs that are inherently easier to manage and scale to many hosts, switches, and applications.

We design and develop a new management system that scales routing, access control, and performance diagnosis in enterprise and data center networks. The key challenges are the large number of hosts, switches, and applications in these networks and the need for flexible policies, all under strict memory and power constraints in the switches. To address these challenges, we propose three key ideas: (1) designing new data structures and algorithms that make effective use of limited memory in switches; (2) redirecting traffic when simple switches do not have enough memory to handle packets; (3) rethinking the division of labor among switches, hosts, and a centralized management system to make the network both flexible and scalable.

Based on these key ideas, we propose a new management system that addresses the scalability challenges of routing, flexible policy support, and performance diagnosis with three key components:

(i) BUFFALO: A scalable packet forwarding architecture that reduces the switch memory needed to store the forwarding table by using Bloom filters, a compact way of representing a set of elements. To gracefully handle the false positives caused by Bloom filters, BUFFALO sends packets through a slightly longer path.

(ii) DIFANE: A scalable way to enforce flexible management policies from the centralized management system to the switches. DIFANE rethinks the division of labor between the centralized management system and the switches, by pulling some rule processing functions back to the switches, to achieve better scalability.

(iii) SNAP: A scalable network performance diagnosis architecture that exposes the interactions between the network and applications in data centers. SNAP passively logs traffic statistics in the end-host network stack and pinpoints problems that occur in network devices, the network stack, and the application software.
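
As an illustration of the Bloom-filter idea behind BUFFALO, the following is a minimal Python sketch, not the implementation described in the report. It assumes one Bloom filter per next hop, each holding the destination addresses forwarded to that hop; the names `BloomFilter`, `build_filters`, and `candidate_next_hops`, the filter size, and the SHA-256-based hashing are all hypothetical choices made for this example. A lookup tests the destination against every filter, so a false positive shows up as more than one candidate next hop.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over string keys (illustrative helper)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True for every added key; may also be True for others
        # (a false positive), but never False for an added key.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def build_filters(forwarding_table, num_bits=1024, num_hashes=4):
    """Build one Bloom filter per next hop (assumed layout)."""
    filters = {}
    for address, next_hop in forwarding_table.items():
        if next_hop not in filters:
            filters[next_hop] = BloomFilter(num_bits, num_hashes)
        filters[next_hop].add(address)
    return filters


def candidate_next_hops(filters, address):
    """Return every next hop whose filter claims to contain the address."""
    return sorted(hop for hop, bf in filters.items() if address in bf)


# Hypothetical forwarding table: destination address -> next-hop port.
table = {
    "00:11:22:33:44:01": "port1",
    "00:11:22:33:44:02": "port2",
    "00:11:22:33:44:03": "port1",
}
filters = build_filters(table)
for address, next_hop in table.items():
    print(address, candidate_next_hops(filters, address))
```

Because a Bloom filter has no false negatives, the correct next hop is always among the candidates. The sketch deliberately leaves out how to choose among several candidates, which is the false-positive handling the abstract attributes to BUFFALO.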

Our systems can be implemented with small modifications to today's switches and end hosts, as demonstrated by our prototypes built using OpenFlow switches and Microsoft Windows servers, by our evaluation using configuration data from AT&T networks, and by a deployment in a production data center.