Introduction:
The Internet is one of the greatest achievements of humanity. Within only 40 years of its creation, its impact is staggering. It has radically changed, and continues to change, our society, including how we communicate, work, learn, shop, and play. Despite its tremendous success, the Internet has serious problems. Most notably, it does not provide the level of security and reliability required of a critical infrastructure. In addition, the Internet does not incorporate sufficient mechanisms for privacy, manageability, mobility, and accountability. Although these problems have been researched extensively, they have not yet been solved; today the Internet is hardly any more secure or reliable than it was 10 or 20 years ago. The root cause of these problems, and therefore one of the biggest problems of the Internet architecture, is its inability to accommodate innovation in its fundamental architectural aspects and thus address long-standing design limitations.
The core aspects of the Internet, e.g., the Internet Protocol (IP) and the inter-domain routing system, reflect design choices made in its early days, when the Internet interconnected only a small number of computers. Since then, the Internet has grown beyond all expectation into a global infrastructure that interconnects the entire world. However, the core protocols of the Internet remain relatively unchanged, and although several limitations have been discovered since their initial design, it has proven extremely hard to change them. This architectural rigidity, or ossification, in the core aspects of the Internet is evident, for example, in the very long time (16 years so far) and intense effort it has taken to adopt IPv6, which is little more than a simple change in the header format and addressing scheme of IPv4. Even after the exhaustion of the IPv4 address space, it remains unclear when (and even if!) IPv6 will gain considerable coverage. The ossification of the core architectural components of the Internet, and the consequent persistence of its insecurity, unreliability, and other problems, are grand challenges that must be addressed to make any significant progress in improving the daily lives of the billions of Internet users across the globe.
The global inter-domain routing system, which is based on the Border Gateway Protocol (BGP), is one of the core components of the Internet architecture (along with IP, TCP, and DNS). It is the "glue" that holds an enormous decentralized ecosystem of thousands of autonomous networks together and keeps the Internet working in a maze of diverse financial interests. The inter-domain routing system is the most ossified component of the Internet architecture, because making a change requires global coordination between independent and competing Internet Service Providers (ISPs). ISPs are very conservative stakeholders, since the risks of malfunctions outweigh the rewards of new features. This is evident from the fact that other stakeholders, such as application providers and OS/router/hardware vendors, have supported IPv6 for several years, while most ISPs still do not.
In the proposed project, we focus on the major challenge of the ossification of the Internet routing architecture. This is the last frontier for addressing fundamental problems of the Internet architecture. Previous research falls into two categories. First, clean-slate approaches, e.g., [1–5], design new architectures from scratch without maintaining backward compatibility with the current Internet. In contrast to these studies, our goal is to maintain backward compatibility with the current Internet. A second group of studies, e.g., [6–9], has proposed backward-compatible optimizations to BGP that can be realized within the current architecture, but these provide only point solutions to specific problems. In contrast to these studies, our goal is to make the Internet routing architecture more evolvable: we want a routing architecture that can accommodate changes easily and frequently and therefore supports ongoing innovation. In addition, previous research has focused on the technical problems of the Internet routing architecture without pursuing a two-pronged approach that also provides solid economic incentives for change. The rigidity of the Internet routing architecture is clearly a challenge not only of a technical but also of an economic nature. For this reason, overcoming Internet routing ossification requires an interdisciplinary approach that provides both technical and economic incentives to push stakeholders to adopt changes.
Our goal is to make the Internet routing system more evolvable and to address fundamental security, reliability, and manageability problems. In a nutshell, the proposed project has two main objectives:
- We will design a new Internet routing paradigm based on a novel techno-economic framework, which exploits emerging technologies and provides a viable scheme for evolving Internet routing, while preserving legacy compatibility with the current Internet architecture.
- Based on the introduced framework, we will design, build, and verify a better inter-domain routing system, which solves fundamental security, reliability, and manageability problems of the current Internet architecture.
Approach & Benefits:
We propose a new Internet routing paradigm based on a novel techno-economic framework. Our framework rests on two pillars. First, we leverage the network programmability offered by the rapidly emerging Software-Defined Networking (SDN) architecture and introduce a framework to extend this programmability to the inter-domain setting. SDN is a new architecture for computer networking that offers exciting new opportunities. Presently, the data plane and the control plane of a network are vertically integrated within closed, proprietary routers and switches. SDN separates the control plane from the data plane and uses an open protocol for communication between them. SDN enables an external Network Operating System (NOS) [10], which interacts with packet-forwarding elements, i.e., routers and switches. Control features and applications, including routing algorithms, can be deployed on top of the NOS and run as software modules. The NOS presents a consistent network-wide view to the logically centralized control logic running on top of it. This architectural evolution, from closed and proprietary routers and switches to an architecture that allows external access to their innards through open interfaces, accelerates innovation: new routing schemes can be tested and deployed as simply as running software on top of a NOS. The NOS provides a central vantage point and a well-defined API that make it easy for the operator or a third party to create new network management and control applications. The application developer essentially operates on a local network graph, or an even simpler abstraction of the network, and does not need to worry about the complexity of distributed network control.
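To make this programming model concrete, the following sketch shows a control application running on top of a NOS. The `MockNOS` class and its `install_rule` method are hypothetical stand-ins for a real controller API (e.g., NOX [10] differs in detail); the point is that the application operates on a plain network graph and pushes forwarding state, with no distributed protocol involved.

```python
from collections import deque

# Hypothetical, minimal stand-in for a NOS: it exposes the network-wide
# topology as a graph and an API to install forwarding rules on switches.
class MockNOS:
    def __init__(self, links):
        self.graph = {}                      # adjacency list: switch -> neighbours
        for a, b in links:
            self.graph.setdefault(a, []).append(b)
            self.graph.setdefault(b, []).append(a)
        self.rules = []                      # installed (switch, dst, next_hop) entries

    def install_rule(self, switch, dst, next_hop):
        self.rules.append((switch, dst, next_hop))

# A control application: compute a shortest path on the NOS's global view
# and push one forwarding rule per hop.
def install_shortest_path(nos, src, dst):
    parent = {src: None}
    queue = deque([src])
    while queue:                             # plain BFS on the topology graph
        node = queue.popleft()
        for nbr in nos.graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    path, node = [], dst
    while node is not None:                  # walk parents back to the source
        path.append(node)
        node = parent[node]
    path.reverse()
    for hop, nxt in zip(path, path[1:]):     # one rule per switch on the path
        nos.install_rule(hop, dst, nxt)
    return path

nos = MockNOS([("s1", "s2"), ("s2", "s3"), ("s1", "s4"), ("s4", "s3")])
print(install_shortest_path(nos, "s1", "s3"))  # → ['s1', 's2', 's3']
```

The entire "routing protocol" here is an ordinary graph algorithm plus API calls, which is exactly what makes deploying a new scheme as simple as running software on the NOS.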
The second pillar of our framework is the unconventional idea of outsourcing the routing control logic of Internet domains to an external trusted provider, i.e., the service contractor. Outsourcing makes it possible to optimize routing within a domain by exploiting the expertise of an external contractor. The contractor specializes in routing optimization and can relieve domains of the burden of maintaining expensive, highly trained staff to deal with the complex tasks of routing configuration and optimization. As multiple domains outsource their routing control logic to the same contractor, domain clusters are gradually formed. Since a single contractor manages routing for multiple domains, it can take advantage of its bird's-eye view over its clients' domains to make more intelligent routing decisions, i.e., domains served by the same contractor enjoy more efficient routing. In addition, the contractor can adopt new inter-domain routing protocols between the members of its cluster. As a consequence, innovation inside the cluster can be accelerated, while legacy interfaces with the rest of the Internet guarantee proper interoperability. Furthermore, each domain preserves its policy-shaping capability, privacy, and business identity.
In Figure 1 (top), we show the current architecture of computer networks, which are composed of closed, vertically integrated routers and switches. In such systems, only the vendor (e.g., Cisco) can change the software, OS, and hardware to introduce new features. This situation is analogous to the state of the mainframe computing ecosystem in the 1970s, when a small number of companies (e.g., IBM) built the hardware, OS, and applications of closed, vertically integrated mainframes. In Figure 1 (bottom), we show the SDN approach, which decouples the control plane from the data plane and introduces an open interface between the two, enabling control features to run externally on top of a NOS. This change is analogous to the introduction of open interfaces in the 1970s between the microprocessor and the OS (e.g., the x86 interface) and between the OS and the applications. Those interfaces opened the market to new application, OS, and hardware vendors, which accelerated innovation and revolutionized computing. SDN realizes a similar change in the architecture of computer networks, which is already bringing major changes to how networks work.
Figure 2: Proposed inter-domain network routing paradigm based on outsourcing the routing control logic
In Figure 2, we show the paradigm we propose. First, we outsource the routing control logic to a service contractor. This centralizes routing control for multiple domains. The combination of SDN and multi-domain centralization enables exciting new possibilities. One of the most interesting aspects of outsourcing is that as more Internet domains choose the same contractor, domain clusters are gradually formed. The contractor has a bird's-eye view of the domains under its control, and the advantages of a common contractor grow as the size of the cluster increases. The scheme scales horizontally to large clusters served by popular contractors. Hosting the routing control logic of many domains benefits inter-domain routing in many ways, since the contractor is able to:
- Optimize inter-domain traffic engineering. Outsourcing helps to optimize the handling of network traffic beyond domain boundaries, because the contractor is aware of the policies, topologies, and monitoring information for all domains within the cluster. It is therefore the natural point at which routing paths can be optimized. Coordination beyond domain boundaries can yield efficient paths even if domains have different policies and optimization criteria [11]. This helps to improve routing stability and mitigate path inflation. It also benefits ISPs outside the cluster, since it results in shorter and more stable end-to-end paths, reducing network load on a larger scale.
- Resolve policy conflicts. The contractor is also the natural point at which inter-domain policy conflicts can be spotted and resolved, which helps to improve routing stability. Even in the case where the domains within a cluster are not adjacent, the global view of the contractor plays an important role, since conflicting policies may exist beyond the visibility horizon of a single domain and its neighbours.
- Enable collaborative security and troubleshooting. Exporting monitoring data to the contractor enables collaborative security and network troubleshooting through its mediation. For example, the contractor can easily pinpoint the source of a routing anomaly or failure by analyzing the information it acquires from its clients and correlating it with other well-known sources [12] for verification and validation. Some of these benefits, such as the detection of prefix hijacking [13], can only be realized by aggregating information from multiple domains.
- Evolve inter-domain routing. The contractor is free to adopt new inter-domain routing protocols between the members of its cluster. Innovation inside the cluster can be accelerated, while legacy interfaces with the rest of the Internet guarantee proper interoperability. In general, we can achieve lower convergence times and decreased churn by centrally controlling the dynamics of intra-cluster routing. Additionally, hierarchical routing, which improves routing scalability, is implicitly enabled at the inter-domain level, allowing hierarchical routing schemes to flourish.
- Simplify routing management. The logical centralization of the routing control plane helps simplify routing management. The basic concept is that we centralize the control of routing decisions associated with the cluster, both for internal and external destinations. By doing so, we can implement simpler policy-based routing, since we are able to centrally express, enforce, and check routing policies. The policies can be expressed and compiled dynamically by frameworks such as [14]. Logical centralization also helps improve routing within a cluster by lowering the overall management complexity.
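As a toy illustration of centrally expressed and enforced policy, the sketch below applies the classic customer > peer > provider route preference, with ties broken on AS-path length, in a single central engine. The relationship labels, domain names, and route tuples are simplifications invented for this example, not part of the proposed framework's API.

```python
# Illustrative preference order: routes learned from customers are most
# preferred (they generate revenue), then peers, then providers.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

def best_route(candidates, relationships):
    """Pick the best (next_hop, as_path) route for a domain: prefer routes
    by business relationship of the next hop, then by AS-path length,
    all evaluated in one central policy engine."""
    def rank(route):
        next_hop, as_path = route
        return (PREFERENCE[relationships[next_hop]], len(as_path))
    return min(candidates, key=rank)

# Hypothetical domain D1's view: three candidate routes to one prefix.
relationships = {"D2": "provider", "D3": "peer", "D4": "customer"}
candidates = [
    ("D2", ["D2", "D9"]),             # short, but via a provider (costly)
    ("D3", ["D3", "D8", "D9"]),       # via a peer
    ("D4", ["D4", "D7", "D8", "D9"])  # long, but via a customer
]
print(best_route(candidates, relationships))  # → the customer route via D4
```

Because the policy is ordinary data and the selection is ordinary code, the same engine can also check, before installation, that every selected route complies with each client domain's stated policy.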
Tasks:
Our thesis is that the novel idea of combining routing control logic outsourcing with SDN principles has the potential for a breakthrough in the Internet routing architecture. In our project we will explore the following new capabilities (which are shown in Figure 2):
Work Package 1: Routing Control Platform. We will first build a multi-domain routing control platform that will enable network programmability at the inter-domain level and will help to accelerate inter-domain routing innovation. Clearly, the logically centralized platform should be physically distributed for scalability, resiliency, and efficiency reasons. Some key research questions we will answer are the following: How to design a scalable platform (to control dozens or even hundreds of Internet domains) that is resilient to failures, secure, and backward-compatible with legacy routers and switches? What is the effect (e.g., routing loops) of state inconsistencies among the different components of the platform and how to protect against them?
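One hazard mentioned above, a routing loop caused by state inconsistency among platform components, can be checked centrally before (or after) rules are installed. The rule representation below is a hypothetical simplification: per-switch next hops for a single destination.

```python
def walks_into_loop(rules, start, dst):
    """rules: {switch: next_hop} for one destination. Returns True if a
    packet injected at `start` revisits a switch (a forwarding loop)
    instead of reaching `dst` or being dropped."""
    seen = set()
    node = start
    while node != dst:
        if node in seen:
            return True                 # revisited a switch: loop
        if node not in rules:
            return False                # no rule: blackhole, but no loop
        seen.add(node)
        node = rules[node]
    return False

# Two platform components installed conflicting next hops for the same
# destination, so s1 and s2 point at each other:
inconsistent = {"s1": "s2", "s2": "s1"}
print(walks_into_loop(inconsistent, "s1", "s3"))  # → True
```

A real platform would run such invariant checks over the full rule set on every state change; the walk above is just the one-destination core of that idea.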
Work Package 2: Evolve Inter-domain Routing. In this work package, we will build new inter-domain routing schemes that improve the reliability and security of BGP while maintaining backward compatibility. Some key research questions we will answer are the following: How to best exploit centralization to design routing algorithms that converge faster and are more stable under routing changes? How to design a Denial of Service (DoS) attack mitigation architecture that best fits our framework? How to design the inter-domain signalling protocols for DoS attack mitigation? What is the minimum routing functionality that must be preserved in the evolved inter-domain world to maintain backward compatibility with BGP?
Work Package 3: Mediate Tussles. A key problem of inter-domain routing is that policy conflicts between competing network domains can lead to routing instability, divergence, and inefficient paths. In this work package we will study how the mediation of the contractor can help address these issues. We will address the following key research questions: How to detect and resolve routing policy conflicts? When is a resolution feasible? How to optimize the intra- and inter-domain routing paths of network domains subject to the policy requirements of each domain? How to define fairness in this context? Is there an equilibrium point? Where do external auditors fit in the architecture to help verify that the contractor finds a fair solution?
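A much-simplified stand-in for the conflict detection question: model each domain's statement "I prefer routing via X over my direct route" as a directed edge, and flag a cycle in this preference graph (reminiscent of a dispute wheel) as a conflict the contractor could mediate. The encoding is illustrative; real policy conflicts involve full path rankings.

```python
def find_preference_cycle(prefers):
    """prefers: dict mapping each domain to the domains it prefers to
    route through. Returns one cyclic chain of preferences, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {d: WHITE for d in prefers}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in prefers.get(node, []):
            if color.get(nxt) == GRAY:        # back edge: cycle found
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt) == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for domain in prefers:
        if color[domain] == WHITE:
            cycle = dfs(domain)
            if cycle:
                return cycle
    return None

# The classic three-party conflict: each domain prefers the path through
# its neighbour, so no stable routing outcome exists.
conflict = {"D1": ["D2"], "D2": ["D3"], "D3": ["D1"]}
print(find_preference_cycle(conflict))  # → ['D1', 'D2', 'D3', 'D1']
```

Crucially, the contractor can run such a check because it sees the preferences of all cluster members at once, whereas each individual domain sees only its own.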
Work Package 4: Collaborative Security and Troubleshooting. Correlating monitoring data from multiple domains is very useful for network security and troubleshooting, as it makes it possible to aggregate more data about suspicious incidents. An important problem of the Internet routing system is outages, which may be caused by events such as prefix hijacking, a common attack in which a domain advertises a prefix of another domain and hijacks its traffic. In this work package, we will research how to correlate monitoring data from multiple domains to best detect and troubleshoot outages and prefix hijacking attacks. We will address the following key research questions: To what extent can we detect incidents (and what types of incidents) without knowing the intent? How to combine data from multiple domains to determine the root cause of an incident? How to best correlate traffic measurements from multiple domains to detect prefix hijacking attacks and network outages? What is the impact of traffic sampling and incomplete measurements, e.g., due to asymmetric routing, on detection accuracy?
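One simple form of such correlation can be sketched directly: each cluster domain (and, optionally, a public feed such as RouteViews [12]) reports which origin AS it currently sees for each prefix, and a prefix observed with more than one origin (a multiple-origin-AS conflict) is flagged as a potential hijack. The data, domain names, and the single-threshold rule are illustrative only; real detection must also handle legitimate multi-origin prefixes.

```python
from collections import defaultdict

def detect_moas_conflicts(observations):
    """observations: list of (vantage_point, prefix, origin_as) tuples
    reported by cluster domains. Returns {prefix: sorted conflicting
    origin list} for every prefix seen with more than one origin."""
    origins = defaultdict(set)
    for _vp, prefix, origin in observations:
        origins[prefix].add(origin)
    return {p: sorted(o) for p, o in origins.items() if len(o) > 1}

reports = [
    ("D1", "203.0.113.0/24", 64500),   # expected origin
    ("D2", "203.0.113.0/24", 64500),
    ("D3", "203.0.113.0/24", 64666),   # a second origin appears: suspicious
    ("D1", "198.51.100.0/24", 64501),  # unanimously observed: no conflict
]
print(detect_moas_conflicts(reports))  # → {'203.0.113.0/24': [64500, 64666]}
```

The detection only works because reports from multiple vantage points are aggregated: a hijack that is visible to D3 but not to D1 or D2 would be invisible to any single domain on its own.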
Bibliography:
[1] J. Crowcroft, et al., “Plutarch: an argument for network pluralism,” SIGCOMM CCR, 2003
[2] A. Anand, et al., “XIA: an architecture for an evolvable and trustworthy internet,” HotNets, 2011
[3] T. Koponen, et al., “Architecting for innovation,” SIGCOMM CCR, 2011
[4] A. Ghodsi, et al., “Intelligent design enables architectural evolution,” HotNets, 2011
[5] P. B. Godfrey, et al., “Pathlet routing,” SIGCOMM, 2009
[6] Y. Wang, et al., “Design for configurability: rethinking interdomain routing policies from the ground up,” IEEE JSAC 2009
[7] D. Pei, et al., “BGP-RCN: improving BGP convergence through root cause notification,” Comput. Netw. ISDN Syst., 2005
[8] S. Kent, et al., “Secure Border Gateway Protocol (S-BGP),” IEEE JSAC, 2000
[9] N. Feamster, et al., “The case for separating routing from routers,” in SIGCOMM FDNA, 2004
[10] N. Gude, et al., “NOX: towards an operating system for networks,” ACM SIGCOMM CCR, 2008
[11] R. Mahajan, et al., “Negotiation-based routing between neighboring ISPs,” in NSDI, 2005
[12] University of Oregon - Advanced Network Technology Center, "The routeviews project", http://www.routeviews.org/
[13] H. Ballani, et al., “A study of prefix hijacking and interception in the internet,” in SIGCOMM, 2007
[14] C. Monsanto, et al., “A compiler and run-time system for network programming languages,” in POPL, 2012