Overlay Topologies

ECE1548 - Advanced Network Architectures 

Overview of Overlay Topologies

The Internet is a world wide network of networks based on the IP protocol. Packets are transmitted and routed to their destination by interconnected routers that form the core of the network. The distinguishing characteristic of the Internet is that the intelligence is kept at the edge of the network. The core routers of the Internet are essentially very simple devices that are designed to do the complex task of efficiently routing packets to their destination.

The concept of an overlay network can be conceptually thought of as building a network on top of another network. Within the context of the Internet, an overlay network is a network in which nodes at the edge of the network (other computers) form some kind of topology and become routers in this topology for the network they form. Essentially, an overlay topology facilitates application layer routing and is the basis for P2P networks.

Nodes in an overlay topology maintain TCP connections with other nodes in the network. Two nodes that maintain a TCP connection to one another are referred to as neighbours. The set of all logical neighbours form the overlay topology. Since the TCP connections are logical connections which can be very easily created and destroyed, overlay topologies have the distinguishing characteristic of being very flexible and sometimes very dynamic. At the IP layer, connections are physical and the network itself is mostly static.

The purpose of an overlay topology is to provide routing at the application layer. Nodes that form an overlay topology do so with some goal in mind. For instance, LimeWire and KaZaA form a P2P overlay topology that is maintained for the purposes of searching for files. Given an arbitrary topology, a search is performed by node A by first sending out a search request to all of its neighbours. The neighbours of node A then in turn propagate the search request to all of their neighbours and so on and so forth until all nodes in the P2P network have received the search request. This process is known as flooding. Flooding is usually considered inefficient. Some overlay topologies can perform a search without flooding.

When a node receives a search request, the node first checks its local database to see if the query returns positive. If it does, a positive acknowledgement is returned directly to the node that initiated the request. There is no need to propagate the query result back through the overlay topology – this is inefficient and unnecessary, however, sometimes propagating results back to the initiating node through some of the online nodes may be advantageous. Skype for instance uses this technique in its P2P overlay in order to work around the NAT problem.

Categorization of Overlay Topologies: Structured vs. Unstructured

Overlay topologies come in many flavours each one better suited to some applications than others. A structured overlay topology maintains a distributed lattice structure. Nodes fill positions or holes within the lattice when they are added to the network. Because the positions within the lattice are predetermined, when nodes are added to a structured network, their placement is predictable and deterministic. Chord and Tapestry are two types of structured topologies. Since the lattice structure is predetermined, it is possible to assign indices to the positions in the structure which facilitates the implementation of distributed hash table (DHT) services.

An unstructured topology is one in which the nodes form an overlay in a non-structured fashion which is often random. Newscast and Montresor’s algorithm are examples of unstructured topologies. Since an unstructured topology does not need to maintain any structure, there is in general less overhead when nodes are inserted and removed from an unstructured network than there is from a structured one. Most structured topologies require some form of stabilization messaging to occur periodically to repair damaged lattices. Unstructured topologies generally do not suffer from this particular problem.

Categorization of Overlay Topologies: Flat vs. Hierarchical

In flat overlay topologies, all nodes are treated equally. A hierarchical topology exploits heterogeneity in the network by distinguishing between two types of peers; super peers and ordinary peers. Ordinary peers maintain a connection with one other super peer which is referred to as the ordinary peers’ parent node. The set of ordinary peers of a super peer are known as the super peers’ child nodes.

In P2P networks, the super peers often take on more responsibility than an ordinary peer. A super peer may, for instance, cache all of the shared files of its own child nodes. When a search is performed, the super peer can search the local cache of its own files as well as the files of its child nodes without having to forward the search packet to its child nodes.

It is important to note that the scalability of a hierarchical overlay has the same complexity as a flat overlay. This is simply due to the fact that a super peer will accept a constant number of ordinary peer child nodes. Thus the scalability has only been reduced by a constant factor in terms of complexity. Despite this fact, a hierarchical overlay is often preferred to a flat overlay because a hierarchical overlay can take advantage of heterogeneity among the devices used in the P2P network. Nodes with more bandwidth or processing power prove to be good candidates for the role of a super peer. Nodes with low processing power or with little bandwidth are better suited to being an ordinary node. In a global P2P network with heterogeneous devices, cell phones should never be promoted to the role of super peer. Desktops connected to a broadband link however, should definitely be considered for the role of super peer.

Metrics for Overlay Topology Evaluation

There are many metrics used to evaluate overlay topologies. Some of these metrics are subjective. Reliability and robustness are key topology metrics that measure how well the network adapts to node insertions and removals. Nodes are inserted into the network when a peer connects to the network. Nodes may be removed from the network in one of two ways; gracefully or ungracefully. A node is removed gracefully by contacting its neighbours and letting them know that the node is about to leave the network. The neighbours can at this point decide the next course of action to replace the node that is leaving. When a node leaves ungracefully, it simply disconnects without notifying any other node. This particular scenario is not uncommon nor may it be considered malicious. A node may fail at any time due to hardware failure or software bugs and thus an overlay topology must be able to cope with nodes that fail in this manner. The reliability and robustness metric is a subjective metric as there is no value that can be assigned to reliability.

The network diameter is a metric which measures the maximum number of hops at the overlay layer between any two peers. Given the number of nodes within the network, the network diameter can give a good idea of how long a search might take. A related metric to network diameter is node degree which is a measure of the expected number of connections, or in other words, the expected number of neighbours per peer. Network diameter and node degree are related in that a higher node degree lends to a smaller network diameter. Both network diameter and node degree are objective measures that can be measured.

Most overlay topologies exchange some kind of heartbeat or stabilization message to verify that their neighbours are still alive. In some cases, these stabilization messages may be more complex for some overlay topologies than others. Since these messages don’t play any part in achieving the goal of the overlay network, they are considered an overhead. The magnitude of the overhead depends on the number of messages that the protocol must exchange to maintain its topology.

Distributed Hash Tables

A hash table is simply a data structure that defines a set of buckets that holds objects. A hash function is a special function which takes an object and maps it to a position in the hash table. Good hash functions map objects to buckets in a statistically uniformly distributed way. Put another way, a good hash function minimizes the number of collisions over the space of objects. There exist a variety of hash functions. The current standard hash function is the SHA-1 hash function which distributes objects over a 160 bit space.

A distributed hash table is simply a hash table where nodes in the network become the buckets. In such a system, the hash table is no longer stored on a single machine. Objects stored in the hash table are files or resources that a user may want to find. In order to find the objects, the object’s ID is hashed and then modded over the space of nodes. The object should be stored at the node which matches this value. This of course implies that nodes participating in providing distributed hash table services should have IDs that correspond to positions in a hash table. Structured topologies generally provide this facility and are well suited to providing DHT services due to the fact that they define predefined positions which can be assigned hash values.

Structured Topologies: Chord

Chord is a structured P2P overlay topology that has simple, provable performance and correctness. Normal search time for Chord has complexity O(log2(N)) and a worst case search time of O(N). In chord, peers are arranged in a ring according to a node ID calculated by hashing a unique feature of the node; this can be the IP address or the MAC address or perhaps the e-mail address of the user. Objects stored in the ring are stored at the node whose Node ID = Object ID.

Each peer maintains a finger table which is used for the lookup of other peers or objects. The entries in the finger table correspond to nodes whose indices are a power of 2 away from the current node. For instance, the finger table of node 1 would have entries for nodes that are {1, 2, 4, 8, 16, …} away from node 1. If the ring were completely full, these entries would correspond to nodes {2, 3, 5, 9, 17, …}. Since the ring is not guaranteed to be full, and in most cases would not be full, the entries for node n generally correspond to the first node that succeeds n by at least 2 i-1.

In order to keep the finger table up-to-date, a periodic stabilization function runs in the background which performs maintenance on the chord ring and attempts to keep the finger tables correct at all times. Incorrect or outdated finger tables may result in less than optimal search performance. If all finger tables are incorrect, the search request will have to propagate through the entire ring one node at a time which reduces the search performance to O(N) – the worst case search time.

Structured Topologies: Tapestry

Tapestry is another structured topology which has similar performance to Chord. It is in fact a generalization of the Chord algorithm. Chord uses a one dimensional ring. Tapestry can be thought of as an n dimensional hyper-sphere on which nodes are mapped. Tapestry is used in the OceanStore distributed storage system.

Overlay routing in Tapestry is done based on Plaxton routing. This technique uses local routing maps, called neighbour maps which are essentially multidimensional finger tables. The route to a node is incrementally refined at each node along the route to the destination using longest prefix matching. A typical neighbour map is shown in figure 1.

Figure 1 – Neighbour Map of a Single Node
In this case, if this node wanted to route to 2932, it would forward to the node pointed to by entry xx32

As an example, suppose node 1234 would like to route to 4598. Since none of the digits in node 1234 matches node 4598, node 1234 would look into its neighbour map for any node that matches ***8 and forward to that node. That node would then forward to **98, who would then forward to *598 and once again to 4598.

Tapestry is a generalization of Chord. As a result it has similar search complexity to that of Chord. Search is performed in O(logB(N)). Both Tapestry and Chord require periodic maintenance functions to keep their tables up-to-date. These messages are considered overhead as they do not directly contribute to the actual goal of the topology.

Unstructured Topologies: Newscast

The Newscast algorithm is a loosely consistent gossip based random topology. It has several desirable properties. The overlay is robust: failed nodes are removed automatically and simultaneous node failures do not affect the overall topology too adversely. The node degree is constant with the number of nodes in the network.

Newscast defines a peer descriptor which is basically a node’s contact information and a timestamp. The timestamp indicates when last that node was heard from: essentially the age of the peer descriptor. Each peer maintains a fixed length list of peer descriptors sorted by timestamp referred to as the node’s partial view.

while (running) {
     wait(d time units);
     q = getRandomPeer();
     sendState(q);
     sq = receive(q);
     update(sq,q);
}
//Active Thread

while (running) {
     q = receive(*);
     sendState(q);
     update(sq,sender(sq));
}
//Passive Thread

There are two main threads in the Newscast algorithm: the passive thread and the active thread. The active thread gets a random peer from the partial view list every d time units and sends its state to that random peer. The passive thread of the random peer will receive a partial view from its neighbour and return its own partial view. The two peers will then update their partial views with the partial view received from their neighbour. The update process simply takes the two partial views and merges them in ascending order by timestamp. The bottom half of the partial view is dropped which corresponds to the oldest nodes in the partial view.

There is an interesting side effect to the Newscast protocol. If a node is no longer participating in the network, its timestamp will gradually increase on everyone’s partial view until he falls off the end of the partial view. As long as the node participates in the exchange of messages, the node will always remain on other nodes’ partial views. Insertion into the Newscast overlay is trivial. A node needs to know the contact information of only one other node in the network. By sending his partial view to one node in the network, his presence will be established.

Newscast is known as an epidemic protocol. Nodes are always joining and leaving the network. As a result, the overlay is in a constant state of flux and the topology is completely random. Insertion and removal into the Newscast overlay has complexity O(1). The search time has an average complexity of O(mthroot(N)) where N is the number of peers and m is a function of the node degree. The assumption in this case is that m<<N. The maximum search time for Newscast is unbounded, but with very low probability. The performance of Newscast depends on the parameter m which is essentially the size of the partial view. The authors of Newscast recommend a value of m=20 for the Internet. When the value is smaller, the network has a tendency to become partitioned. Regardless of the value however, there is always a finite probability that the network will become partitioned.

Unstructured Topologies: Montressor’s Algorithm

Montressor’s algorithm is designed to build a search overlay topology with a minimum number of super peers. In order to accomplish this goal, Montressor’s algorithm maintains two parallel overlay topologies. The hierarchical overlay is used for search messages. The substrate overlay runs Newscast which is used as a gossip mechanism. The Newscast substrate (gossip layer) disseminates information about the load and capabilities of different peers in the network. Based on the load and capabilities of nodes in the network, peers are promoted or demoted and are assigned neighbours and parents at the hierarchical overlay layer.

Montressor’s algorithm has the property that it minimizes the number of super peers exponentially and guarantees to eventually converge to any arbitrary percentage of the target topology given enough time. The target topology is defined as the topology which minimizes the number of super peers in the network. The Newscast algorithm runs at both the search layer and the gossip layer. The time to target topology depends on the capacity of the nodes as well as the size of the partial view. The search time has an average complexity of O(mthroot(N/c)) where m is the partial view size, N is the number of nodes in the network and c is the average capacity of the super peers. Again, the assumption is that m<<N.

Insertion of new nodes and removal of nodes changes the target topology. The time to reach the target topology is exponential. The expected time to some percentage of the target topology is highly dependent on the parameters of the system. For typical parameters of m=20, N=1,000,000 and an initially completely random placement of nodes, approximately 15 epochs are required to achieve the target topology.

Summary

Overlay topologies are used in P2P networks to provide logical routing of search queries at the application layer. Overlay topologies can be flat or hierarchical and structured or unstructured. Hierarchical topologies take advantage of the heterogeneity of nodes in the network. In terms of search complexity, hierarchical topologies provide a constant improvement over flat overlays. Structured P2P topologies provide DHT services and are fault tolerant to a certain extent. Unstructured P2P topologies generally do not provide DHT services and often require flooding at the overlay layer to find resources, but are usually more robust than structured topologies.

The choice of P2P overlay is often dependent on the application. For networks with very dynamic peers which come and go on a regular basis, an unstructured topology is preferred since it provides constant integration and removal complexity. For relatively static networks, such as server farms, a structured topology provides fast and efficient searches for objects. Such topologies work well for distributed storage.