Interconnection Network

Interconnection Network Basics

▪ Topology

- Specifies the way switches are wired

- Affects routing, reliability, throughput, latency, building ease

▪ Routing

- How does a message get from source to destination

- Static or adaptive

▪ Buffering and Flow Control

- What do we store within the network?

- Entire packets, parts of packets, etc?

- How do we manage and negotiate buffer space?

- How do we throttle traffic during oversubscription?

- Tightly coupled with routing strategy

▪ Network interface

- Connects endpoints (e.g. cores) to network.

- Decouples computation/communication

▪ Links

- Bundle of wires that carries a signal

▪ Switch/router

- Connects fixed number of input channels to fixed number of output channels

▪ Channel

- A single logical connection between routers/switches

▪ Node

- A network endpoint connected to a router/switch

▪ Message

- Unit of transfer for network clients (e.g. cores, memory)

▪ Packet

- Unit of transfer for network

▪ Flit - Flow control digit

- Unit of flow control within network

Figure: a packet divided into flits, with a head flit (H), body flits (F) and a tail flit (T).
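The message, packet and flit units listed above can be illustrated with a minimal sketch. The names `Flit`, `packetize` and `flitize`, and the byte-based sizes, are assumptions made for this example only; real networks also carry routing information in the head flit, which is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "head", "body" or "tail"
    payload: bytes


def packetize(message: bytes, packet_size: int) -> List[bytes]:
    """Split a client message into packets of at most packet_size bytes."""
    return [message[i:i + packet_size]
            for i in range(0, len(message), packet_size)]


def flitize(packet: bytes, flit_size: int) -> List[Flit]:
    """Split one packet into flits; the first is the head, the last the tail."""
    chunks = [packet[i:i + flit_size]
              for i in range(0, len(packet), flit_size)]
    flits = []
    for idx, chunk in enumerate(chunks):
        if idx == 0:
            kind = "head"   # a single-flit packet is treated as a head flit
        elif idx == len(chunks) - 1:
            kind = "tail"
        else:
            kind = "body"
        flits.append(Flit(kind, chunk))
    return flits


if __name__ == "__main__":
    for packet in packetize(b"abcdefghij", 4):
        print([(f.kind, f.payload) for f in flitize(packet, 2)])
```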

3. Interconnection networks in parallel systems

3.1. Static interconnection networks

Static interconnection networks for elements of parallel systems (e.g. processors, memories) are based on fixed connections that cannot be modified without physically redesigning the system. Static interconnection networks can have many structures, such as a linear structure (pipeline), a matrix, a ring, a torus, a complete connection structure, a tree, a star or a hypercube.

In linear and matrix structures, processors are interconnected with their neighbours in a regular structure on a plane. A torus is a matrix structure in which the elements at the matrix borders are additionally connected to the elements at the opposite border of the same row or column (wrap-around connections). In a complete connection structure, all elements (e.g. processors) are directly interconnected (point-to-point), see the next 3 figures.

Linear structure (pipeline) (a) and matrix structure (b) of interconnections in a parallel system.

A complete interconnection structure in a parallel system
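The difference between the matrix, torus and complete structures can be sketched by listing the neighbours of a node. This is an illustrative sketch only; the function names and the (row, column) addressing are assumptions of the example.

```python
def mesh_neighbours(r, c, rows, cols):
    """Neighbours of node (r, c) in a matrix structure (no wrap-around)."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


def torus_neighbours(r, c, rows, cols):
    """Neighbours in a torus: border nodes wrap around to the opposite border."""
    candidates = {((r - 1) % rows, c), ((r + 1) % rows, c),
                  (r, (c - 1) % cols), (r, (c + 1) % cols)}
    return sorted(candidates - {(r, c)})


def complete_links(n):
    """Number of point-to-point links needed to connect n elements completely."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(mesh_neighbours(0, 0, 4, 4))    # corner node: 2 neighbours
    print(torus_neighbours(0, 0, 4, 4))   # same node in a torus: 4 neighbours
    print(complete_links(8))              # 28 links for 8 elements
```

The quadratic growth of `complete_links` is the reason complete connection structures are practical only for small systems.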

In a tree structure, system elements are arranged hierarchically from the root to the leaves, see the figure below. Either all elements of the tree (nodes) are processors, or only the leaves are processors and the remaining nodes are linking elements that act as intermediaries in transmissions. If two or more connections go from each node to different nodes towards the leaves, we speak of a binary or a k-ary tree, respectively. If more than one connection goes from a node to a neighbouring node, we speak of a fat tree. A binary fat tree in which the number of connections between neighbouring nodes doubles at each level towards the root provides uniform transmission throughput between the tree levels, a feature not available in a standard tree.

Tree structures in a parallel system: a) binary tree, b) fat tree
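The uniform-throughput property of the binary fat tree can be checked by counting the connections that cross each pair of adjacent levels. The sketch below assumes a complete binary tree of a given height and a fat tree whose link multiplicity doubles at each level towards the root; `links_per_level` is a hypothetical helper name.

```python
def links_per_level(height, fat):
    """Total connections between depth d and depth d + 1, for d = 0 .. height - 1.

    Depth 0 is the root. In the fat tree, each parent-child pair at the leaf
    level has 1 connection, and the multiplicity doubles per level upwards.
    """
    totals = []
    for d in range(height):
        pairs = 2 ** (d + 1)                      # parent-child pairs at this level
        multiplicity = 2 ** (height - d - 1) if fat else 1
        totals.append(pairs * multiplicity)
    return totals


if __name__ == "__main__":
    print(links_per_level(3, fat=False))  # [2, 4, 8]: narrows towards the root
    print(links_per_level(3, fat=True))   # [8, 8, 8]: uniform between levels
```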

In a hypercube structure, processors are interconnected in a network in which connections between processors correspond to the edges of an n-dimensional cube. The hypercube structure is very advantageous since it provides a low network diameter, equal to the dimension of the cube. The network diameter is the number of edges on the shortest path between the most distant nodes. It determines the number of intermediate transfers needed to send data between the most distant nodes of a network. In this respect hypercubes have very good properties, especially for a very large number of constituent nodes. For this reason, hypercubes are popular networks in existing parallel systems.

Table: hypercube properties (columns: cube dimension, number of nodes, network diameter, structure). Legend for the structure drawings: dot = processor, line = node connection.
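The relation between cube dimension, number of nodes and network diameter can be sketched as follows. The sketch assumes the usual binary labelling of hypercube nodes, in which two nodes are connected exactly when their labels differ in one bit; the function names are hypothetical.

```python
def hypercube_neighbours(node, n):
    """Neighbours of a node in an n-dimensional hypercube: flip one bit each."""
    return [node ^ (1 << bit) for bit in range(n)]


def hypercube_route(src, dst, n):
    """One shortest path: correct the differing bits from lowest to highest."""
    path = [src]
    current = src
    for bit in range(n):
        if (current ^ dst) & (1 << bit):
            current ^= 1 << bit
            path.append(current)
    return path


if __name__ == "__main__":
    print("dimension  nodes  diameter")
    for n in range(1, 5):
        print(f"{n:9d}  {2 ** n:5d}  {n:8d}")
    print(hypercube_route(0b000, 0b111, 3))  # [0, 1, 3, 7]: 3 hops, the diameter
```

The number of nodes doubles with each added dimension while the diameter grows by only one, which is the property the paragraph above refers to.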

3.2. Dynamic interconnection networks

Dynamic interconnection networks between processors enable the connection structure of a system to be changed (reconfigured). This can be done before or during parallel program execution, so we can speak of static or dynamic connection reconfiguration.

3.2.1. Bus networks

A bus is the simplest type of dynamic interconnection network. It constitutes a common data transfer path for many devices. Depending on the type of transmission implemented, we have serial busses and parallel busses. The devices connected to a bus can be processors, memories or I/O units, as shown in the figure below.

A diagram of a system based on a single bus

Only one device connected to a bus can transmit data at a time. Many devices can receive data. In the latter case we speak of a multicast transmission. If data are meant for all devices connected to a bus, we speak of a broadcast transmission.

Access to the bus must be synchronized. This is done using one of two methods: a token method or a bus arbiter method. With the token method, a token (a special control message or signal) circulates between the devices connected to the bus and gives the right to transmit to a single device at a time. With the arbiter method, the bus arbiter receives data transmission requests from the devices connected to the bus. It selects one device according to a chosen strategy (e.g. using a system of assigned priorities) and sends an acknowledge message (signal) to that device, which grants it the right to transmit. After the selected device completes the transmission, it informs the arbiter, which can then select another request.

The address of the receiver(s) is usually given in the header of the message. Special header values are used for broadcasts and multicasts. All receivers read and decode the headers. The devices that are specified in the header read in the data transmitted over the bus.
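The arbiter method and the header decoding described above can be sketched as a small simulation. Everything named here is an assumption of the example: the `BusArbiter` class, the fixed-priority strategy (lower number wins), and the `BROADCAST` header value are illustrative, not a description of any particular bus.

```python
from typing import Dict, List, Optional, Set, Union

BROADCAST = "ALL"  # hypothetical special header value meaning "every device"


class BusArbiter:
    """Grants the bus to one requesting device at a time, by fixed priority."""

    def __init__(self, priorities: Dict[str, int]):
        self.priorities = priorities          # lower value = higher priority
        self.requests: Set[str] = set()
        self.owner: Optional[str] = None

    def request(self, device: str) -> None:
        self.requests.add(device)

    def grant(self) -> Optional[str]:
        """Acknowledge one pending request, or None if the bus is busy or idle."""
        if self.owner is not None or not self.requests:
            return None
        self.owner = min(self.requests, key=lambda d: self.priorities[d])
        self.requests.discard(self.owner)
        return self.owner

    def release(self, device: str) -> None:
        """Called by the transmitting device when its transmission completes."""
        if device == self.owner:
            self.owner = None


def receivers(header: Union[str, Set[str]], devices: List[str],
              sender: str) -> List[str]:
    """Devices that read in the data: all of them on broadcast, else those named."""
    if header == BROADCAST:
        return [d for d in devices if d != sender]
    return [d for d in devices if d in header and d != sender]


if __name__ == "__main__":
    arbiter = BusArbiter({"cpu0": 0, "cpu1": 1, "io": 2})
    arbiter.request("io")
    arbiter.request("cpu1")
    print(arbiter.grant())   # cpu1 (higher priority than io)
    print(arbiter.grant())   # None (bus is busy)
    arbiter.release("cpu1")
    print(arbiter.grant())   # io
```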

The throughput of a bus-based network can be increased by using the multibus network shown in the figure below. In this network, processors connected to the busses can transmit data in parallel (one transmitter per bus), and many processors can read data from many busses at a time.

A diagram of a system based on a multibus

3.2.2. Crossbar switches

A crossbar switch is a circuit that enables many interconnections between elements of a parallel system at a time. A crossbar switch has a number of input and output data pins and a number of control pins. In response to control instructions sent to its control input, the crossbar switch implements a stable connection between a specified input and a specified output. Diagrams of a typical crossbar switch are shown in the figure below.

Crossbar switch: a) general scheme, b) internal structure

Control instructions can also request reading the state of specified input and output pins, i.e. their current connections in the crossbar switch. Crossbar switches are built from multiplexer circuits controlled by latch registers, which are set by control instructions. Crossbar switches implement direct, single, non-blocking connections, on the condition that the necessary input and output pins of the switch are free. Connections between free pins can always be implemented, independently of the status of other connections. New connections can be set during data transmissions through other connections. Non-blocking connections are a big advantage of crossbar switches. Some crossbar switches enable broadcast transmissions, but in a manner that blocks all other connections.

The disadvantage of crossbar switches is that extending their size, in terms of the number of input/output pins, is costly in hardware. Because of that, crossbar switches are built up to a size of about 100 input/output pins. Crossbar switches with hundreds of pins are implemented using the technique of multistage interconnection networks, which is discussed in the next section of the lecture.
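The non-blocking behaviour described above (a connection succeeds whenever both pins are free, regardless of other connections) can be sketched with a small model. The `Crossbar` class and its methods are hypothetical names for this illustration; the hardware multiplexers and latch registers are represented only by a dictionary.

```python
class Crossbar:
    """Toy model of a crossbar switch: each input drives at most one output."""

    def __init__(self, n_inputs: int, n_outputs: int):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.output_of = {}   # input pin -> output pin (stands in for the latches)

    def connect(self, inp: int, out: int) -> bool:
        """Set a connection; it fails only if one of the two pins is in use."""
        if not (0 <= inp < self.n_inputs and 0 <= out < self.n_outputs):
            raise ValueError("pin number out of range")
        if inp in self.output_of or out in self.output_of.values():
            return False
        self.output_of[inp] = out
        return True

    def disconnect(self, inp: int) -> None:
        self.output_of.pop(inp, None)

    def state(self) -> dict:
        """Read back the current connections, as a control instruction might."""
        return dict(self.output_of)


if __name__ == "__main__":
    xbar = Crossbar(4, 4)
    print(xbar.connect(0, 2))   # True: both pins free
    print(xbar.connect(1, 3))   # True: independent of the first connection
    print(xbar.connect(2, 2))   # False: output 2 is already in use
    print(xbar.state())         # {0: 2, 1: 3}
```

A full n x n crossbar needs on the order of n squared switching points, which is the hardware cost referred to above.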

3.2.3. Multistage interconnection networks

Multistage connection networks are built from small elementary crossbar switches (usually with two inputs) connected in multiple layers. The elementary crossbar switches can implement 4 types of connections: straight, crossed, upper broadcast and lower broadcast. All elementary switches are controlled simultaneously. Such a network is an alternative to a crossbar switch when a large number of connections (over 100) has to be switched. The extension cost of such a network is relatively low.
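The four connection types of an elementary two-input switch can be written down directly. This is a minimal sketch; the function name and the mode strings are assumptions of the example.

```python
def switch_2x2(mode, upper, lower):
    """Return (upper_output, lower_output) of an elementary two-input switch."""
    if mode == "straight":
        return (upper, lower)
    if mode == "crossed":
        return (lower, upper)
    if mode == "upper_broadcast":   # upper input is copied to both outputs
        return (upper, upper)
    if mode == "lower_broadcast":   # lower input is copied to both outputs
        return (lower, lower)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("straight", "crossed", "upper_broadcast", "lower_broadcast"):
        print(mode, switch_2x2(mode, "A", "B"))
```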

In such networks, there is no full freedom to implement arbitrary connections once some connections have already been set in the switch. Because of this property, these networks belong to the category of so-called blocking networks.
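Blocking can be demonstrated on a concrete topology. The text does not name one, so the sketch below assumes an Omega network (perfect-shuffle wiring between stages of two-input switches, with destination-tag routing), which is one common multistage network; the function names are hypothetical. Two connections block each other when they need the same switch output in the same stage.

```python
def omega_path(src, dst, n_bits):
    """Output line used after each stage when routing src -> dst."""
    size = 1 << n_bits
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line number left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & (size - 1)
        # The switch selects its upper/lower output from one destination bit,
        # taken from the most significant bit downwards.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def is_blocked(pairs, n_bits):
    """True if two of the requested connections need the same line in a stage."""
    paths = [omega_path(s, d, n_bits) for s, d in pairs]
    for stage in range(n_bits):
        lines = [p[stage] for p in paths]
        if len(set(lines)) < len(lines):
            return True
    return False


if __name__ == "__main__":
    print(omega_path(4, 1, 3))                 # [0, 0, 1]
    print(is_blocked([(0, 0), (4, 1)], 3))     # True: both need line 0 in stage 0
    print(is_blocked([(0, 0), (1, 1)], 3))     # False: the paths are disjoint
```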

However, if we increase the number of levels of elementary crossbar switches above the number necessary to implement connections for all pairs of inputs and outputs, it becomes possible to implement all requested connections at the same time, but only statically, before any communication is started in the switch. This is achieved at the cost of additional redundant hardware included in the switch. The block diagram of such a network, called the Benes network, is shown in the figure below.

A multistage connection network for parallel systems
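The hardware cost of the extra levels can be estimated with a short calculation. The sketch assumes a network with N inputs (N a power of two) built from two-input switches, where the minimum number of levels that reaches every output is log2 N and the Benes network uses 2 log2 N - 1 levels of N/2 switches each; the function names are hypothetical.

```python
def minimum_stages(n_ports):
    """Levels needed so that every input can reach every output (log2 N)."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two, at least 2")
    return n_ports.bit_length() - 1


def benes_stages(n_ports):
    """Levels in a Benes network with n_ports inputs: 2 * log2 N - 1."""
    return 2 * minimum_stages(n_ports) - 1


def switch_count(n_ports, stages):
    """Each level contains N / 2 two-input switches."""
    return stages * n_ports // 2


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, switch_count(n, minimum_stages(n)), switch_count(n, benes_stages(n)))
```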

To obtain nonblocking properties in a multistage connection network, the redundancy level in the circuit has to be increased considerably. To build a nonblocking multistage network, the elementary two-input switches have to be replaced by 3 layers of switches of sizes n x m, r x r and m x n, where m ≥ 2n - 1 and r is the number of switches in each of layers 1 and 3. Such a network was designed by Charles Clos and is called the Clos network. It is commonly used to build large integrated crossbar switches. The block diagram of the Clos network is shown in the figure below.

A nonblocking Clos interconnection network
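The cost of the nonblocking condition m ≥ 2n - 1 can be compared with that of a single crossbar. The sketch assumes the usual three-layer arrangement, which the text implies but does not spell out: r switches of size n x m in layer 1, m switches of size r x r in layer 2, and r switches of size m x n in layer 3, giving r * n network inputs. Cost is counted in crosspoints, and the function names are hypothetical.

```python
def clos_crosspoints(n, r, m=None):
    """Crosspoints in a three-layer Clos network; m defaults to 2n - 1."""
    if m is None:
        m = 2 * n - 1             # smallest m that satisfies m >= 2n - 1
    layer1 = r * (n * m)          # r switches of size n x m
    layer2 = m * (r * r)          # m switches of size r x r
    layer3 = r * (m * n)          # r switches of size m x n
    return layer1 + layer2 + layer3


def crossbar_crosspoints(ports):
    """Crosspoints in a single ports x ports crossbar."""
    return ports * ports


if __name__ == "__main__":
    for n, r in ((4, 4), (8, 8), (16, 16)):
        ports = n * r
        print(ports, clos_crosspoints(n, r), crossbar_crosspoints(ports))
```

Under these assumptions the Clos network needs more crosspoints than a crossbar for 16 ports (336 against 256) but fewer for 64 ports (2880 against 4096), which is consistent with its use for large switches.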

