15.1 Parallel Processing

Specification

  • show awareness of the four basic computer architectures: SISD, SIMD, MISD, MIMD

  • show awareness of the characteristics of massively parallel computers

Flynn's Taxonomy of Parallel Machines

Flynn's Taxonomy categorises machines by:

    • Number of instruction streams

    • Number of data streams

Combining these gives four different types of machine:

SISD

[Diagram: SISD.svg]

Single Instruction, Single Data (SISD) refers to an architecture in which a single processor (one CPU) executes exactly one instruction stream at a time, and fetches or stores one item of data at a time, operating on data held in a single memory unit. Most CPU designs, from the earliest machines until recent times, follow the von Neumann architecture and are therefore SISD. The SISD model is the classic non-pipelined architecture, with general-purpose registers as well as dedicated special registers such as the Program Counter (PC), the Instruction Register (IR), the Memory Address Register (MAR) and the Memory Data Register (MDR).
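
The fetch-execute behaviour of an SISD machine can be illustrated with a short simulation. The following is a minimal sketch in C under stated assumptions: the four-instruction machine code (LOAD, ADD, STORE, HALT), the opcode layout and the memory contents are made up for illustration, not taken from any real CPU.

    #include <stdio.h>

    /* One memory unit holds both the program and the data (von Neumann). */
    int memory[16] = {
        0x1A,             /* address 0: LOAD  ACC from address 10 */
        0x2B,             /* address 1: ADD   to ACC from address 11 */
        0x3C,             /* address 2: STORE ACC to address 12 */
        0x00,             /* address 3: HALT */
        0, 0, 0, 0, 0, 0, /* addresses 4-9: unused */
        5, 7, 0,          /* addresses 10-12: two operands and the result */
        0, 0, 0
    };

    int main(void) {
        int pc = 0, acc = 0;                /* Program Counter, accumulator */
        for (;;) {
            int ir  = memory[pc++];         /* fetch one instruction (IR) */
            int op  = ir >> 4;              /* decode the opcode */
            int mar = ir & 0xF;             /* operand address (MAR) */
            if (op == 0) break;             /* HALT */
            int mdr = memory[mar];          /* fetch one data item (MDR) */
            if (op == 1) acc = mdr;         /* LOAD */
            if (op == 2) acc += mdr;        /* ADD */
            if (op == 3) memory[mar] = acc; /* STORE */
        }
        printf("result: %d\n", memory[12]); /* prints 12 (5 + 7) */
        return 0;
    }

Note how every cycle handles exactly one instruction and at most one item of data: that is the defining SISD behaviour.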

SIMD

[Diagram: SIMD.svg]

Single Instruction, Multiple Data (SIMD) describes an architecture with a single control unit (CU) and more than one processing unit (PU). It operates like a von Neumann machine in that a single instruction stream is executed, but each instruction is carried out by all of the PUs at once. Each PU (which could be an ALU) receives its control signals from the CU, so the same operation is executed on several different data streams simultaneously. In effect, the SIMD architecture achieves data-level parallelism, just as a vector processor does.

An application that can take advantage of SIMD is one where the same value is added to (or subtracted from) a large number of data points, a common operation in many multimedia applications. One example is changing the brightness of an image. Each pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the colour. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.

With a SIMD processor there are two improvements to this process. The first is that the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor has a single instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). For a variety of reasons, this can take much less time than retrieving each pixel individually, as a traditional CPU design would.

The second is that the instruction operates on all the loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation applied to the data happens to all eight values at the same time.
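
To make the two improvements concrete, here is a minimal sketch in C of brightening a row of pixel bytes, first one value at a time (SISD style) and then sixteen values per instruction. It assumes an x86 processor with SSE2 and uses the intrinsics from emmintrin.h; the function names are illustrative.

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdint.h>

    /* SISD style: one pixel value loaded, brightened and stored at a time. */
    void brighten_scalar(uint8_t *pixels, int n, uint8_t amount) {
        for (int i = 0; i < n; i++) {
            int v = pixels[i] + amount;
            pixels[i] = (v > 255) ? 255 : (uint8_t)v;   /* clamp at white */
        }
    }

    /* SIMD style: 16 pixel values loaded, brightened and stored at once. */
    void brighten_simd(uint8_t *pixels, int n, uint8_t amount) {
        __m128i add = _mm_set1_epi8((char)amount);
        int i = 0;
        for (; i + 16 <= n; i += 16) {
            __m128i block = _mm_loadu_si128((__m128i *)(pixels + i));
            block = _mm_adds_epu8(block, add);  /* saturating add: clamps at 255 */
            _mm_storeu_si128((__m128i *)(pixels + i), block);
        }
        for (; i < n; i++) {                    /* leftover pixels, done scalar */
            int v = pixels[i] + amount;
            pixels[i] = (v > 255) ? 255 : (uint8_t)v;
        }
    }

The single _mm_adds_epu8 instruction performs the same add on all sixteen bytes at once, which is exactly the "one instruction, multiple data streams" behaviour described above.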

MISD

[Diagram: MISD.svg]

Multiple Instruction, Single Data (MISD) describes an architecture for parallel computing in which many functional units perform different operations on the same data set by executing different instruction streams. In practice the architecture appears mainly in fault-tolerant computers, which execute the same instructions redundantly in order to detect and mask errors. It is not found in mainstream commercial systems, but is used in control systems such as those found in theme parks.
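
The fault-tolerant use of MISD can be sketched as triple modular redundancy: one data value is pushed through several redundant units and the results are voted on, so a single faulty unit is outvoted. This is a minimal sketch in C; the three compute_* functions stand in for independently built units and are purely illustrative.

    #include <stdio.h>

    /* Three redundant units that should all perform the same calculation. */
    int compute_a(int x) { return x * x + 1; }
    int compute_b(int x) { return x * x + 1; }
    int compute_c(int x) { return x * x + 1; }

    /* Majority vote over the three results: masks a single faulty unit. */
    int vote(int a, int b, int c) {
        if (a == b || a == c)
            return a;
        return b;   /* either b == c, or there is no majority at all */
    }

    int main(void) {
        int data = 7;   /* the single data stream */
        int result = vote(compute_a(data), compute_b(data), compute_c(data));
        printf("result: %d\n", result);   /* prints 50 */
        return 0;
    }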

MIMD

[Diagram: MIMD.svg]

Multiple Instruction stream, Multiple Data stream (MIMD) describes an architecture for parallel computing that is typical of multiprocessor computers. In a MIMD machine, each processor can asynchronously and independently execute a different set of instructions on a different set of data. MIMD-based computer systems can use shared memory in a common memory pool, or can work with distributed memory spread across networked computers in a distributed environment. MIMD architectures are used in application areas such as computer-aided design/computer-aided manufacturing (CAD/CAM), simulation, modelling and communication switches.
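
A multiprocessor running two different instruction streams on two different data sets can be sketched with POSIX threads. This is a minimal illustration in C, assuming a shared-memory MIMD machine; the worker functions are made up for the example. Compile with -lpthread.

    #include <pthread.h>
    #include <stdio.h>

    /* First instruction stream: sum its own data set. */
    void *sum_worker(void *arg) {
        int *data = (int *)arg, total = 0;
        for (int i = 0; i < 4; i++) total += data[i];
        printf("sum: %d\n", total);
        return NULL;
    }

    /* Second instruction stream: multiply its own data set. */
    void *product_worker(void *arg) {
        int *data = (int *)arg, total = 1;
        for (int i = 0; i < 4; i++) total *= data[i];
        printf("product: %d\n", total);
        return NULL;
    }

    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
        pthread_t t1, t2;
        /* Different instructions (sum vs product) run asynchronously
           on different data (a vs b). */
        pthread_create(&t1, NULL, sum_worker, a);
        pthread_create(&t2, NULL, product_worker, b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }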

Massively Parallel Computing

Let's break down the two parts here:

Massively: referring to a large number of processors (there could be hundreds, thousands or more)

Parallel: referring to the ability to perform a set of coordinated computations simultaneously

For hardware and software to process data in parallel there must be a number of things in place.

Hardware

Processors need to be able to communicate so that processed data can be transferred from one processor to another.

A modern-day CPU with multiple cores cannot be considered massively parallel, as there is only one physical processor rather than many separate processors. Equally, each core/processing unit shares the same bus.

For hardware to run a massively parallel system, the following must be in place:

  • Communication is needed between the different processors

  • Each processor needs a link to every other processor

  • Many processors require many of these links, leading to a challenging topology
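
As a worked example of that last point: fully connecting n processors requires n(n-1)/2 point-to-point links, so 10 processors need 45 links, while 1,000 processors would need 499,500. This is why practical machines use compromise network topologies rather than a direct link between every pair.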

Software

Suitable program design and an appropriate programming language are needed, so that data can be processed by multiple processors simultaneously.

A significant part of the design process is managing data dependencies. No program can run more quickly than its longest chain of dependent calculations (the critical path), since calculations that depend upon prior calculations in the chain must be executed in order. A similar problem exists for RISC processors during pipelining.
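
The dependency constraint can be seen in a small sketch in C (the calculations are made up for illustration): the first loop is a chain in which every step needs the previous result, so its steps can only run one after another, while the second loop's iterations are independent and could be handed to different processors.

    #include <stdio.h>

    int main(void) {
        /* Dependent chain: step i needs step i-1, so these calculations
           form a critical path and must execute in order. */
        double chain[8] = {1.0};
        for (int i = 1; i < 8; i++)
            chain[i] = chain[i - 1] * 2.0 + 1.0;

        /* Independent calculations: each result uses only its own input,
           so all 8 iterations could run simultaneously. */
        double in[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
        for (int i = 0; i < 8; i++)
            out[i] = in[i] * in[i];

        printf("%f %f\n", chain[7], out[7]);
        return 0;
    }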

For program code to run on a massively parallel system, the following must be in place:

    • The code must be split into blocks that can be processed simultaneously, instead of sequentially

    • Each block is processed by a different processor, so that the many processors work on these different blocks of code simultaneously and independently

    • This requires both parallelism and co-ordination to send the blocks to the different processors (see the sketch after this list)
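
Putting those three points together, here is a minimal sketch in C using POSIX threads: the data is split into equal blocks, each block is summed by a different thread, and the co-ordination step joins the threads and combines their partial results. The block size and worker count are arbitrary choices for the example. Compile with -lpthread.

    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    #define WORKERS 4
    #define CHUNK (N / WORKERS)

    int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int partial[WORKERS];   /* one slot per worker: no shared writes */

    /* Each worker independently processes its own block of the data. */
    void *sum_block(void *arg) {
        int w = *(int *)arg, total = 0;
        for (int i = w * CHUNK; i < (w + 1) * CHUNK; i++)
            total += data[i];
        partial[w] = total;
        return NULL;
    }

    int main(void) {
        pthread_t threads[WORKERS];
        int ids[WORKERS];
        for (int w = 0; w < WORKERS; w++) {   /* parallelism: start the blocks */
            ids[w] = w;
            pthread_create(&threads[w], NULL, sum_block, &ids[w]);
        }
        int total = 0;
        for (int w = 0; w < WORKERS; w++) {   /* co-ordination: gather results */
            pthread_join(threads[w], NULL);
            total += partial[w];
        }
        printf("total: %d\n", total);         /* prints 36 */
        return 0;
    }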

BOINC is an example of a massively parallel computing system, which uses the spare processing capacity of volunteer computers around the world.
