### Introduction

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN). Kohonen´s Self Organising Feature Maps, or SOMs were invented by a man named Teuvo Kohonen, a professor of the Academy of Finland, and they provide a way of representing multidimensional data in much lower dimensional spaces - usually one or two dimensions. This process, of reducing the dimensionality of vectors, is essentially a data compression technique known as vector quantization.

The SOM is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.

A common example used to help teach the principals behind SOMs is the mapping of colors from their three dimensional components - red, green and blue, into two dimensions. Figure 1 shows an example of a SOM trained to recognize the eight different colors shown on the right. The colors have been presented to the network as 3D vectors - one dimension for each of the color components - and the network has learnt to represent them in the 2D space you can see.

Like most artificial neural networks, SOMs operate in two modes: training and mapping. "Training" builds the map using input examples (a competitive process, also called vector quantization), while "mapping" automatically classifies a new input vector.

A self-organizing map consists of components called nodes or neurons. Associated with each node is a weight vector of the same dimension as the input data vectors and a position in the map space. The usual arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher dimensional input space to a lower dimensional map space. The procedure for placing a vector from data space onto the map is to find the node with the closest (smallest distance metric) weight vector to the data space vector.

While it is typical to consider this type of network structure as related to feed-forward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation.

Useful extensions include using toroidal grids where opposite edges are connected and using large numbers of nodes. It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.

### Network Architecture

The figure shows a very small SOM network of 4 X 4 nodes connected to the input layer (shown in green) representing a two dimensional vector.

Each node has a specific topological position and contains a vector of weights of the same dimension as the input vectors. That is to say, if the training data consists of vectors, V, of n dimensions:

V1, V2, V3...Vn

Then each node will contain a corresponding weight vector W, of n dimensions:

W1, W2, W3...Wn

### Learning Algorithm Overview

A SOM does not need a target output to be specified unlike many other types of network. Instead, where the node weights match the input vector, that area of the lattice is selectively optimized to more closely resemble the data for the class the input vector is a member of. From an initial distribution of random weights, and over many iterations, the SOM eventually settles into a map of stable zones. Each zone is effectively a feature classifier, so you can think of the graphical output as a type of feature map of the input space.

Training occurs in several steps and over many iterations:

- Each node's weights are initialized.
- A vector is chosen at random from the set of training data and presented to the lattice.
- Every node is examined to calculate which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
- The radius of the neighborhood of the BMU is now calculated. This is a value that starts large, typically set to the 'radius' of the lattice, but diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU's neighborhood.
- Each neighboring node's (the nodes found in step 4) weights are adjusted to make them more like the input vector. The closer a node is to the BMU; the more its weights get altered.
- Repeat step 2 for N iterations.

### Calculating the Best Matching Unit

To determine the best matching unit, one method is to iterate through all the nodes and calculate the Euclidean distance between each node's weight vector and the current input vector. The node with a weight vector closest to the input vector is tagged as the BMU. The Euclidean distance is given as:

Where V is the current input vector and W is the node's weight vector.

### Determining the Best Matching Unit's Local Neighborhood

This is where things start to get more interesting! Each iteration, after the BMU has been determined, the next step is to calculate which of the other nodes are within the BMU's neighborhood. All these nodes will have their weight vectors altered in the next step.

### The BMU's neighbourhood

You can see that the neighborhood shown above is centered around the BMU (colored yellow) and encompasses most of the other nodes. The green arrow shows the radius. A unique feature of the Kohonen learning algorithm is that the area of the neighborhood shrinks over time. This is accomplished by making the radius of the neighborhood shrink over time which is done with the exponential decay function:

where the Greek letter sigma, σ0, denotes the width of the lattice at time t 0 and the Greek letter lambda, λ, denotes a time constant. t is the current time-step.

### Adjusting the Weights

Every node within the BMU's neighborhood (including the BMU) has its weight vector adjusted according to the following equation:

Where t represents the time-step and L is a small variable called the learning rate, which decreases with time. Basically, what this equation is saying, is that the new adjusted weight for the node is equal to the old weight (W), plus a fraction of the difference (L) between the old weight and the input vector (V).

The greek letter theta, Θ, to represent the amount of influence a node's distance from the BMU has on its learning. Θ(t) as it is shown in the figure above, and calculated with the equation stated below

where dist is the distance a node is from the BMU and σ, is the width of the neighborhood function as calculated by Equation 2. Θ also decays over time.