This is your reference whenever you need help on how the SOM Analyzer is used. For easy access, I have listed the topic links below so that you can jump straight to the section you are interested in. Do notice that you can also open this user guide directly from the SOM Analyzer by selecting Help --> User Guide.
The following help covers the functionality in SOM Analyzer v1.02 (Full version).
Help Contents
Architecture
User Interface
Layout
Colors
Menus
File
View
SOM Creation
SOM Usage
Analysis
Help
SOM Control tabs
Main
Data
Colors
Data source
Data format
Automatic vs. manual
Parameters explained
Mutable parameters
Error measures
Controlling the iterator
Result data grid
Examining, picking & creating the SOM you want
Getting Started
Application Overview
Architecture
SOM Analyzer is a Windows application written in C# and C++. The SOM kernel, which is responsible for all CPU-intensive SOM calculations, is written entirely in C++, while the UI parts are written in C#. The application was designed this way mainly to maximize performance. SOM Analyzer is not a so-called MDI (Multiple Document Interface) application, so you cannot open more than one map instance at a time. If you need to process several maps at once, just open another SOM Analyzer and work from there.
User Interface
The user interface for SOM Analyzer has been designed based on the feedback and experiences obtained over a period of several years. It attempts to provide the user with the most effective tools to both see the results and to manipulate the various parameters involved.
Layout
The UI has been divided into three main parts presented in the following bullets:
Menu items and icons
Along with the keyboard shortcuts (see below), these provide you with access to various SOM Analyzer features. Menu items themselves should be self-explanatory and all icons have tool tips.
SOM Control tabs
The three SOM Control tabs appear on the resizable left-hand column and with them you can do the following:
Control the BMU trajectory
Set which SOM visualizations you see
Set the labels for parameter levels
See the values of the SOM weight vectors or the BMU value depending on what you are doing
Set the colors of various items in the UI
Some hidden features that you will not see unless you know about them or have activated them (tool tips):
Keyboard shortcuts to let you directly access features
A context bound menu on your right mouse button via which you can control visualizations
Tool tips on the visualization area telling you the properties on the data in focus
Visualization area
This fills up most of the visible UI. In there, you see all SOM related visualizations. It is not, however, for visualization only. In there, you can also label neurons and click any neuron whose data you wish to study more closely.
Colors
SOM Analyzer uses colors extensively in all its visualizations. Because of this, some UI elements may be hard to spot from surrounding colors. To fix this, SOM Analyzer lets you change the colors to suit your needs. All changeable UI elements can be found on the SOM control tabs --> Colors tab. Just click the color of the element you want to change and see the effect immediately!
Menus
File
As the name implies, the File menu lets you access all file related operations. From there, you can create new SOM configurations, open existing SOM configurations, save open configurations and export result data to CSV files.
View
In here, you can adjust whether SOM control tabs, tool bar and status bar in UI are visible or not. You can also open the options dialog where you can set various SOM options. These are covered in detail in the final chapter.
SOM Creation
This is one of the two most important menus you will use. In here, you essentially create and train the SOM that you will later use and examine.
SOM Usage
This is the other important menu you will surely end up in. In here, you define what data will be fed to the SOM after it has been created. In here, you can set how the SOM processes the data and control the trajectory in case you want to have a look at that.
SOM Analysis
The SOM Analysis menu currently offers only one feature, the cluster analysis, but more may come in the future depending on how SOM Analyzer evolves.
Help
Via the help menu, you can open the About window showing you the SOM Analyzer version and licensing information.
SOM Control Tabs
The SOM Control tabs occupy a rectangular area on the left-hand side of the screen. You can resize the tab area vertically by moving your mouse to the border between the tab area and the visualization area. Just wait until your mouse pointer changes into a double arrow, hold down the left button and do the resize.
Main
This is the center of operations. In here, you see the status of the current visualizations, including the fixed visualizations shown for the created SOM. These are the UMatrix, BMU Histogram, Cluster Detection and Parameter Level visualizations. Additionally, the topmost part of the tab shows you some colored buttons and a couple of numeric up and down controls that determine how the trajectory of the input data is shown. Do notice that the trajectory can only be shown after the SOM has been created and after some data has been given to it (in the SOM Usage dialog).
Data
This tab shows you what data a selected (clicked) map unit, or neuron, consists of. If set in options, the same information is shown via a tool tip when you hover your mouse over a neuron. The data on this tab, however, is more useful as it remains fixed as you examine other neurons. Whenever the trajectory is active, this tab is split into two: the upper part shows the data of the selected map unit and the lower part shows the data of the neuron pointed to by the trajectory tip.
Colors
As mentioned previously, the color of many UI elements can be set to your liking. Go to this tab and click the color square of the UI element you want to alter.
SOM Configuration Files
Whenever you create a new SOM configuration, you actually create two files: an XML file with the file suffix .CFG and a dedicated ASCII file with the suffix .SOM that holds the actual SOM data. The cfg file stores all data related to the UI and the state the UI was in at the time of the save. The som file has all properties related to the SOM: the dimensions, parameters and weight vectors. Short excerpts of these files are shown below.
<?xml version="1.0" encoding="utf-8"?>
<SOMParameters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<wizardParams>
<classificationAccuracy>Detailed</classificationAccuracy>
<generalizationPower>GeneralizationPreferred</generalizationPower>
<creationSpeed>Medium</creationSpeed>
</wizardParams>
<uiParams>
<accLevel>Acceptance</accLevel>
<dataSourceDb>true</dataSourceDb>
<detectSOMParamsAuto>true</detectSOMParamsAuto>
<useSomeDbTraDataForAcc>true</useSomeDbTraDataForAcc>
<nofSomeDbTraData>1200</nofSomeDbTraData>
<dbTrainFile>c:\soman-tra-db.dat</dbTrainFile>
<fileTrainFile />
<dbAcceptFile>c:\soman-acc-db.dat</dbAcceptFile>
<fileAcceptFile />
<dbUsageFile>c:\soman-use-db.dat</dbUsageFile>
<fileUsageFile />
<detectSOMAcceptAuto>true</detectSOMAcceptAuto>
<saveFile>C:\adv-test_1.cfg</saveFile>
<saveSOMFile>adv-test_1.som</saveSOMFile>
<visColors>
<cellContour>
<R>128</R>
<G>128</G>
<B>128</B>
</cellContour>
<background>
<R>105</R>
<G>105</G>
<B>105</B>
</background>
<operatingPoint>
<R>255</R>
<G>215</G>
<B>0</B>
</operatingPoint>
<trajectory>
<R>173</R>
<G>255</G>
<B>47</B>
</trajectory>
<visualizationInfo>
<R>255</R>
<G>255</G>
<B>255</B>
</visualizationInfo>
<neuronSelector>
<R>224</R>
<G>255</G>
<B>255</B>
</neuronSelector>
...
Listing 1. The beginning of the CFG file.
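If you ever want to inspect a CFG file outside SOM Analyzer, the XML can be read with any standard parser. The following is only a minimal sketch in Python: the element names are taken from Listing 1, but the shortened sample and its closing tags are assumptions, since the listing above is truncated.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened CFG content. The element names come from Listing 1,
# but the closing tags are assumed because the listing is truncated.
SAMPLE_CFG = """<?xml version="1.0" encoding="utf-8"?>
<SOMParameters>
  <uiParams>
    <saveSOMFile>adv-test_1.som</saveSOMFile>
    <visColors>
      <cellContour><R>128</R><G>128</G><B>128</B></cellContour>
      <trajectory><R>173</R><G>255</G><B>47</B></trajectory>
    </visColors>
  </uiParams>
</SOMParameters>
"""


def read_colors(cfg_text):
    """Return {element name: (R, G, B)} for every child of <visColors>."""
    root = ET.fromstring(cfg_text.encode("utf-8"))
    colors = {}
    for item in root.find("uiParams/visColors"):
        colors[item.tag] = tuple(int(item.findtext(c)) for c in "RGB")
    return colors


colors = read_colors(SAMPLE_CFG)
for name, rgb in colors.items():
    print(name, rgb)
```

The same approach would work for the other settings under uiParams, such as the saveSOMFile entry that names the companion SOM file.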
$ SOM Analyzer
# A Self-Organizing Map reference vector file
# SOM IParam: xdim ydim datadim topol neigh rseed itype prep
# SOM TParam: lrate atype nsrad nerad tlen
# SOM PParam: scale offset
#$ SOM IParam: 50 36 70 hexagonal gaussian 1210236673 linear range
#$ SOM TParam: 0.020000 linear 3 1 2400
#$ SOM PParam: 9.563299E+000 1.131003E+001 4.643340E+000 4.643340E+000 6.053340E+000 6.053340E+000 1.565330E+001 2.407730E+002 2.006003E+002 2.006003E+002 2.761367E+003 1.770823E+003 1.770823E+003 1.923340E+000 1.933340E-001 9.666700E-002 2.923330E+000 4.966670E-001 4.966670E-001 9.861664E+001 1.541263E+002 3.590330E+002 1.006200E+003 1.793197E+002 1.590370E+002 1.590370E+002 1.026437E+003 1.406467E+002 1.406467E+002 8.489997E+001 8.489997E+001 1.451203E+002 3.147270E+002 1.577677E+003 1.323216E+005 1.280330E+001 1.298599E+005 9.266670E-001 5.793330E+000 5.440000E+000 2.372667E+001 1.590003E+000 2.256667E+000 1.766670E+000 3.298667E+001 2.512000E+001 3.552000E+001 1.420000E+000 9.768403E+002 9.329663E+001 9.701196E+002 1.272067E+002 2.951703E+002 2.744670E+002 4.516670E+000 8.082670E+001 8.082670E+001 2.815400E+002 1.131570E+002 2.744670E+002 4.152670E+001 9.860263E+002 1.269330E+002 1.269063E+002 1.189043E+003 2.818500E+002 2.059330E+001 1.531130E+002 9.767437E+002 3.399000E+001 2.480000E+000 1.946670E+000 1.413330E+000 1.413330E+000 1.413330E+000 1.413330E+000 3.130670E+001 1.368000E+001 1.546670E+000 1.546670E+000 3.359130E+002 3.232870E+002 3.232870E+002 2.093330E+000 2.333300E-002 1.200000E-001 2.876670E+000 2.100000E-001 2.100000E-001 8.066670E-001 8.066670E-001 0.000000E+000 0.000000E+000 2.533330E-001 0.000000E+000 0.000000E+000 2.533330E-001 2.733330E-001 2.733330E-001 2.733330E-001 2.733330E-001 1.566670E-001 2.410000E+000 2.743330E+000 3.566670E-001 0.000000E+000 9.000000E-002 1.933330E-001 0.000000E+000 0.000000E+000 5.333330E-001 1.766670E-001 2.433330E-001 0.000000E+000 1.333300E-002 0.000000E+000 8.000000E-002 5.800000E-001 2.366670E-001 1.366670E-001 1.333330E-001 1.333330E-001 2.666670E-001 0.000000E+000 0.000000E+000 0.000000E+000 0.000000E+000 0.000000E+000 0.000000E+000 0.000000E+000 0.000000E+000 2.366670E-001 4.000000E-001 1.366670E-001 3.066670E-001 0.000000E+000 0.000000E+000 1.200000E-001 1.333330E-001 0.000000E+000
#$ Status: Trained
#$ Parameter Level Labels: ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
70 hexa 50 36 gaussian
8.946074E-002 6.088869E-002 7.597813E-002 7.597813E-002 5.826856E-002 5.826856E-002 9.968425E-001 2.682964E-001 2.387063E-001 2.387063E-001 2.136444E-001 3.126758E-001 3.126758E-001 8.289855E-001 9.840127E-001 9.803908E-001 7.694626E-001 4.925469E-001 4.925469E-001 3.483861E-001 2.550060E-001 2.947390E-002 2.025940E-002 1.556670E-001 8.892199E-002 1.492347E-001 2.337940E-002 2.448795E-001 2.448795E-001 3.595959E-001 3.595959E-001 8.195572E-003 2.577591E-002 9.808395E-003 5.552342E-003 5.935891E-002 5.548141E-003 2.528979E-001 1.023355E-001 6.226131E-002 2.539646E-001 9.202223E-001 8.957527E-001 8.918464E-002 1.402611E-001 9.836517E-002 1.891724E-001 6.387599E-001 2.527743E-002 1.970637E-001 6.302705E-002 2.461307E-001 7.039556E-002 6.247449E-002 3.434690E-001 1.802308E-001 1.802307E-001 1.039948E-001 2.293172E-001 6.247453E-002 1.389289E-001 3.724878E-002 1.942375E-001 2.268832E-001 2.475487E-002 8.806977E-002 7.785828E-002 1.886791E-001 3.474658E-002 1.308770E-001
8.874840E-002 6.057967E-002 7.597743E-002 7.597743E-002 5.827261E-002 5.827261E-002 9.969072E-001 2.666599E-001 2.378728E-001 2.378728E-001 2.105764E-001 3.085042E-001 3.085042E-001 8.279138E-001 9.843674E-001 9.802182E-001 7.669169E-001 4.852802E-001 4.852802E-001 3.483969E-001 2.536120E-001 2.769871E-002 1.934764E-002 1.522362E-001 8.487060E-002 1.456968E-001 2.245084E-002 2.433977E-001 2.433977E-001 3.599595E-001 3.599596E-001 8.902871E-003 2.656238E-002 1.053904E-002 6.263761E-003 5.990952E-002 6.259556E-003 2.704533E-001 1.107541E-001 6.724326E-002 2.610597E-001 9.215189E-001 8.970529E-001 9.362853E-002 1.396509E-001 9.869274E-002 1.885808E-001 6.287024E-001 2.503558E-002 1.950685E-001 6.336190E-002 2.469562E-001 6.938405E-002 6.246936E-002 3.438232E-001 1.777669E-001 1.777668E-001 1.043665E-001 2.301880E-001 6.246940E-002 1.397259E-001 3.768362E-002 1.973969E-001 2.235693E-001 2.438106E-002 9.017634E-002 7.879906E-002 1.889264E-001 3.464961E-002 1.343348E-001
8.781236E-002 6.034773E-002 7.595564E-002 7.595564E-002 5.825792E-002 5.825792E-002 9.969374E-001 2.654299E-001 2.379452E-001 2.379452E-001 2.051470E-001 3.012161E-001 3.012161E-001 8.250353E-001 9.837506E-001 9.792165E-001 7.612464E-001 4.708396E-001 4.708396E-001 3.508302E-001 2.542806E-001 2.594609E-002 1.853556E-002 1.494291E-001 8.098763E-002 1.427865E-001 2.163558E-002 2.438820E-001 2.438820E-001 3.628513E-001 3.628514E-001 1.001069E-002 2.753271E-002 1.155248E-002 7.254579E-003 6.117407E-002 7.250381E-003 3.011618E-001 1.237385E-001 7.493713E-002 2.718683E-001 9.214874E-001 8.971040E-001 1.007864E-001 1.391087E-001 9.949404E-002 1.879512E-001 6.125302E-001 2.472538E-002 1.913847E-001 6.411844E-002 2.487071E-001 6.855701E-002 6.203529E-002 3.447026E-001 1.733783E-001 1.733782E-001 1.051787E-001 2.318825E-001 6.203531E-002 1.412988E-001 3.840772E-002 2.020843E-001 2.202652E-001 2.419703E-002 9.309961E-002 8.027826E-002 1.892900E-001 3.471315E-002 1.391663E-001
8.716985E-002 6.045512E-002 7.592310E-002 7.592310E-002 5.823427E-002 5.823427E-002 9.970140E-001 2.651640E-001 2.387541E-001 2.387541E-001 1.987273E-001 2.924067E-001 2.924067E-001 8.213164E-001 9.828972E-001 9.780118E-001 7.539834E-001 4.521876E-001 4.521876E-001 3.542208E-001 2.558414E-001 2.409138E-002 1.768533E-002 1.460972E-001 7.684734E-002 1.393308E-001 2.077653E-002 2.452430E-001 2.452430E-001 3.667834E-001 3.667834E-001 1.146664E-002 2.867746E-002 1.282578E-002 8.488438E-003 6.298213E-002 8.484269E-003 3.397657E-001 1.401605E-001 8.464011E-002 2.855079E-001 9.212542E-001 8.969598E-001 1.096335E-001 1.386222E-001 1.006222E-001 1.873519E-001 5.934743E-001 2.438100E-002 1.871535E-001 6.511976E-002 2.508273E-001 6.784860E-002 6.143609E-002 3.457592E-001 1.682801E-001 1.682800E-001 1.061271E-001 2.339045E-001 6.143611E-002 1.431842E-001 3.935006E-002 2.078551E-001 2.164276E-001 2.403173E-002 9.669701E-002 8.215100E-002 1.901328E-001 3.491309E-002 1.450991E-001
8.710138E-002 6.103769E-002 7.588796E-002 7.588796E-002 5.820847E-002 5.820847E-002 9.971952E-001 2.663029E-001 2.401769E-001 2.401769E-001 1.923463E-001 2.835158E-001 2.835158E-001 8.177727E-001 9.821773E-001 9.769209E-001 7.470841E-001 4.343548E-001 4.343548E-001 3.570794E-001 2.575591E-001 2.219109E-002 1.675901E-002 1.417812E-001 7.252485E-002 1.348104E-001 1.983291E-002 2.467866E-001 2.467866E-001 3.701485E-001 3.701485E-001 1.330900E-002 3.005802E-002 1.439730E-002 9.992303E-003 6.538862E-002 9.988172E-003 3.815492E-001 1.588540E-001 9.566377E-002 3.010309E-001 9.213030E-001 8.970719E-001 1.197450E-001 1.381018E-001 1.018718E-001 1.867286E-001 5.738559E-001 2.403502E-002 1.832850E-001 6.618902E-002 2.527623E-001 6.724662E-002 6.087636E-002 3.466465E-001 1.634439E-001 1.634438E-001 1.069093E-001 2.357546E-001 6.087639E-002 1.449160E-001 4.045815E-002 2.143139E-001 2.122371E-001 2.381756E-002 1.008694E-001 8.431129E-002 1.914752E-001 3.519040E-002 1.518822E-001
8.774488E-002 6.204467E-002 7.584922E-002 7.584922E-002 5.817973E-002 5.817973E-002 9.974579E-001 2.693449E-001 2.426520E-001 2.426520E-001 1.863833E-001 2.752866E-001 2.752866E-001 8.144957E-001 9.812380E-001 9.754574E-001 7.415028E-001 4.215638E-001 4.215638E-001 3.588056E-001 2.592804E-001 2.043676E-002 1.581732E-002 1.369598E-001 6.851939E-002 1.296245E-001 1.888436E-002 2.483479E-001 2.483480E-001 3.722717E-001 3.722718E-001 1.572104E-002 3.186347E-002 1.642354E-002 1.190360E-002 6.874478E-002 1.189952E-002 4.242136E-001 1.791750E-001 1.076883E-001 3.178803E-001 9.215306E-001 8.973412E-001 1.314175E-001 1.375050E-001 1.030758E-001 1.860208E-001 5.543592E-001 2.370217E-002 1.801257E-001 6.724131E-002 2.542228E-001 6.654338E-002 6.042479E-002 3.471850E-001 1.592363E-001 1.592362E-001 1.073694E-001 2.371672E-001 6.042482E-002 1.462480E-001 4.171654E-002 2.213294E-001 2.081333E-001 2.358023E-002 1.055982E-001 8.672972E-002 1.932311E-001 3.551275E-002 1.594405E-001
8.901269E-002 6.312660E-002 7.579478E-002 7.579478E-002 5.813821E-002 5.813821E-002 9.977400E-001 2.742220E-001 2.464843E-001 2.464843E-001 1.808665E-001 2.679887E-001 2.679887E-001 8.104147E-001 9.791560E-001 9.724187E-001 7.366213E-001 4.152953E-001 4.152953E-001 3.594449E-001 2.610435E-001 1.896413E-002 1.493455E-002 1.323639E-001 6.521183E-002 1.244742E-001 1.802477E-002 2.499357E-001 2.499357E-001 3.732240E-001 3.732241E-001 1.894553E-002 3.435334E-002 1.912557E-002 1.442769E-002 7.349980E-002 1.442366E-002 4.670537E-001 2.010750E-001 1.207740E-001 3.359781E-001 9.216970E-001 8.975428E-001 1.450965E-001 1.368478E-001 1.041240E-001 1.852015E-001 5.347021E-001 2.338257E-002 1.776567E-001 6.827398E-002 2.551615E-001 6.504638E-002 6.006264E-002 3.473434E-001 1.556236E-001 1.556234E-001 1.074849E-001 2.380941E-001 6.006267E-002 1.471385E-001 4.312031E-002 2.289851E-001 2.043255E-001 2.336317E-002 1.108699E-001 8.939835E-002 1.953350E-001 3.587312E-002 1.678353E-001
9.056596E-002 6.383121E-002 7.570803E-002 7.570803E-002 5.807145E-002 5.807145E-002 9.980025E-001 2.795672E-001 2.510207E-001 2.510207E-001 1.759465E-001 2.618338E-001 2.618337E-001 8.041683E-001 9.750916E-001 9.666725E-001 7.310744E-001 4.145718E-001 4.145718E-001 3.592792E-001 2.628449E-001 1.775682E-002 1.414692E-002 1.283125E-001 6.253863E-002 1.197701E-001 1.728478E-002 2.515241E-001 2.515242E-001 3.733729E-001 3.733730E-001 2.317471E-002 3.775265E-002 2.270119E-002 1.778725E-002 8.000362E-002 1.778324E-002 5.097571E-001 2.250119E-001 1.352336E-001 3.557020E-001 9.217591E-001 8.976426E-001 1.607069E-001 1.361552E-001 1.049599E-001 1.842733E-001 5.146050E-001 2.307234E-002 1.757686E-001 6.931182E-002 2.556293E-001 6.223445E-002 5.974487E-002 3.471509E-001 1.524567E-001 1.524566E-001 1.072870E-001 2.385743E-001 5.974490E-002 1.476279E-001 4.466014E-002 2.374070E-001 2.007481E-001 2.317786E-002 1.166422E-001 9.229819E-002 1.977421E-001 3.627254E-002 1.771390E-001
9.192867E-002 6.398869E-002 7.557999E-002 7.557999E-002 5.797211E-002 5.797211E-002 9.982513E-001 2.834440E-001 2.552554E-001 2.552554E-001 1.719652E-001 2.570861E-001 2.570861E-001 7.951863E-001 9.689307E-001 9.580240E-001 7.238812E-001 4.169489E-001 4.169489E-001 3.585331E-001 2.646715E-001 1.672862E-002 1.344398E-002 1.246023E-001 6.017750E-002 1.155038E-001 1.661987E-002 2.530755E-001 2.530755E-001 3.729949E-001 3.729949E-001 2.850038E-002 4.219241E-002 2.727063E-002 2.218490E-002 8.843900E-002 2.218092E-002 5.516834E-001 2.517166E-001 1.514502E-001 3.777151E-001 9.219841E-001 8.979003E-001 1.772952E-001 1.354444E-001 1.055691E-001 1.832792E-001 4.941387E-001 2.277004E-002 1.744011E-001 7.037336E-002 2.556892E-001 5.864276E-002 5.942549E-002 3.466407E-001 1.496097E-001 1.496097E-001 1.068135E-001 2.386593E-001 5.942551E-002 1.477690E-001 4.632470E-002 2.466654E-001 1.970263E-001 2.297991E-002 1.228636E-001 9.540362E-002 2.004318E-001 3.671364E-002 1.873732E-001
...
Listing 2. The beginning of the corresponding SOM file. Note that although wrapped here, each line holds the values of a single weight vector (70 values in this example, matching the data dimension).
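The header of the SOM file pairs a line of parameter names ("# SOM IParam: ...") with a line of values ("#$ SOM IParam: ..."). As a rough sketch, and assuming the names and values always appear in the same order as in Listing 2, the pairs can be matched up as follows. The function name parse_som_header is made up for this example.

```python
# Header lines copied from Listing 2.
HEADER = [
    "# SOM IParam: xdim ydim datadim topol neigh rseed itype prep",
    "# SOM TParam: lrate atype nsrad nerad tlen",
    "#$ SOM IParam: 50 36 70 hexagonal gaussian 1210236673 linear range",
    "#$ SOM TParam: 0.020000 linear 3 1 2400",
]


def parse_som_header(lines):
    """Pair the names on '# SOM xParam:' lines with the values on '#$ SOM xParam:' lines."""
    names, values = {}, {}
    for line in lines:
        if line.startswith("#$ SOM "):
            key, _, rest = line[len("#$ SOM "):].partition(":")
            values[key] = rest.split()
        elif line.startswith("# SOM "):
            key, _, rest = line[len("# SOM "):].partition(":")
            names[key] = rest.split()
    # PParam is left out: it carries a long list of values rather than one value per name.
    return {
        key: dict(zip(names[key], values[key]))
        for key in ("IParam", "TParam")
        if key in names and key in values
    }


header = parse_som_header(HEADER)
print(header["IParam"])
print(header["TParam"])
```

In Listing 2, the datadim value (70) matches the number of values on each weight vector line, so it can serve as a simple sanity check when reading the rest of the file.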
Wizard Types
The wizard in the SOM Analyzer plays a fundamental role in introducing the world of SOMs to beginners. Depending on how knowledgeable you are on SOMs, you can set the wizard to either fully automatic or semi-automatic. If you feel you want to set everything manually, you can set the wizard to manual, which actually disables it and gives you full control over all parameters.
The purpose of the wizard is really to hide the details of SOM generation from the user. The fully automatic wizard only asks you a minimum number of questions before firing up the SOM creation process. The semi-automatic wizard will ask more to find out what your intentions are. In manual mode, you will have to set all parameters yourself.
SOM Creation Process
Selecting the Data
Before a SOM can be created, some data will have to be selected for the SOM to process.
Data source
SOM Analyzer can read data from either a file or a MySQL database. File support is built in, while for database connectivity you will have to download and install a MySQL connector. See the Support section for more information on how to proceed with that.
If you are using a database as a datasource, you can choose to divide the dataset into two parts of which the first part goes to teaching the SOM and the second to testing it. If you are using files, you must typically provide two files, one for learning and the other for testing the map. If you do not have more than one file, you can, of course, use the same file for testing to get started.
Data format
If you choose to use files, all you have to do is make sure the data in the ASCII file is in the right format. The format is presented below in Listing 3.
5
1.123164E+000 6.486063E+001 -1.333097E-001 -6.203650E+000 2.939411E+002
4.097987E+001 1.999333E+001 -1.010982E+001 -1.081089E+001 2.203780E+002
1.848504E+001 4.404437E+001 4.314666E+000 8.626218E+000 1.125747E+002
3.977814E+001 6.018012E+001 1.580633E+000 1.129447E+000 1.327666E+002
4.160415E+001 2.916935E+001 -6.416970E+000 -6.481373E+000 2.693867E+002
...
Listing 3. Format of a SOM Analyzer data file.
The first line tells SOM Analyzer the dimension of the data. The following lines contain the actual data. If a data item for some reason does not exist, a letter 'x' can be placed at the position and SOM Analyzer will simply skip that component.
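To check your own files before loading them, the format can be read with a few lines of Python. This is only a sketch of the layout described above, not the reader SOM Analyzer itself uses. The function name parse_data_file is made up, the 'x' in the sample has been put in by hand to show a missing value, and None is just one possible way to represent a skipped component.

```python
# First two rows of Listing 3, with one value replaced by 'x' for illustration.
SAMPLE = """5
1.123164E+000 6.486063E+001 -1.333097E-001 -6.203650E+000 2.939411E+002
4.097987E+001 x -1.010982E+001 -1.081089E+001 2.203780E+002
"""


def parse_data_file(text):
    """Parse a dimension line followed by rows of whitespace-separated values."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    dim = int(lines[0])
    rows = []
    for number, line in enumerate(lines[1:], start=1):
        fields = line.split()
        if len(fields) != dim:
            raise ValueError(f"data row {number}: expected {dim} values, got {len(fields)}")
        # 'x' marks a missing component; None stands in for it here.
        rows.append([None if f.lower() == "x" else float(f) for f in fields])
    return dim, rows


dim, rows = parse_data_file(SAMPLE)
print(dim, len(rows))
print(rows[1])
```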
SOM Parameters
Automatic vs. manual
Depending on the type of wizard you have decided to use in View --> Options, you either have or don't have access to the SOM Parameters. The manual wizard mode is the only one that lets you see and modify the SOM parameters. Even then, you can resort to using the auto detection in case you don't want to set the parameters yourself.
Parameters explained
Eventually, there will be a day when you want to see what is beneath the surface and what you then find are the following parameters:
SOM Initialization
These parameters define the most rudimentary map characteristics that must be in place before the map starts learning.
Map Dimensions *
These define the x and y dimensions of the produced map.
Map Topology
This defines the topology of the produced map. Currently, only the more versatile and useful hexagonal topology is supported and therefore this field is disabled. It is only there to show you the topology in use, and in case new topologies are ever added.
Neighbourhood Function *
This can be either Gaussian or Bubble, and it defines how the effect of the BMU on a neuron attenuates as the neuron's distance from the BMU increases.
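To make the difference between the two functions concrete, here is a small sketch using their textbook forms. The exact formulas inside SOM Analyzer are not given in this guide, so treat the definitions below as assumptions: Bubble gives full effect inside the radius and none outside, while Gaussian fades smoothly with distance.

```python
import math


def bubble(distance, radius):
    """Full effect inside the radius, none outside (textbook form, assumed)."""
    return 1.0 if distance <= radius else 0.0


def gaussian(distance, radius):
    """Smoothly fading effect (textbook form, assumed)."""
    return math.exp(-(distance ** 2) / (2.0 * radius ** 2))


radius = 2
for d in range(5):
    print(d, bubble(d, radius), round(gaussian(d, radius), 3))
```

With Bubble, every neuron within the radius is updated equally. With Gaussian, neurons close to the BMU are updated more strongly than those further away.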
Initialization Method
The weight vectors of the SOM must have some start values when the learning starts. This parameter defines how the initial values are set. There are two possible values, Linear and Random. In linear initialization, the weight vector components are set based on the given data set. In random initialization, the values for the weight vectors are randomized.
Random Seed *
The random seed can be set to a specific value, if you need to be sure that the randomization follows a certain pattern (i.e. the random number generator starts from the same seed). This way, you can make sure that the successive maps you create can be more easily compared as the initial conditions resemble each other. However, if you do not need this type of behavior, you can just leave the value at zero to get the maximum amount of diversity. In this case, the seed is taken from the system clock.
Preprocessing Method *
To maximize the adaptive power of the SOM, the data can be preprocessed before it is fed to the SOM. This parameter can have one of three possible values: None, Variance and Range. While None does no processing, the other two scale the data either by variance or by range. It is difficult to say beforehand which preprocessing value is best, as it depends on the type of data you are processing.
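As an illustration of what the two scaling methods typically do, the sketch below scales one data column by range and by variance. The formulas are the common textbook ones and are an assumption here, as this guide does not spell out the exact computation SOM Analyzer performs. The function names are made up for the example.

```python
import statistics


def scale_by_range(column):
    """Map the values of one component linearly onto 0..1 (assumed formula)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def scale_by_variance(column):
    """Shift to zero mean and scale to unit variance (assumed formula)."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [0.0 if std == 0 else (v - mean) / std for v in column]


column = [10.0, 20.0, 30.0]
by_range = scale_by_range(column)
by_variance = scale_by_variance(column)
print(by_range)
print([round(v, 3) for v in by_variance])
```

Either way, the point is that components with large numeric values no longer dominate the distance calculations just because of their scale.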
Training parameters
These parameters define the SOM behavior when it is processing the learning data set. Setting these parameters the right way will have a major impact on the quality of the map.
The right part of the Training Parameters group box has an area that shows you the individual training sets you have configured. When the SOM is trained, all of these will be run. If you want the SOM to first roughly adapt to the data, you can define a short but more aggressive training set. Next, you can define another, more finely tuned set that nicely finalizes the SOM. You can define any number of training sets and modify them at will. To modify a training set, just click on the row you want and observe how the training set fields change to reflect the selected set. Change any values as you like, click Modify and see how the values on the line are updated.
Learning rate
This parameter defines how aggressively the SOM learns. At first, you might think that the greater the value of this parameter, the better. However, larger values will cause fluctuation and loss of generality in the final map. It may learn a particular data instance very well, but may be unable to classify other similar instances.
Learning function
This can be either Linear or 1/t. To focus the effects of learning, this function defines how the learning rate decreases as learning proceeds. The linear function gradually makes the learning rate drop to zero. The 1/t function, on the other hand, makes the initial drop even more dramatic but leaves more room for fine tuning at the end, so to speak.
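The two learning functions can be compared with a short sketch. The linear form follows directly from the description above. For the 1/t form, the guide gives no formula, so the version below, including the constant of one hundredth of the training length, is an assumption modelled on a common SOM implementation. The start rate 0.02 and length 2400 are the lrate and tlen values from Listing 2.

```python
def linear_rate(initial, step, total_steps):
    """Learning rate that falls in a straight line to zero at the last step."""
    return initial * (1.0 - step / total_steps)


def inverse_rate(initial, step, total_steps):
    """1/t style learning rate; the constant c is an assumed choice."""
    c = total_steps / 100.0
    return initial * (c / (c + step))


initial, total = 0.02, 2400
for step in (0, 240, 1200, 2400):
    print(step, round(linear_rate(initial, step, total), 5),
          round(inverse_rate(initial, step, total), 5))
```

Under these assumptions the 1/t rate drops much faster at the start but never quite reaches zero, which is what leaves room for fine tuning at the end.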
Neighbourhood radius
These two parameters define the start and end radii of the learning. They define the area of impact on the map each BMU hit in learning has around it. When the learning proceeds, this radius gradually decreases until at the very end it is the same as the end value.
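As a rough illustration, the shrinking radius can be sketched as a straight line from the start value to the end value. The guide only says that the radius decreases gradually, so the linear shape is an assumption. The values 3, 1 and 2400 are the nsrad, nerad and tlen values from Listing 2.

```python
def radius_at(step, total_steps, start_radius, end_radius):
    """Neighbourhood radius at a given step, shrinking linearly (assumed shape)."""
    fraction = step / total_steps
    return start_radius + (end_radius - start_radius) * fraction


for step in (0, 600, 1200, 1800, 2400):
    print(step, radius_at(step, 2400, 3, 1))
```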
Training length
Training length defines the number of data vectors given to the SOM in the learning phase. The number is typically given in epochs, where each epoch represents the data vector count in a single batch. For example, if you have a data file consisting of 5000 vectors, then your epoch is 5000 vectors. If you give a value of 3 for the training length, the SOM will be trained with 15 000 vectors and the data batch is processed three times.
Note! In case you need to specify a precise number of data vectors you can actually override the epoch definition by prepending the training length number with '#'. With this, the number is interpreted as is. For example, if your data batch has 5000 vectors as before and you want to use exactly 11 000 vectors, you can just enter #11000 in the field.
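The two ways of entering the training length can be summed up in a few lines of Python. This is just a sketch of the rule described above, with a made-up function name, and it does not handle invalid input.

```python
def training_vector_count(field, epoch_size):
    """Turn the training length field into a number of data vectors."""
    text = field.strip()
    if text.startswith("#"):
        # A leading '#' means the number is taken as is.
        return int(text[1:])
    # Otherwise the number is a count of epochs.
    return int(text) * epoch_size


print(training_vector_count("3", 5000))
print(training_vector_count("#11000", 5000))
```

With a batch of 5000 vectors, entering 3 gives 15 000 vectors and entering #11000 gives exactly 11 000 vectors, as in the examples above.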
The little asterisk (*) that follows some of the parameters above, and also appears in the SOM Analyzer user interface, indicates a mutable parameter. This is an important concept that you should understand. Mutable parameters are explained in the next section.
Do notice that all SOM specifics and theory are explained in detail in the About SOM section.
Mutable parameters
The previous section already mentioned the mutable parameters as they are the ones marked with the asterisk (*) in the Training Parameters group box.
In short, the mutable parameters are the ones that get changed between iterator runs. Not all of them change on every run; rather, the changes take place in a systematic fashion. In the SOM Acceptance tab (see next section), you can define which of the mutable parameters you wish to use. When they are enabled, the iterator changes the mutable parameters so that all combinations get run.
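To get a feeling for what "all combinations" means, the sketch below enumerates every combination of a few mutable parameters. The parameter values for the neighbourhood function and preprocessing method come from this guide, while the seed values and the dictionary layout are made up for the example. How SOM Analyzer orders the combinations internally is not described here.

```python
from itertools import product

# Mutable parameters and the values to try for each of them.
mutable = {
    "neighbourhood": ["Bubble", "Gaussian"],
    "preprocessing": ["Variance", "Range"],  # "None" skipped in this example
    "random_seed": [1, 2, 3],  # hypothetical seed values
}

# One dictionary per combination of values.
runs = [dict(zip(mutable, combo)) for combo in product(*mutable.values())]

print(len(runs))
for run in runs[:3]:
    print(run)
```

Even with this small example there are 2 x 2 x 3 = 12 combinations, which shows how quickly the number of runs grows when more parameters are mutated.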
SOM Acceptance
In this tab, you find two group boxes: the Acceptance Criteria and the SOM Iterator Control. Via the acceptance criteria, you can set limits that dictate whether the SOM resulting from the training process meets your criteria. In the SOM Iterator Control, you control how the iterator is run.
Error measures
In the acceptance criteria, you actually define the limits for the two built-in error measures that are run for the SOM right after the learning process is over. The two supported error measures are the following:
QE = Quantization Error
TE = Topographic Error
If the values obtained from these are below the limits you have set, the SOM is accepted and the Accepted column in the iterator results data grid (see section SOM Iterator below) is changed to reflect that.
Do notice that all SOM specifics including the theory behind error measures are explained in detail in the About SOM section.
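For readers who like to see the measures in code, here is a minimal sketch based on the usual textbook definitions: QE is the mean distance between each data vector and its BMU, and TE is the share of data vectors whose best and second-best units are not neighbours on the map. These definitions, the tiny one-dimensional map, the data and the limits are all assumptions made for illustration. SOM Analyzer itself uses a hexagonal lattice, whose neighbour test is left out here for brevity.

```python
import math


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_best_units(vector, weights):
    """Indices of the best and second-best matching units for one vector."""
    ranked = sorted(range(len(weights)), key=lambda i: distance(vector, weights[i]))
    return ranked[0], ranked[1]


def quantization_error(data, weights):
    """Mean distance between each data vector and its BMU."""
    total = 0.0
    for vector in data:
        bmu, _ = two_best_units(vector, weights)
        total += distance(vector, weights[bmu])
    return total / len(data)


def topographic_error(data, weights, are_neighbours):
    """Share of data vectors whose two best units are not adjacent on the map."""
    misses = sum(1 for v in data if not are_neighbours(*two_best_units(v, weights)))
    return misses / len(data)


def chain_neighbours(i, j):
    # Hypothetical 1-D map: units are adjacent when their indices differ by one.
    return abs(i - j) == 1


data = [[0.1], [1.4], [2.9]]
ordered = [[0.0], [1.0], [2.0], [3.0]]  # weights follow the map order
folded = [[0.0], [3.0], [1.0], [2.0]]   # same weights, badly ordered map

qe = quantization_error(data, ordered)
te = topographic_error(data, ordered, chain_neighbours)
te_folded = topographic_error(data, folded, chain_neighbours)

# Hypothetical limits, as they might be set in the Acceptance Criteria.
qe_limit, te_limit = 0.25, 0.05
accepted = qe < qe_limit and te < te_limit
print(round(qe, 3), te, round(te_folded, 3), accepted)
```

The folded map has exactly the same weight vectors, and therefore the same QE, but a much worse TE. This is why it helps to look at both measures together.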
Iterator control
Here, you have full control over the powerful SOM iterator, which is a unique SOM Analyzer feature. By using the iterator, you can create hundreds or even thousands of SOMs against your data and let the computer do all the work for you. Just set the iterator parameters, fire it off and go home for the day. When you come back, you will see a comprehensive result set showing you the results of all iterations. You also see how long it took for the SOM Analyzer to run the iterator.
The iterator control parameters you can alter are the following:
Number of iterations to execute
This field sets the number of steps that the iterator performs.
Random seed
Mutate this parameter
Neighbourhood function
Mutate this parameter by trying out all supported values (Bubble / Gaussian)
Preprocessing method
Mutate this parameter by trying out all supported values
If you check Skip "None" only the values resulting in actual preprocessing (Variance / Range) will be tried
Map size
In here you set the end dimensions for the map
The start map dimensions are defined in the SOM Training Parameters
The number of iterations determines how fast the map dimensions reach the configured end dimensions
Note! By defining the 'aspect ratio' of the map differently compared to what you have in the start dimensions you can observe how map proportions affect the obtained results.
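How the map grows from the start dimensions to the end dimensions can be pictured with a small sketch. The guide does not state the exact stepping rule, so the even, linear steps below are purely an assumption, and the start and end sizes are made-up values.

```python
def map_sizes(start, end, iterations):
    """Map dimensions per iteration, stepping evenly from start to end (assumed rule)."""
    (x0, y0), (x1, y1) = start, end
    sizes = []
    for i in range(iterations):
        t = i / (iterations - 1) if iterations > 1 else 0.0
        sizes.append((round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)))
    return sizes


sizes = map_sizes((10, 8), (50, 36), 5)
print(sizes)
```

With more iterations the steps between successive map sizes become smaller, which matches the remark above that the number of iterations determines how fast the end dimensions are reached.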
SOM Iterator
SOM Iterator lets you move the tedious SOM iteration tasks from your shoulders to your computer. SOM Iterator runs the iterations against the data you have chosen and lets you examine in real time the results obtained from the iteration process. If, during the iteration process, you see that the SOM you want has already been found, you can stop the iterator at any time and start using the results already collected.
Result data grid
The results collected by the iterator are shown on the data grid that occupies the majority of the SOM Iterator tab, in the Results group box. In it, you see all map properties an iteration has had. The data grid also lets you sort the rows into ascending or descending order by the column of your choosing.
The first six columns of the data grid show you iteration related information while the remaining four columns show you the values of the mutable parameters. The first six columns are as follows:
Iter #
This is the iteration step taken. If you have run the iterator with 100 steps, you will have values 1 through 100 in this column.
Accepted
The value in this column can be either - or YES. The value YES indicates that this iteration satisfies the acceptance criteria set in the SOM Acceptance tab.
Train QE
Quantization error measure run for the SOM against the data used in training.
Train TE
Topographic error measure run for the SOM against the data used in training.
Accept QE
Quantization error measure run for the SOM against the data used in testing.
Accept TE
Topographic error measure run for the SOM against the data used in testing.
Examining, picking & creating the SOM you want
It may sometimes be hard to pick the best SOM out of the batch. The error measures, of course, help you to decide what is best but even they won't tell you the whole story. Trying out different SOMs is very easy, and I have found that following the steps below will eventually give you the best results.
First, take some time to examine the results collected by the iterator.
Sort by various measures and examine if certain iterations come close to the top (assuming best score is on top)
To help you further, go back to the SOM Acceptance tab and manually set the error measure limits to narrow down the result set.
When done, click the Re-evaluate Iterator Results button to update the Accepted column in the results data grid to reflect the changes you just made.
When you have a hunch about the SOMs that might suit your needs, it is time to start checking if they really are what you are looking for.
Click the desired row on the results data grid so that a little arrow pointing right appears on the left. To clearly highlight the entire row, click on the leftmost, blank column.
Either way, you have now selected the row from which you want the SOM to be created.
Click the Create SOM button. This will initiate the SOM Creation process.
When done, just click the green SOM Finished button and you will be taken to see the map you just created.
The iterator results remain in memory until you run the iterator again. They will even be saved when you save the map. Therefore, you can come back to the results at any time and create another SOM with just a few clicks.
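The narrowing-down steps above can be sketched in a few lines. This is only an illustration of the idea, assuming that the acceptance criteria are upper limits on the two acceptance error measures. The field names mirror the data grid columns, and the limit values and result rows are made up.

```python
def re_evaluate(results, max_accept_qe, max_accept_te):
    """Mark each result row as accepted when it is within both limits."""
    for row in results:
        ok = row["accept_qe"] <= max_accept_qe and row["accept_te"] <= max_accept_te
        row["accepted"] = "YES" if ok else "-"
    return results

def sort_results(results, column, descending=False):
    """Sort rows by one column, like clicking a data grid header."""
    return sorted(results, key=lambda row: row[column], reverse=descending)

# Hypothetical iterator results.
results = [
    {"iter": 1, "accept_qe": 0.30, "accept_te": 0.10},
    {"iter": 2, "accept_qe": 0.12, "accept_te": 0.04},
    {"iter": 3, "accept_qe": 0.15, "accept_te": 0.20},
]

re_evaluate(results, max_accept_qe=0.20, max_accept_te=0.05)
best_first = sort_results(results, "accept_qe")
```

Tightening the limits and re-evaluating shrinks the set of rows marked YES, which is what makes the promising candidates stand out.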
SOM Usage
Visualizations
As you probably know already, a SOM is a highly visual view of the processed data. Therefore, the moment you have created your first SOM, you are greeted with a greenish lattice of hexagons. This visualization, showing you the neurons as hexagons, is called UMatrix. It is intended for visualizing the neuron distances in a multi-dimensional space at a glance. There are many types of SOM visualizations depending on what is visualized. The visualizations in the SOM Analyzer are listed below.
UMatrix
Visualizes the neuron distances from one another with value 0 having the light green color indicating zero distance and value 1 having the black color indicating large distance.
BMU Histogram
Visualizes the BMU hits on the map collected during the SOM Usage phase. This is a statistical visualization showing you the distribution of data over the neurons.
Parameter Levels
The number of parameter level visualizations depends on the dimension of the map. For a map having a dimension of 5, there will be five parameter level visualizations. A parameter level visualization shows the value of one particular data component of the neuron weight vectors.
Cluster Detection
This visualization appears in the list of selectable visualizations on the left once you have run the cluster detection analysis for the map. It shows you the cluster boundaries with distinct colors. It also lets you label the clusters.
All visualizations can be toggled on and off via the SOM Control tab on the left.
Do notice that all SOM specifics including the visualizations are explained in detail in the About SOM section.
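If you are curious how a UMatrix value could be derived, the following is a minimal sketch. It assumes that each neuron gets the mean distance to its direct neighbours on the hexagonal lattice and that the values are then scaled so that the largest one becomes 1. The lattice layout (odd rows shifted right), the scaling, and all names are assumptions for illustration only.

```python
import math

def hex_neighbours(row, col, rows, cols):
    """Neighbours on a hexagonal lattice where odd rows are shifted right
    (an assumed layout; the real lattice orientation may differ)."""
    shift = 0 if row % 2 == 0 else 1
    candidates = [
        (row, col - 1), (row, col + 1),
        (row - 1, col - 1 + shift), (row - 1, col + shift),
        (row + 1, col - 1 + shift), (row + 1, col + shift),
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

def umatrix(weights, rows, cols):
    """Mean distance from each neuron to its neighbours, scaled to 0..1."""
    raw = {}
    for (row, col), w in weights.items():
        dists = [math.dist(w, weights[n]) for n in hex_neighbours(row, col, rows, cols)]
        raw[(row, col)] = sum(dists) / len(dists) if dists else 0.0
    largest = max(raw.values()) or 1.0  # avoid dividing by zero on a flat map
    return {pos: value / largest for pos, value in raw.items()}

# Hypothetical 2 x 2 map where only one neuron differs from the rest.
weights = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 0.0],
           (1, 0): [0.0, 0.0], (1, 1): [1.0, 0.0]}
u = umatrix(weights, rows=2, cols=2)
```

In this example the neuron at (1, 1) gets the value 1 (it would be drawn black) while the neuron at (0, 0) gets the value 0 (light green).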
Data selection
When you have the trained SOM, it is time to start using it. You probably already have some data available that you can feed to the SOM. If not, you can start by giving it the data you used in training just to see how it maps to the SOM.
Click SOM Usage in the menu and then Data Selection. This opens up a dialog via which you can select the data the SOM starts processing. There are two types of data processing options you can choose from: online and offline.
Online Analysis / Monitoring
If you select this, you can fetch data in real time from the database (note! files are not supported).
The data fetch is based on a query where a timestamp-like column indicates the data freshness.
Sampling interval specifies how often the database query is run and consequently, how often the trajectory is drawn.
Offline Analysis
If you select offline data analysis, the SOM Analyzer will first load the entire data set and will then start feeding it to the SOM.
Sampling interval specifies the interval at which the data lines are fed to the SOM and, consequently, how often the trajectory is drawn.
The option Process all offline data after loading will result in a loop where the data lines are fetched to the SOM at maximum speed.
When all data has been processed, you can go back to any position in the trajectory and study the behaviour in more detail.
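The online fetch described above, where a timestamp-like column tells which rows are fresh, can be sketched as follows. This is a minimal illustration using an in-memory SQLite database. The table name, the column names, and the sample rows are hypothetical, and the actual query used by the SOM Analyzer is not shown in this guide.

```python
import sqlite3

def fetch_new_rows(connection, last_seen):
    """Return rows newer than last_seen, oldest first."""
    cursor = connection.execute(
        "SELECT ts, temperature, pressure FROM samples WHERE ts > ? ORDER BY ts",
        (last_seen,),
    )
    return cursor.fetchall()

# Hypothetical table and columns, created here only for the example.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE samples (ts INTEGER, temperature REAL, pressure REAL)")
connection.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, 20.5, 1.01), (2, 20.7, 1.02), (3, 21.0, 1.00)],
)

last_seen = 1  # timestamp of the freshest row already processed
rows = fetch_new_rows(connection, last_seen)
if rows:
    last_seen = rows[-1][0]  # remember the freshest timestamp for the next poll
```

In a monitoring loop, the call would be repeated once per sampling interval, and each returned row would be fed to the SOM to extend the trajectory.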
Actions taken before usage starts
In here you have the option of clearing the BMU histogram in case you need to start fresh with each data set. The values of the BMU histogram can be seen with the BMU Histogram visualization (see above). If unchecked, the values of the BMU histogram are not reset but accumulate to contain the values of all data sets you process.
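The difference between clearing and accumulating the BMU histogram can be illustrated with a short sketch. The function names, the tiny two-neuron map, and the data values are assumptions for illustration.

```python
import math
from collections import Counter

def find_bmu(weights, vector):
    """Return the position of the neuron closest to the data vector."""
    return min(weights, key=lambda pos: math.dist(weights[pos], vector))

def process_data_set(weights, data, histogram, clear_first):
    """Count BMU hits, optionally starting from an empty histogram."""
    if clear_first:
        histogram.clear()
    for vector in data:
        histogram[find_bmu(weights, vector)] += 1
    return histogram

# Hypothetical map with two neurons.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
histogram = Counter()

# First data set: start fresh.
process_data_set(weights, [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]], histogram, clear_first=True)
# Second data set: option unchecked, so the hits accumulate.
process_data_set(weights, [[1.0, 0.8]], histogram, clear_first=False)
```

After the second call, the histogram holds the hits of both data sets. Passing clear_first=True instead would leave only the hits of the latest data set.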
BMU trajectory
The Best-Matching Unit (BMU) trajectory is an integral part of the SOM Analyzer and a powerful tool for you to get detailed insights into your data. Whenever you process a data set with SOM via the SOM Usage menu, the trajectory will memorize all values the input data vectors have had on the SOM. The trajectory consists of the so-called operating point indicating where the current BMU hit is and the tail showing you the trail of the past operating points, i.e. the BMUs. The trajectory offers you the following tools to explore your data.
Animated state transition
You can examine how your data changes over time as the BMUs wander to different parts of the map.
As map areas, or clusters, have different semantics, you can observe how the state of the observed process changes over time.
By setting the trajectory length, you can have a view to the past. For example, if you have a trajectory of length 20 visualized and your sampling interval is 1 minute you can observe the process at the moment (indicated by the operating point) and the process states 20 minutes back (indicated by the trajectory tail).
Observe the data of the input data vectors and the map at the same time
By switching to the Data tab in the SOM control tab, you can observe in just one view how both the current data vector and the SOM weight vector under the BMU relate to one another.
Go forward or back or jump to whatever position you wish
You are not limited in any way as to what part of the data you want to examine. You can control the trajectory with the UI buttons or for example, if you are in the Data tab where the UI buttons are not visible, with simple keyboard shortcuts.
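The trajectory idea, an operating point with a tail of past BMUs that you can step through freely, can be sketched like this. The class and method names are hypothetical and the BMU positions are made up. The sketch only illustrates the concept, not the actual implementation.

```python
class Trajectory:
    """Remembers every BMU of a processed data set and a movable position."""

    def __init__(self, bmus, tail_length):
        self.bmus = list(bmus)
        self.tail_length = tail_length
        self.position = 0

    def jump_to(self, position):
        """Move to any position, clamped to the available data."""
        self.position = max(0, min(position, len(self.bmus) - 1))

    def step(self, delta=1):
        """Go forward (positive delta) or back (negative delta)."""
        self.jump_to(self.position + delta)

    def operating_point(self):
        """The BMU hit at the current position."""
        return self.bmus[self.position]

    def tail(self):
        """The past operating points, oldest first."""
        start = max(0, self.position - self.tail_length)
        return self.bmus[start:self.position]

# Hypothetical BMU positions for five consecutive data vectors.
trajectory = Trajectory([(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)], tail_length=2)
trajectory.jump_to(3)
current = trajectory.operating_point()
past = trajectory.tail()
```

With a sampling interval of one minute, a tail length of 2 would show the process states two minutes back from the operating point.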
Finding a BMU
Using the trajectory is the way to find BMUs for a set of data vectors, as each data vector maps to exactly one BMU. However, if you need to find a BMU for a specific data vector, the Find BMU feature comes to the rescue. With it, you can input any data vector, or even an incomplete vector, and map it to the SOM by clicking the Execute button. This way, it is very easy to experiment with various values and see how they affect the result.
To see the values of the actual BMU weight vector next to what you have keyed in, just enlarge the Find BMU window horizontally and you will see the values in the rightmost column.
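Mapping an incomplete vector can be sketched as follows. How the SOM Analyzer treats the missing components is not described in this guide. The sketch assumes one common approach, which is to leave the missing components out of the distance calculation. The function name and the sample values are hypothetical.

```python
import math

def find_bmu_partial(weights, vector):
    """Find the BMU while skipping components given as None."""
    known = [i for i, value in enumerate(vector) if value is not None]
    if not known:
        raise ValueError("at least one component must be given")

    def partial_distance(pos):
        w = weights[pos]
        return math.sqrt(sum((w[i] - vector[i]) ** 2 for i in known))

    return min(weights, key=partial_distance)

# Hypothetical map with three neurons and three data components.
weights = {
    (0, 0): [0.1, 0.9, 0.5],
    (0, 1): [0.8, 0.2, 0.4],
    (1, 0): [0.5, 0.5, 0.9],
}

# The second component is unknown.
bmu = find_bmu_partial(weights, [0.7, None, 0.5])
bmu_weights = weights[bmu]
```

Comparing the keyed-in vector with bmu_weights corresponds to what you see in the rightmost column of the Find BMU window: the weight vector also suggests a value for the component you left empty.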
Various Topics
Clustering the map
Clusters are what the SOMs are so good at. Clusters indicate data concentrations that indicate similarities in data and therefore are worth the study. SOM is capable of clustering any data and therefore the Cluster Detection analysis has been built into the SOM Analyzer so that working with clusters could be as easy as possible.
Cluster detection can be initiated from under the Analysis menu. This opens up a dialog where you can set various items and cluster labels. When experimenting with the parameters, click Execute to see the interim results. When you are happy with the results, just click the Finished button and the Cluster Detection will be added to the list of available visualizations in the SOM control tab. Via the visualization context menu (click your right mouse button over the visualization item or the visualization area), you can delete the cluster detection visualization from the visualizations list or you can directly access the labelling mechanism without first opening the cluster detection dialog.
You can return to the cluster detection window at any time and continue where you left off.
Note! Cluster labels are not like neuron labels. Neuron labels always stay fixed to the neurons they have been attached to. Cluster labels will move along with the cluster whenever you tune the parameters.
Labelling the map
When you have identified some key properties the map exhibits, you can label the map so that it is easier for you and others to understand the semantics of the neurons. Working with map labels goes as follows.
To label a particular neuron, double-click on it
This opens a small dialog requesting you to enter the label
To re-edit the label, repeat the steps above
To remove the label from a single neuron, double click the neuron and clear the label field.
If you want to remove all labels from the map, click SOM Usage --> Clear --> Neuron Labels
Labelling the parameter levels
When you have created a SOM, it will be of great help to you and others if the parameter levels are labelled. Unlike neuron labelling, this should be a more trivial task as you probably have an idea what the data you have used to train the SOM with represents. Working with the parameter level labels goes as follows.
When still creating the map and when reading data from the database, you can click on the Import column names to parameter levels in the data selection dialog (bottom right corner of it)
Otherwise, parameter level labels will be set only after the SOM has been created.
You can either right-click on the parameter level visualization item in the visualizations list and select Edit Parameter Level Label
Or you can right-click on the visualization area to do the same thing.
To remove all parameter level labels, click SOM Usage --> Clear --> Parameter Level Labels
Exporting result data to CSV file
When you are through with all the cluster, neuron and parameter level labelling, you have probably processed some interesting data as well. If you feel you need to take another look at the data, e.g. with a spreadsheet application, you first need to make your result data available to that other program. To accomplish that, click File --> Export Result Data To CSV File. A dialog opens with the following two areas of interest.
Status area shows you a brief help to the feature as well as information on how many result vectors can be exported.
The data export area lets you include a header line that is of the following form:
Parameter level labels for each component
BMU found for the particular vector
BMU label
Label of the cluster within which the BMU is
You can pick one of the three possible field delimiters that will be used when writing the CSV file. The possible delimiting characters are:
; Semicolon
: Colon
TAB Tabulator
To start the actual exporting process, click the Select File & Export button. After selecting the file and clicking Save the file is written. The green text <file> created! in the status area indicates that the file has been created successfully.
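To show what the exported file could look like, here is a minimal sketch that writes result rows with the header fields listed above and a selectable delimiter. The function name, the header captions for the last three fields, the way the BMU is represented (a plain number here), and the sample rows are all assumptions for illustration.

```python
import csv
import io

def export_results(stream, rows, level_labels, delimiter=";", header=True):
    """Write result vectors with their BMU, BMU label and cluster label."""
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(list(level_labels) + ["BMU", "BMU label", "Cluster label"])
    for row in rows:
        writer.writerow(
            list(row["vector"]) + [row["bmu"], row["bmu_label"], row["cluster_label"]]
        )

# Hypothetical result data with two parameter levels.
rows = [
    {"vector": [20.5, 1.01], "bmu": 12, "bmu_label": "normal", "cluster_label": "A"},
    {"vector": [35.0, 0.80], "bmu": 3, "bmu_label": "", "cluster_label": "B"},
]

buffer = io.StringIO()
export_results(buffer, rows, ["Temperature", "Pressure"], delimiter=";")
text = buffer.getvalue()
```

Passing delimiter=":" or delimiter="\t" instead would produce the colon-separated and tabulator-separated variants.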
Data localization
This concerns you only if you are using database as data source for your SOMs.
There are many optimizations the SOM Analyzer makes to make operations around SOM as efficient as possible. One of them is data localization from database. It is important that you understand what it means mainly because of the reasons that follow.
In short, data localization first reads the required data from the database once and writes it all to files on your disk. All operations that follow requesting the same information will actually be directed to read the data from the files.
Data localization is performed mainly because the iterator needs to have repeated access to the same data. Reading it from the database over and over again not only takes time but also negatively affects the performance of your internet connection and, consequently, all internet-related tasks.
Data localization takes up disk space on your computer, up to the amount of data you have selected. Normally, this should not be a problem. However, if you know your disk space is limited, you should perhaps take this point into account.
Once you know about localization, you can actually save the localized data files for later use. This lets you work with SOMs even if you do not have an internet connection or the database available.
This, of course, does not apply if you have a local database configured on your computer that you are using.
If you want to save the localized files for later, this is how you do it.
Once you have got the data from the database, save the SOM configuration.
Open the SOM configuration (file with the cfg suffix) with any text editor.
Look for the following tags at the very beginning of the file. They are easy to find.
<dbTrainFile> Localized training file is here </dbTrainFile>
<dbAcceptFile> Localized acceptance file is here </dbAcceptFile>
<dbUsageFile> Localized usage file is here </dbUsageFile>
Once you decide which files you need, immediately copy them to safety as these files are in a temporary directory and can be purged by the operating system at any time.
Data localization initially delays the SOM operation but when complete, will greatly speed things up as the data will be read from the local disk and what is even better, from cache.
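If you prefer not to copy the localized files by hand, the steps above can be scripted. The following is a minimal sketch that assumes each file path appears as plain text between the three tags shown above. The function names and the target directory are hypothetical.

```python
import re
import shutil
from pathlib import Path

TAGS = ("dbTrainFile", "dbAcceptFile", "dbUsageFile")

def localized_files(cfg_text):
    """Map each tag name to the file path found between its tags."""
    found = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", cfg_text, re.DOTALL)
        if match and match.group(1).strip():
            found[tag] = match.group(1).strip()
    return found

def copy_to_safety(cfg_text, target_dir):
    """Copy every localized file that still exists into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in localized_files(cfg_text).values():
        source = Path(path)
        if source.is_file():  # the file may already have been purged
            copied.append(shutil.copy2(source, target / source.name))
    return copied
```

A typical call would read the saved cfg file as text and pass it on, for example copy_to_safety(Path("my_som.cfg").read_text(), "som_backup"), where both names are placeholders.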
Options
You can access the options dialog by clicking View --> Options in the menu. There you find quite a few options that you can tweak to adjust the SOM Analyzer to your needs. The options available are explained in the following.
Options in Edit
The scope of options determines how the options are applied
Application Configuration specifies global scope. This means that if you define options for it, all SOM configurations you create get these options by default.
Active SOM Configuration specifies options for the currently active SOM configuration only. The options you define here will be saved when you save the SOM but will be reset back to application configuration scope when you close the SOM configuration.
With the checkboxes in Scope of effect when applied you can set the scope to which your newly set options apply.
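One way to picture the two scopes is as two layers of settings, where the active SOM configuration is consulted first and the application configuration supplies the defaults. The sketch below is only a mental model. The option names are hypothetical and the actual storage format is not described in this guide.

```python
from collections import ChainMap

# Hypothetical option names, used only to illustrate the two scopes.
application_options = {"skip_intro_page": False, "preserve_aspect_ratio": True}
active_som_options = {"skip_intro_page": True}

# Lookups try the active SOM configuration first, then the application defaults.
effective = ChainMap(active_som_options, application_options)

skip_intro = effective["skip_intro_page"]          # taken from the active SOM scope
keep_ratio = effective["preserve_aspect_ratio"]    # falls back to the application scope
```

Emptying active_som_options, which loosely corresponds to closing the SOM configuration, makes every lookup fall back to the application scope again.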
SOM Creation Process
Level of abstraction basically is to be set based on how knowledgeable you are on SOMs.
Full means that the SOM analyzer attempts to help you as much as possible
Semi supposes you already know the basic concepts of SOM but still need the wizard approach or simply don't want to fiddle with the parameters.
Manual gives you full control. You probably know what you are doing, anyway...
Skip introduction page in wizard does exactly what it says saving you the trouble of making an extra mouse click each time you start a new SOM.
Visualizations
Window scaling can be set to maximize the display area in UI (Maximize SOM size to window) in case you are processing really large maps or maps having unorthodox dimensions.
Preserve SOM aspect ratio is the default value and typically is to be left at that.
Check visualization on single click.
Sometimes, it may be handier to check this option if you just need to browse through different visualizations. Otherwise (by default) the visualizations are turned on (activated) each time you click them.
Show visualization name on visualization area displays the name of the visualization on the top left corner of the visualization area.
Tool Tips
The options here are self-explanatory and you can set them as you see fit.
Analysis
Perform cluster detection analysis upon SOM creation automatically runs the cluster detection analysis each time you create a SOM. This also has the effect that the Cluster Detection visualization gets added to the list of available visualizations.
When done, click the Save Options and Close button to make sure the options are saved right away.
Keyboard Shortcuts
SOM Analyzer supports various key shortcuts to further enhance its usability. Most of the key shortcuts are associated with their respective menu items and also shown in the menu. However, there are some hidden shortcuts not visible anywhere in the application. They are the following:
Conclusion
I have made my best effort to make the SOM Analyzer as efficient a SOM tool as possible. At the time of coding it, I did significant research into how the application could also be made as user-friendly as possible and designed the application even for users having no knowledge of SOMs whatsoever. How well did I manage to get the best of both worlds? Well, it is up to you to say...
If you have any questions whatsoever or simply want to file a bug report, please do not hesitate to contact me at: petri.hassinen@gmail.com. I promise to get back to you at my earliest convenience.
Thank you!