Role of data visualisation in machine learning

Introduction

Layman's explanation

Being able to quickly visualize your data samples for yourself and others is an important skill both in applied statistics and in applied machine learning.

But once data has more than three dimensions, it becomes challenging to visualise. If you are interested in exploring approaches to representing multi-dimensional datasets, this document can help.

Technical explanation

In a dataset, some points lie close together and others lie far apart. When we look at a plot of the data, we grasp this structure readily. Data visualisation is therefore important for understanding various aspects of the data, including its shape. It helps in the following ways:

More rapid problem solving via faster access to business insights
Identification of relationships and patterns
Pinpointing and tracking of emerging trends
Ease of communication between parties
A better understanding of operational and business activities
Direct interaction with data

Visualisation using Dimension reduction

Reducing the data to three dimensions or fewer makes it possible to plot and inspect it.

Criteria for dimensionality reduction

However, it is important to preserve topology during dimension reduction. The distances between data points are one such topological property.

Advantage of topology preservation

Preservation of topological properties facilitates data interpretation.
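
One way to check how well a reduction preserves inter-point distances is to compare the pairwise distances before and after it. The sketch below is an illustration rather than a standard metric: the function names and the sample points are hypothetical, and the Pearson correlation of the two distance lists is used as one possible score (a value near 1 means distances are largely preserved).

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def distance_preservation(original, reduced):
    """Pearson correlation between pairwise distances before and after reduction."""
    d_hi = pairwise_distances(original)
    d_lo = pairwise_distances(reduced)
    mean_hi = sum(d_hi) / len(d_hi)
    mean_lo = sum(d_lo) / len(d_lo)
    cov = sum((h - mean_hi) * (l - mean_lo) for h, l in zip(d_hi, d_lo))
    var_hi = sum((h - mean_hi) ** 2 for h in d_hi)
    var_lo = sum((l - mean_lo) ** 2 for l in d_lo)
    return cov / math.sqrt(var_hi * var_lo)


# Hypothetical 3-D points that lie almost in the x-y plane.
points_3d = [(0, 0, 0.1), (1, 0, 0.0), (0, 2, 0.1), (3, 1, 0.0), (2, 3, 0.1)]

# Two candidate reductions: keep the informative axes, or keep only the flat one.
keep_xy = [(x, y) for x, y, z in points_3d]
keep_z = [(z,) for x, y, z in points_3d]

score_xy = distance_preservation(points_3d, keep_xy)
score_z = distance_preservation(points_3d, keep_z)
print(f"keep x,y: {score_xy:.3f}   keep z only: {score_z:.3f}")
```

Dropping the nearly constant axis keeps the distance structure almost intact, while keeping only that axis discards most of it.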

Analysing model suitability

Suitable models

The self-organizing map (SOM) is an example of a learning algorithm [16] for topology-preserving analysis of multi-dimensional data.
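
To make the idea concrete, here is a minimal sketch of a one-dimensional SOM written with the Python standard library only. It is a simplified illustration, not a reference implementation: the function names, the linear decay schedules, the default parameters and the random sample data are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D chain of units on the data and return their weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Units near the winner on the map are pulled towards x as well;
                # this shared update is what encourages topology preservation.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


# Hypothetical sample: 60 random points in the 3-D unit cube.
rng_data = random.Random(1)
data = [[rng_data.random() for _ in range(3)] for _ in range(60)]

weights = train_som(data, n_units=8, epochs=20)
positions = [best_matching_unit(weights, x) for x in data]
print(positions[:10])  # map position (0 to 7) assigned to the first ten points
```

Each 3-D point is reduced to a single position on the map, and points that end up at neighbouring positions tend to be close together in the original space.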

Unsuitable models

PCA performs dimension reduction; however, it does not guarantee that inter-point distances are preserved.
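
A small example shows how a PCA projection can collapse distances. The sketch below reduces 2-D points to 1-D, using the closed-form angle of the first principal axis of a 2x2 covariance matrix so that no external library is needed. The function names and the sample points are hypothetical; in practice a library implementation of PCA would normally be used.

```python
import math


def first_principal_axis(points):
    """Mean and unit direction of largest variance for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project_1d(points):
    """Coordinate of each point along the first principal axis."""
    (mx, my), (ux, uy) = first_principal_axis(points)
    return [(p[0] - mx) * ux + (p[1] - my) * uy for p in points]


# Hypothetical data: a long horizontal line plus two points offset vertically.
points = [(float(x), 0.0) for x in range(-10, 11)] + [(0.0, 1.0), (0.0, -1.0)]
projected = project_1d(points)

original_gap = math.dist(points[-1], points[-2])
projected_gap = abs(projected[-1] - projected[-2])
print(f"original gap: {original_gap:.3f}   gap after PCA: {projected_gap:.3f}")
```

The last two points are two units apart in the original data, but they differ only along the direction that PCA discards, so they land on practically the same 1-D coordinate.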

Time series data visualisation

A time series can have components such as trend, seasonality, cyclic variation and residual. A data visualisation tool should take all of them into account.

The autocorrelation function (ACF) is one such tool. It takes all of the above components into account when computing correlations, which is why it is called a ‘complete auto-correlation plot’. The ACF describes the autocorrelation between an observation and another observation at a prior time step, and it includes both direct and indirect dependence information. The ACF and PACF articles listed at the end of this document explain this in more detail.
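
The values behind an ACF plot can be computed directly. The sketch below is a minimal standard-library version of the sample autocorrelation; the function name and the toy series (a pattern repeating every four steps) are assumptions made for illustration. In practice a plotting helper such as `plot_acf` from statsmodels would typically be used instead.

```python
def acf(series, max_lag):
    """Sample autocorrelation of the series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical seasonal series: the same four values repeated ten times.
series = [0.0, 1.0, 0.0, -1.0] * 10
r = acf(series, max_lag=8)

for lag, value in enumerate(r):
    print(f"lag {lag}: {value:+.2f}")
```

For this series the autocorrelation is strongly positive at lags that are multiples of the period and strongly negative half a period away, which is how seasonality shows up in an ACF plot.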

Points to remember

Keeping the following challenges of data visualisation tools in mind during any data visualisation project can help to minimise their impact.

Tools show but they don’t explain, sometimes assuming that viewers understand more than they do
Different users can draw different conclusions
Implicit bias of whoever is managing the data – no matter how small
A false sense of security – sometimes graphs aren’t enough to tell the whole story and we don’t always take stock of this

Reference

https://youtu.be/q8gVpKl1f-4?t=2112

https://youtu.be/q8gVpKl1f-4?t=3165

https://www.intechopen.com/books/applications-of-self-organizing-maps/using-self-organizing-maps-to-visualize-filter-and-cluster-multidimensional-bio-omics-data

https://images.app.goo.gl/PbRTLKwLbpKA5bhF7

https://machinelearningmastery.com/data-visualization-methods-in-python/

https://research.aimultiple.com/data-visualization/

https://medium.com/analytics-steps/introduction-to-time-series-analysis-time-series-forecasting-machine-learning-methods-models-ecaa76a7b0e3

https://towardsdatascience.com/significance-of-acf-and-pacf-plots-in-time-series-analysis-2fa11a5d10a8

https://machinelearningmastery.com/gentle-introduction-autocorrelation-partial-autocorrelation/

https://images.app.goo.gl/FzBLoAMamYZQ9u8G8
