Multivariate analysis is the statistical study of several variables measured together on the same individuals or units, with attention to how those variables relate to one another. Most real-world problems are multivariate. For example, the weather in a given year cannot be predicted from the season alone; it depends on several factors, such as pollution, humidity, and precipitation. Here, we will introduce you to multivariate analysis, its history, and its applications in different fields.
Why Multivariate Analysis?
A major advantage of multivariate analysis is that, unlike univariate studies, it takes into account the interdependences between the variables under study and hence is more informative.
However, this also makes the analyses more complex.
Despite this, a multivariate analysis of the m variables is usually preferred to m separate univariate analyses.
Most univariate methods can be extended to multivariate methods, at the cost of more complexity but with more informative results.
However, several multivariate techniques have no useful univariate counterpart. The problems they address are
either too trivial in univariate studies
or do not arise in univariate problems
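To make the point about interdependence concrete, here is a minimal sketch in Python. The blood pressure readings are hypothetical values chosen for illustration, not real data, and the helper names are our own. Two separate univariate analyses give only the two variances; the joint analysis also gives the covariance, which is exactly the information about how the variables move together.

```python
from math import sqrt

# Hypothetical readings for five individuals (illustrative values, not real data).
systolic = [110.0, 120.0, 130.0, 140.0, 150.0]
diastolic = [70.0, 80.0, 85.0, 90.0, 100.0]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sample covariance with the usual n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    return sample_covariance(x, y) / sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )


# Univariate (marginal) summaries: one variable at a time.
var_sys = sample_covariance(systolic, systolic)
var_dia = sample_covariance(diastolic, diastolic)

# Multivariate (joint) summary: the off-diagonal entry of the covariance
# matrix is what two separate univariate analyses cannot provide.
cov_sd = sample_covariance(systolic, diastolic)
cov_matrix = [[var_sys, cov_sd], [cov_sd, var_dia]]
r = correlation(systolic, diastolic)

print(cov_matrix)
print(round(r, 3))
```

In this made-up example the correlation is close to 1, so treating the two pressures as unrelated would discard most of the structure in the data.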
A few examples of multivariate data in different fields
Social Science: Characteristics of an individual (Gender, Age, Nationality)
Climatology: Weather on a particular day (Minimum temperature, Maximum temperature, Rainfall, Humidity)
Economics: Management of a firm (Input costs, Production, Profit)
Socio-Demographic: Profile of a country (GDP, Life expectancy, Literacy rate)
Several examples come from the Health Sciences:
General Health: Health profile of an individual (Systolic BP, Diastolic BP, Pulse rate)
Pathology: Pathological profile of a patient (Blood sugar level, Uric acid concentration, Hemoglobin count)
Administrative: Day-to-day administration in a hospital (Admissions, Operations, Discharges, Deaths)
Pharmaceutical: Quantity of sales per day in a pharmacy (Drug A, Drug B, ..., Drug Z)
Different aspects of Multivariate Analysis
A major aspect of multivariate analysis is to extend the univariate results to the multivariate set-up.
This is mostly done to study the inter-relationships between the variables under study, which are lost in univariate studies. Hence the emphasis here is on joint rather than marginal studies.
This may mean extending the Binomial or Normal distributions to their multivariate counterparts.
It can also mean extending the inference techniques to multivariate data, for both estimation and hypothesis testing problems.
A major extension is in the study of cause-effect relationships. The multiple regression model with one response, the ANOVA model and the ANCOVA models can all be extended to their multivariate counterparts where several responses are studied simultaneously.
The question asked here is whether one set of variables has an effect on another set of variables, and if so, how?
As the model equations are linear in the parameters, the models are broadly referred to as Multivariate Linear Models.
Special cases of this, depending on the nature of the covariates, are Multivariate Regression, MANOVA and MANCOVA.
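One common way to write such a model is sketched below. The symbols are our own notation, not taken from the text above: n is the number of individuals, m the number of responses, and q the number of columns in the design matrix. Y holds the responses, X the covariates, B the unknown parameters, and E the errors.

```latex
\[
  \underset{n \times m}{Y}
  \;=\;
  \underset{n \times q}{X}\,\underset{q \times m}{B}
  \;+\;
  \underset{n \times m}{E}
\]
% Least squares estimate, assuming X^T X is invertible:
\[
  \hat{B} \;=\; (X^{\top} X)^{-1} X^{\top} Y
\]
```

With m = 1 this reduces to the usual multiple regression model with one response. Roughly speaking, continuous covariates in X give Multivariate Regression, group indicators give MANOVA, and a mixture of the two gives MANCOVA.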
But some aspects of multivariate analysis are unique to it. Two such broad aspects are
Classification of Individuals
Dimension Reduction
The problem of Classification of Individuals is too trivial for univariate data, while the Dimension Reduction problem does not arise for single-variable studies.
Classification of Individuals
At the exploratory stage in the classification problem, the question often asked is
"Can a group of individuals or units be separated into smaller subgroups according to similarity or dissimilarity?"
The aim is to segregate similar units into homogeneous groups, so that the heterogeneity in the data is treated as between-group variability.
The greater this between-group variability, the more distinctly defined the groups are.
The technique to do this is known as Cluster Analysis.
Once such clusters are formed, they need to be characterized. Thus the next natural question is
"Why and how are these clusters different?"
Such answers are generally provided by Discriminant Analysis, which helps to distinguish between the characteristics of the clusters.
Classification techniques are then used to assign new individuals into one of these well-defined clusters.
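The two steps described here, forming clusters and then assigning a new individual to one of them, can be sketched as follows. This is only an illustration under stated assumptions: k-means is used as one possible clustering method, the assignment uses a simple nearest-centroid rule rather than a full discriminant analysis, and the (minimum temperature, maximum temperature) values are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(
        range(len(centroids)),
        key=lambda k: squared_distance(point, centroids[k]),
    )


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid updates until nothing changes."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster happens to be empty.
            new_centroids.append(centroid(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (minimum temperature, maximum temperature) for six days.
days = [(2, 10), (3, 12), (4, 11), (20, 30), (21, 32), (22, 31)]

# Deterministic start: use the first and last observations as initial centroids.
labels, centroids = k_means(days, [days[0], days[-1]])
print(labels)
print(centroids)

# Assign a new day to the cluster with the nearest centroid.
new_day = (5, 13)
print(nearest(new_day, centroids))
```

The starting centroids are fixed here so that the result is reproducible; in practice k-means is usually run from several starting points, since the final clusters can depend on the start.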
Dimension Reduction
Very often, a data set contains a large number of variables.
This can make both the analysis complicated and the interpretation difficult.
A question thus asked is
"Is it necessary to look at all the variables, or is it possible to capture the same information through a smaller set of variables?"
The solutions to this problem are obtained through Dimension Reduction methods.
There are several Dimension Reduction techniques. Four of the important ones are
Principal Component Analysis
Factor Analysis
Canonical Correlation methods
Multidimensional Scaling
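As an illustration of the first technique in this list, the sketch below finds the first principal component of a small data set. It is a minimal example under stated assumptions: the data are hypothetical, the function names are our own, and power iteration is used as one simple way to get the leading eigenvector of the covariance matrix (it assumes the largest eigenvalue is strictly larger than the others).

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 divisor) of rows of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    p = len(cov)
    v = [1.0 / sqrt(p)] * p  # unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigenvalue


# Hypothetical data: two strongly related variables on five individuals.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]

cov = covariance_matrix(data)
direction, variance_along = first_principal_component(cov)

# Share of the total variance captured by the first component alone.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_along / total_variance
print(direction)
print(explained)
```

Because the two variables are almost perfectly related in this made-up example, a single component captures nearly all of the variation, which is the sense in which a smaller set of variables can carry the same information.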
Summary of this course
The extensions of univariate inferential techniques to multivariate data
The study of Multivariate Linear Models
The study of Dimension reduction techniques
The study of different Classification techniques
Multivariate Normal Distribution and Related Inference: I