In the past we have focused mainly on data from a single variable. Using the data we were able to make a very rough estimate of the value of a member of the population using an average.
Heights of 200 Year 10 students from Census at Schools database.
In this topic we turn our attention to data where we have two pieces of information about each unit we measure. A unit may be an individual or an object.
The pieces of information we gather about our units are called attributes, or more commonly variables. We will be investigating the relationship between two such variables.
For an individual these variables may be their height, income, gender, etc. For a tree it may be its girth or type. Research often produces data with two pieces of information about each object or individual. Data which has two variables to be studied is called bivariate data.
Examples could be:
· the age of kauri trees and their diameter
· house prices and population density
· radioactivity in the groundwater and distance from the Fukushima nuclear power plant.
A scatterplot is a tool for displaying two pieces of information about an individual on one graph.
Somewhere in your past you might have investigated the relationship between height and foot length of the people in your class. You probably drew a scatterplot similar to the one below. You may have concluded that students with longer feet tend to be taller.
Now we want to look further into the relationship between two variables from the same unit. We will be investigating only data which is quantitative. The variable on the vertical axis must be measured (continuous); the variable on the horizontal axis may be counted or measured.
The purpose of drawing a scatterplot is usually to establish whether or not a relationship exists between the two variables.
This unit is about identifying relationships between variables and using these relationships to make predictions.
i.e.
Is there a correlation between the two variables?
If there is a correlation, how strong is it?
A strong correlation would allow us to predict one variable given the other.
Interesting correlations and questions
Bivariate data
(Univariate vs Bivariate Data Analysis)
- data that has two variables.
e.g.
Height and Armspan
GDP and Infant Mortality Rates
House Price and Section Size
Scatterplot
A graphical representation of these two variables.
Allows any relationships to be easily seen.
Correlations occur between the two variables.
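These are described as weak or strong on a scale of 0 to 1, where 0 means no linear relationship and 1 means a perfect one. The sign of the coefficient (from -1 to 1) shows the direction of the relationship.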
The Correlation Coefficient
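To make the idea concrete, here is a minimal sketch of how a correlation coefficient could be calculated by hand. It assumes the usual Pearson coefficient is meant (the notes do not name one), and the function name `pearson_r` and the height and armspan values are made up for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)

# Hypothetical heights and armspans in cm (not real survey data)
heights = [150, 155, 160, 165, 170, 175]
armspans = [148, 157, 158, 166, 169, 178]

r = pearson_r(heights, armspans)
print(round(r, 2))
```

A value close to 1 or -1 suggests the points lie near a straight line; a value close to 0 suggests little or no linear relationship.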
In some instances of bivariate data, one variable influences or determines the other variable.
A Predictor variable (x-axis) can be used to predict the value of the other variable - the Response variable (y-axis).
Predictor - Independent
Response - Dependent
A model of the relationship can be described using regression analysis (i.e. an equation relating the two variables is found).
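One common way to find such an equation is a least-squares straight line. The sketch below assumes that method (the notes do not say which one is used); the function name `fit_line` and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor variable must take more than one value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical predictor (x) and response (y) values, for illustration only
x_values = [150, 155, 160, 165, 170, 175]
y_values = [148, 157, 158, 166, 169, 178]

slope, intercept = fit_line(x_values, y_values)
print(f"response = {slope:.2f} * predictor + {intercept:.2f}")
```

The predictor goes in as `xs` and the response as `ys`; swapping them gives a different line, so the choice of which variable predicts which matters.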
Summary of Scatterplot:
armspan = 0.84 * height + 15.39
Correlation = 0.35
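As a small illustration, the equation in the summary can be used to predict an armspan from a height. This is only a sketch: the function name `predict_armspan` is made up, and the units are assumed to be centimetres, which the summary does not state.

```python
def predict_armspan(height):
    """Predicted armspan from the summary equation (units assumed to be cm)."""
    return 0.84 * height + 15.39

# Prediction for a hypothetical student who is 170 cm tall
prediction = predict_armspan(170)
print(round(prediction, 2))
```

For a height of 170 this gives a predicted armspan of about 158.19. Because the correlation is only 0.35, the relationship is weak, so a prediction like this should be treated as a rough guide.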