U4: Graphical Displays

Definitions

Distribution

Box Plots

Stemplots

Bar Chart

Histogram

Dot Plot

Definitions

● Parameter: A numerical descriptive measure of a population (µ, σ)

○ Population: The entire group of individuals of interest; the variable is measured on every individual

● Statistic: A numerical descriptive measure of a sample (x̅, S)

○ Sample: A subset of the population; the variable is measured on only some of the individuals of interest
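
To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population values, the sample size of 4, and the random seed are all made up for illustration.

```python
import random
import statistics

# Made-up population: pretend these are the values for EVERY individual of interest.
population = [62, 65, 68, 70, 71, 73, 75, 78, 80, 84]

# A sample uses only some of the individuals (here 4, chosen at random).
random.seed(0)  # arbitrary seed so the sketch is repeatable
sample = random.sample(population, 4)

mu = statistics.mean(population)   # parameter: describes the population
x_bar = statistics.mean(sample)    # statistic: describes the sample

print("mu =", mu)
print("sample =", sample, "x-bar =", x_bar)
```

The parameter stays fixed for a given population, while the statistic changes depending on which individuals land in the sample.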

Distribution

● Center: Mean, Median, Mode

○ Mean (µ and x̅): Average of the data

○ Median: Value in the middle after arranging the numbers in order

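■ Use the formula (n + 1)/2 to find the position of the median in the ordered list, where n is the number of values. If n is even, the position falls halfway between two values, so the median is the average of those two values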

○ Mode: Number that repeats the most

○ The mean is not resistant to outliers but the median is

■ Outlier: A value that falls far from the rest of the data (see Box Plots)
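
The measures of center above can be checked with a short Python sketch. The data values and the helper name median_position are made up for illustration; the calculations use Python's standard statistics module.

```python
import statistics

def median_position(n):
    """1-based position of the median in the ordered list, using (n + 1) / 2."""
    return (n + 1) / 2

data = [3, 7, 7, 9, 12]  # made-up values, already in order

mean = statistics.mean(data)        # 38 / 5 = 7.6
median = statistics.median(data)    # 3rd value in order = 7
mode = statistics.mode(data)        # 7 repeats the most
position = median_position(len(data))  # (5 + 1) / 2 = 3.0

# Add one extreme value to see which measure resists it.
with_outlier = data + [100]
mean_after = statistics.mean(with_outlier)      # 138 / 6 = 23
median_after = statistics.median(with_outlier)  # average of 7 and 9 = 8.0
position_after = median_position(len(with_outlier))  # 3.5: between the 3rd and 4th values

print(mean, median, mode, position)
print(mean_after, median_after, position_after)
```

The single extreme value pulls the mean from 7.6 to 23, while the median only moves from 7 to 8.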

● Spread (Variability): Standard Deviation, Variance, Range, Interquartile Range

○ Standard Deviation (σ and S): The typical distance of the data points from the mean

■ The larger the standard deviation, the more spread out the data is (and vice versa)

○ Variance (σ² and S²): Standard deviation squared (used in other formulas)

○ Range: Highest value minus lowest value (max - min)

○ Interquartile Range (IQR): Third quartile minus first quartile (Q3 - Q1)
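
A minimal Python sketch of the four measures of spread follows. The data set is made up, and the quartiles helper is a hypothetical function that assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data (calculators and software may use slightly different quartile rules).

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle value is left out of both halves.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

sigma = statistics.pstdev(data)             # population standard deviation
s = statistics.stdev(data)                  # sample standard deviation
sigma_squared = statistics.pvariance(data)  # population variance
s_squared = statistics.variance(data)       # sample variance
data_range = max(data) - min(data)          # max - min
q1, q3 = quartiles(data)
iqr = q3 - q1                               # Q3 - Q1

print(sigma, s, sigma_squared, s_squared, data_range, iqr)
```

For this data the population standard deviation is 2 and the population variance is 4, which shows the "standard deviation squared" relationship directly.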

● Shape: Approximately Normal, Skewed to the Right, Skewed to the Left

○ Approximately Normal (symmetric): Mean ≈ Median

○ Skewed to the Right (Positively Skewed): The tail extends to the right; typically Mean > Median

○ Skewed to the Left (Negatively Skewed): The tail extends to the left; typically Mean < Median
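
The mean-versus-median comparison can be sketched in Python as a rough rule of thumb. The function name guess_shape, the tolerance parameter, and the data sets are all made up for illustration, and a graph should still be checked before describing shape.

```python
import statistics

def guess_shape(values, tolerance=0.0):
    """Rule-of-thumb guess at shape from the mean and median only."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "skewed right"
    if median - mean > tolerance:
        return "skewed left"
    return "roughly symmetric"

symmetric = [1, 2, 3, 4, 5]              # mean 3, median 3
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]   # mean 4.75, median 3
left_tail = [1, 8, 9, 9, 10, 10, 10]     # mean about 8.14, median 9

for values in (symmetric, right_tail, left_tail):
    print(values, "->", guess_shape(values))
```

A nonzero tolerance lets small differences between the mean and median still count as roughly symmetric.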

Box Plots

● Show a brief summary of the data (the five-number summary: minimum, Q1, median, Q3, maximum)

● When drawing boxplots:

○ Ensure your scale is consistent

○ For one variable, only the x-axis must be labeled. For two variables, the x-axis and y-axis must be labeled

○ Parallel box plots are displayed in the same graph, one above the other (this is where you must label the y-axis)

○ Each section (each whisker and each half of the box) represents about 25% of the data

○ The distribution is skewed towards the longer box/whisker

○ Outliers are marked with asterisks (*)

○ Find outliers using the formula h = 1.5(Q3 - Q1). Any number below Q1 - h or above Q3 + h is considered an outlier

○ The modified minimum and maximum are the smallest and largest numbers that are not outliers

● Used to display quantitative data, generally from small data sets
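
The outlier rule for box plots can be sketched in Python as follows. The data set and the function name outlier_fences are made up, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered data (other tools may compute quartiles slightly differently).

```python
import statistics

def outlier_fences(values):
    """Return (Q1 - h, Q3 + h) where h = 1.5 * (Q3 - Q1).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when n is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    h = 1.5 * (q3 - q1)
    return q1 - h, q3 + h

data = [1, 10, 11, 12, 13, 14, 15, 16, 40]  # made-up values

low, high = outlier_fences(data)  # Q1 = 10.5, Q3 = 15.5, h = 7.5
outliers = [x for x in data if x < low or x > high]
kept = [x for x in data if low <= x <= high]
modified_min, modified_max = min(kept), max(kept)

print("fences:", low, high)
print("outliers:", outliers)
print("modified min and max:", modified_min, modified_max)
```

Here 1 and 40 fall outside the fences of 3 and 23, so the whiskers would end at the modified minimum 10 and the modified maximum 16.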

Stemplots

● Give exact data

● Show outliers, gaps, and clusters

● The leaf (the final digit, usually the ‘ones’ place) is always on the right of the stem

● Do not put any commas between the leaves (spaces only)

● Always include a key at the side or bottom
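
The stemplot rules above can be sketched as a small Python function. The function name stemplot and the data are made up, and the sketch assumes non-negative whole numbers with the tens as the stem and the ones as the leaf.

```python
def stemplot(values):
    """Return the lines of a stemplot (stem = tens, leaf = ones).

    Assumption: values are non-negative whole numbers.
    """
    leaves = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves.setdefault(stem, []).append(str(leaf))

    lines = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(leaves.get(stem, []))  # spaces only, no commas
        lines.append(f"{stem} | {row}".rstrip())
    lines.append("Key: 1 | 2 means 12")
    return lines

data = [23, 12, 47, 15, 21, 23]  # made-up values
for line in stemplot(data):
    print(line)
```

The empty stem 3 in the output marks a gap, and 47 sits apart from the cluster in the 10s and 20s.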

Bar Chart

● Consists of columns plotted on a graph

● The columns sit over a label that represents the categorical value (qualitative variable)

● The height of the column indicates the size of the group

● Leave space between the bars (data is categorical, not continuous)

● Skewness cannot be applied to bar charts
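
As a rough illustration of how bar heights come from group sizes, here is a Python sketch that counts categories and prints a sideways text bar chart. The function name text_bar_chart and the colour data are made up.

```python
from collections import Counter

def text_bar_chart(categories):
    """Return one text bar per category; bar length = size of the group."""
    counts = Counter(categories)
    width = max(len(name) for name in counts)
    return [
        f"{name.ljust(width)} | {'#' * count}"
        for name, count in counts.most_common()
    ]

favourite_colours = ["red", "blue", "red", "green", "blue", "red"]  # made-up data
for line in text_bar_chart(favourite_colours):
    print(line)
```

The bars could be listed in any order without changing their meaning, since the categories have no numerical order.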

Histogram

● Consists of columns plotted on a graph

● Usually no space between adjacent columns

● The columns sit over a label that represents the numerical value (quantitative variable)

● Each column is centered over the value (or interval of values) it represents

● The height of the column indicates the size of the group

● When the tallest columns of a histogram are at opposite ends (far from the mean), the standard deviation is larger than when the histogram is symmetric with its tallest columns in the middle

● Histograms may be approximately normal or skewed
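
The link between histogram shape and standard deviation can be checked with a short Python sketch. Both data sets are made up; they share the same mean, minimum, and maximum, and differ only in where the values pile up.

```python
import statistics
from collections import Counter

ends_heavy = [1, 1, 1, 1, 5, 9, 9, 9, 9]     # tall columns at both ends
middle_heavy = [1, 5, 5, 5, 5, 5, 5, 5, 9]   # tall column in the middle

ends_sd = statistics.pstdev(ends_heavy)       # about 3.77
middle_sd = statistics.pstdev(middle_heavy)   # about 1.89

for name, values, sd in (
    ("ends heavy", ends_heavy, ends_sd),
    ("middle heavy", middle_heavy, middle_sd),
):
    column_heights = dict(sorted(Counter(values).items()))
    print(name, column_heights, "SD =", round(sd, 2))
```

Values piled up far from the mean contribute large distances, so the ends-heavy set has the larger standard deviation.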

Dot Plot

● Consists of dots plotted on a graph

○ Each dot represents a specific number of observations from a set of data

○ The dots are stacked in a column over a category

○ The height of the column represents the relative or absolute frequency of observations in that category

● Dot plots may be qualitative or quantitative. Dot plots may only be described in terms of skewness if they represent quantitative data
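
A dot plot of quantitative data can be sketched as text in Python. The function name dot_plot and the data are made up, and the sketch assumes whole-number values with each asterisk standing for one observation.

```python
from collections import Counter

def dot_plot(values):
    """Return the lines of a text dot plot, one '*' per observation.

    Assumption: values are whole numbers; every value from min to max
    gets a column so that gaps remain visible.
    """
    counts = Counter(values)
    labels = list(range(min(values), max(values) + 1))
    width = max(len(str(label)) for label in labels)

    rows = []
    # Build from the top of the tallest column down to the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = [("*" if counts[label] >= level else " ").rjust(width) for label in labels]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(label).rjust(width) for label in labels))
    return rows

data = [1, 2, 2, 3, 3, 3, 5]  # made-up values
for line in dot_plot(data):
    print(line)
```

The height of each column is the absolute frequency of that value, and the empty column over 4 shows a gap.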
