● Parameter: A numerical descriptive measure of a population (µ, σ)
○ Population: The variable is measured on every individual of interest
● Statistic: A numerical descriptive measure of a sample (x̅, S)
○ Sample: The variable is measured on only some of the individuals of interest
● Center: Mean, Median, Mode
○ Mean (µ and x̅): Average of the data
○ Median: Value in the middle after arranging the numbers in order
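■ Use the formula (n + 1)/2 to find the position of the median in the ordered list, where n is the number of values. If n is even, the position falls between two values, so the median is the average of those two middle values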
○ Mode: Number that repeats the most
○ The mean is not resistant to outliers but the median is
■ Outlier: A value that falls far outside the overall pattern of the data (see boxplots)
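The measures of center above can be checked with a short Python sketch. The data values are hypothetical, chosen only to illustrate; `median_position` is an illustrative helper, not a library function.

```python
import statistics

# Hypothetical sample, plus the same sample with one extreme value added.
data = [2, 3, 3, 4, 6, 7, 10]
with_outlier = data + [45]

def median_position(n):
    """1-based position of the median in the ordered data: (n + 1) / 2."""
    return (n + 1) / 2

print(median_position(len(data)))          # 4.0, so the median is the 4th value
print(median_position(len(with_outlier)))  # 4.5, so average the 4th and 5th values

print(statistics.mean(data), statistics.median(data), statistics.mode(data))

# The outlier pulls the mean a long way but barely moves the median.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

In this sample the single outlier doubles the mean (5 to 10) while the median moves only from 4 to 5, which is what "resistant" means here.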
● Spread (Variability): Standard Deviation, Variance, Range, Interquartile Range
○ Standard Deviation (σ and S): Roughly the typical distance of the data points from the mean
■ The larger the standard deviation, the more spread out the data is (vice versa)
○ Variance (σ² and S²): Standard deviation squared (used in other formulas)
○ Range: Highest value minus lowest value (max - min)
○ Interquartile Range (IQR): Third quartile minus first quartile (Q3 - Q1)
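A minimal sketch of the four measures of spread, using Python's `statistics` module on hypothetical data. Note one assumption: `statistics.quantiles` uses its own interpolation rule for quartiles, so Q1 and Q3 may differ slightly from values found by hand or on a calculator.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Population versions (σ, σ²) divide by n; sample versions (S, S²) divide by n - 1.
pop_sd = statistics.pstdev(sample)
samp_sd = statistics.stdev(sample)
pop_var = statistics.pvariance(sample)
samp_var = statistics.variance(sample)

# Range: max - min
data_range = max(sample) - min(sample)

# IQR: Q3 - Q1 (quartile convention is the module's default, an assumption here)
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

print(pop_sd, samp_sd)
print(pop_var, samp_var)
print(data_range, iqr)
```

The sample standard deviation comes out slightly larger than the population one because it divides by n - 1 instead of n.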
● Shape: Approximately Normal, Skewed to the Right, Skewed to the Left
○ Approximately Normal (symmetric): Mean ≈ Median
○ Skewed to the Right (Positively Skewed): Mean > Median
○ Skewed to the Left (Negatively Skewed): Mean < Median
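The mean-versus-median comparison can be sketched as a small function. `describe_shape` and its `tolerance` cutoff are illustrative assumptions, not a standard rule. Also, a mean close to the median only suggests symmetry; it does not by itself show the data is normal.

```python
import statistics

def describe_shape(data, tolerance=0.05):
    """Rough shape guess from mean vs. median.

    `tolerance` (a fraction of the standard deviation) is an arbitrary,
    hypothetical cutoff for treating the mean and median as "equal".
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = statistics.pstdev(data)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "approximately symmetric"
    return "skewed right" if mean > median else "skewed left"

print(describe_shape([1, 2, 3, 4, 5]))       # mean = median
print(describe_shape([1, 2, 2, 3, 20]))      # large value pulls the mean up
print(describe_shape([1, 18, 19, 19, 20]))   # small value pulls the mean down
```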
● Boxplots show a brief summary of the data (the five-number summary: minimum, Q1, median, Q3, maximum)
● When drawing boxplots:
○ Ensure your scale is consistent
○ For one variable, only the x-axis must be labeled. For two variables, the x-axis and y-axis must be labeled
○ Parallel box plots are displayed in the same graph, one above the other (this is where you must label the y-axis)
○ Each section (each whisker and each half of the box) represents about 25% of the data
○ The distribution is skewed towards the longer box/whisker
○ Outliers are marked with asterisks (*)
○ Find outliers using the formula h = 1.5(Q3 - Q1). Any number below Q1 - h or above Q3 + h is considered an outlier
○ The modified minimum and maximum are the smallest and largest numbers that are not outliers
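The outlier rule and the modified minimum and maximum can be sketched as follows. `find_outliers` is a hypothetical helper, and the quartiles come from `statistics.quantiles`, whose default convention may differ slightly from the method used in class.

```python
import statistics

def find_outliers(data):
    """Return (outliers, modified_min, modified_max) using h = 1.5(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    h = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - h, q3 + h
    outliers = [x for x in data if x < low_fence or x > high_fence]
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Modified min/max: smallest and largest values that are not outliers.
    return outliers, min(inside), max(inside)

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical data
outliers, modified_min, modified_max = find_outliers(data)
print(outliers, modified_min, modified_max)
```

On a boxplot of this data, the right whisker would end at the modified maximum and the outlier would be drawn as a separate asterisk.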
● Stem-and-leaf plots are used to display quantitative data, generally from small data sets
● Give the exact data values
● Shows outliers, gaps, and clusters
● The leaf (the ‘ones’ place) is always on the right
● Separate the leaves with spaces only (no commas)
● Always include a key at the side or bottom
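A minimal sketch of building a stem-and-leaf plot as text, assuming non-negative whole numbers with the tens as stems and the ones as leaves. `stem_and_leaf` is a hypothetical helper and the key line is just an example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Text stem-and-leaf plot: stem = tens and above, leaf = ones digit.

    Assumes non-negative integers. Empty stems are kept so gaps stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append("Key: 2 | 3 = 23")
    return "\n".join(lines)

print(stem_and_leaf([12, 15, 21, 23, 23, 40]))
```

The empty stem 3 in the output shows a gap in the data, and the lone value on stem 4 stands out as a possible outlier.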
● A bar chart consists of columns plotted on a graph
● The columns sit over a label that represents the categorical value (qualitative variable)
● The height of the column indicates the size of the group
● Leave space between the bars (data is categorical, not continuous)
● Skewness cannot be applied to bar charts
● A histogram consists of columns plotted on a graph
● Usually no space between adjacent columns
● The columns sit over a label that represents the numerical value (quantitative variable)
● Each column is centered over the number it represents on the axis
● The height of the column indicates the size of the group
● When the tallest columns of a histogram are at opposite ends (far from the mean), the standard deviation is larger than when the tallest columns are in the middle
● Histograms may be approximately normal or skewed
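The point about standard deviation can be checked with two hypothetical data sets that share the same mean and range but pile up in different places:

```python
import statistics

# Hypothetical data: same mean (3) and same range (1 to 5) in both sets.
ends_heavy = [1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5]    # tall columns at both ends
middle_heavy = [1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 5]  # tall column in the middle

sd_ends = statistics.pstdev(ends_heavy)
sd_middle = statistics.pstdev(middle_heavy)

print(round(sd_ends, 3), round(sd_middle, 3))
print(sd_ends > sd_middle)  # more values far from the mean -> larger spread
```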
● A dot plot consists of dots plotted on a graph
○ Each dot represents a specific number of observations from a set of data
○ The dots are stacked in a column over a category
○ The height of the column represents the relative or absolute frequency of observations in that category
● Dot plots may be qualitative or quantitative. Dot plots may only be described in terms of skewness if they represent quantitative data
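A dot plot can be sketched in plain text by stacking one mark per observation over each category. `dot_plot` is a hypothetical helper; here each `o` stands for exactly one observation.

```python
from collections import Counter

def dot_plot(observations):
    """Text dot plot: one 'o' per observation, stacked over its category."""
    counts = Counter(observations)
    categories = sorted(counts)
    width = max(len(str(c)) for c in categories)
    rows = []
    # Build rows from the top of the tallest column down to the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = [("o" if counts[c] >= level else " ").center(width) for c in categories]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(c).center(width) for c in categories))
    return "\n".join(rows)

print(dot_plot([1, 2, 2, 3, 3, 3]))
```

The height of each column is the absolute frequency of that category, so the tallest column here sits over the value 3.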