Statistics Basic Definitions

Statistics Definition (Verbatim from Stat Trek)

Statistics is a discipline that allows researchers to evaluate conclusions derived from sample data. In practice, statistics refers to a scientific approach used to:

Collect data.
Interpret and analyze data.
Assess the reliability of conclusions based on sample data.

For example, a researcher may ask which medicine is more effective - Drug A or Drug B. The researcher could use the tools of statistics to design a study to compare the two drugs. Using statistical methods, the researcher might conclude with 95% confidence that one medicine was superior to the other.

Variable Definition (Verbatim from Stat Trek)

Recall, in algebra, a variable represents an unknown value.

In statistics, a variable has two defining characteristics:

A variable is an attribute that describes a person, place, thing, or idea.
The value of the variable can “vary” from one entity to another.

For example, suppose we let the variable x represent the color of a person’s hair. The variable x could have the value of “blond” for one person, and “brunette” for another.

When the value of a variable is the outcome of a statistical experiment, that variable is a random variable.

Types of Variables

Variables can be either numeric or categorical. Numeric variables can be either continuous or discrete. Categorical variables can be either ordinal or nominal. See the graphic in: https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables

Examples of the different types of variables:

continuous numeric variable: a variable that can take any real-number value within a range, like 3.54, e.g., GPA
discrete numeric variable: a variable that takes discrete (or integer) values like 17, 9, 30, e.g., day of month
ordinal categorical variable: a variable that takes categorical values like small, medium, large that can be logically ordered, e.g., pizza size
nominal categorical variable: a variable that takes categorical values like black, brown, blue, green that cannot be logically ordered, e.g., eye color

Data collected for a numeric variable are quantitative data.

Data collected for a categorical variable are qualitative data.

Measures of Center

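mean: the mean is the arithmetic average of the values in a data set (the sum of the values divided by the number of values).
median: the median is the middle value of the ordered data set; when the number of values is even, it is the average of the two middle values.
mode: the mode is the most common value in a data set; a data set can have more than one mode.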
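
The three measures of center above can be sketched with Python's standard `statistics` module. The data set `scores` is a made-up example chosen for illustration, not data from these notes.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

# Mean: sum of the values divided by the number of values (30 / 6).
mean = statistics.mean(scores)

# Median: middle of the ordered values; with an even count,
# it is the average of the two middle values (3 and 5).
median = statistics.median(scores)

# Mode: the most common value (3 appears twice).
mode = statistics.mode(scores)

print(mean, median, mode)
```

Note that the mean (5) is pulled toward the large value 10, while the median (4) is not.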

Measures of Position

Minimum (min): the smallest value in a (rank) ordered data set.
Median (aka Second Quartile Q2): the middle value in a (rank) ordered data set.
Maximum (max): the largest value in a (rank) ordered data set.
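First Quartile (Q1): the middle value between the minimum and median in a (rank) ordered data set, i.e., the median of the lower half of the data; about 25% of the values lie at or below it.
Third Quartile (Q3): the middle value between the median and maximum in a (rank) ordered data set, i.e., the median of the upper half of the data; about 75% of the values lie at or below it.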
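
The five positions above can be computed as in the following minimal sketch. It assumes the "median of each half" convention for quartiles, leaving the overall median out of both halves when the count is odd. Other conventions exist and can give slightly different Q1 and Q3 values. The function name `five_number_summary` and the sample data are hypothetical, introduced only for illustration.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the median-of-halves convention.

    Assumes at least two values so that each half is non-empty.
    """
    ordered = sorted(values)          # rank-order the data first
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    # For an odd count, skip the middle value itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "min": ordered[0],
        "q1": statistics.median(lower),
        "median": statistics.median(ordered),
        "q3": statistics.median(upper),
        "max": ordered[-1],
    }

# Hypothetical unordered sample data.
data = [13, 1, 9, 5, 11, 3, 7]
summary = five_number_summary(data)
print(summary)
```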

Measures of Spread or Variability

range: maximum value - minimum value
interquartile range (IQR): spread of the middle fifty percent of the data (Q3 - Q1)
standard deviation: (from Stat Trek) a numerical value used to indicate how widely individual values in a group vary. If individual values vary greatly from the group mean, the standard deviation is big; and vice versa.
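
A short sketch of the three measures of spread, using made-up data. The notes do not say whether the standard deviation is the population or the sample version, so both are shown. The quartiles here assume the "median of each half" convention, which is one of several in use.

```python
import statistics

# Hypothetical sample data, sorted so positions are easy to read.
values = sorted([2, 4, 4, 4, 5, 5, 7, 9])

# Range: maximum value minus minimum value.
data_range = max(values) - min(values)

# IQR: Q3 - Q1, with each quartile taken as the median of one half.
half = len(values) // 2
q1 = statistics.median(values[:half])
q3 = statistics.median(values[half + (len(values) % 2):])
iqr = q3 - q1

# Standard deviation: typical distance of the values from the mean.
population_sd = statistics.pstdev(values)  # divides by n
sample_sd = statistics.stdev(values)       # divides by n - 1

print(data_range, iqr, population_sd, sample_sd)
```

The sample version is slightly larger than the population version because it divides by n - 1 rather than n.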

Bar Chart vs. Histogram (from Stat Trek)

Bar Chart: A bar chart is made up of columns plotted on a graph.
- The columns are positioned over a label that represents a categorical variable.
- The height of the column indicates the size of the group defined by the column label.

Histogram: A histogram is made up of columns plotted on a graph. Usually, there is no space between adjacent columns.
- The columns are positioned over a label that represents a continuous, quantitative variable.
- The column label can be a single value or a range of values.
- The height of the column indicates the size of the group defined by the column label.
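
The difference between the two charts comes down to how the column heights are computed. In the sketch below, bar chart heights are counts per category, while histogram heights are counts per numeric range (bin). The data, the bin edges, and the variable names are assumptions made for illustration. No plotting library is used, only the counts a chart would display.

```python
from collections import Counter

# Bar chart: one column per category of a categorical variable.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
bar_heights = Counter(eye_colors)

# Histogram: one column per range of a continuous variable.
gpas = [2.1, 2.8, 3.0, 3.2, 3.5, 3.54, 3.9]
edges = [2.0, 2.5, 3.0, 3.5, 4.0]  # hypothetical bin edges

hist_heights = {}
for low, high in zip(edges, edges[1:]):
    label = f"{low}-{high}"  # column label is a range of values
    # Each bin includes its lower edge and excludes its upper edge.
    hist_heights[label] = sum(1 for g in gpas if low <= g < high)

print(dict(bar_heights))
print(hist_heights)
```

Because the histogram bins cover adjacent ranges on a number line, their columns are usually drawn with no space between them, whereas the order of bar chart categories is arbitrary for a nominal variable.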
