Statistics is a discipline that allows researchers to evaluate conclusions derived from sample data. In practice, statistics refers to a scientific approach used to:
Collect data.
Interpret and analyze data.
Assess the reliability of conclusions based on sample data.
For example, a researcher may ask which medicine is more effective: Drug A or Drug B. The researcher could use the tools of statistics to design a study comparing the two drugs. Using statistical methods, the researcher might then conclude, with 95% confidence, that one medicine is superior to the other.
Recall that in algebra, a variable represents an unknown value.
In statistics, a variable has two defining characteristics:
A variable is an attribute that describes a person, place, thing, or idea.
The value of the variable can “vary” from one entity to another.
For example, suppose we let the variable x represent the color of a person’s hair. The variable x could have the value of “blond” for one person, and “brunette” for another.
When the value of a variable is the outcome of a statistical experiment, that variable is a random variable.
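The following minimal Python sketch illustrates the idea with a hypothetical experiment that is not from the text: toss a fair coin three times and record the number of heads. That count is a random variable because its value is the outcome of the experiment and can differ from one run to the next. The helper name `count_heads` is illustrative.

```python
import random

def count_heads(num_tosses: int, rng: random.Random) -> int:
    """Run one experiment: toss a fair coin num_tosses times and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(num_tosses))

# Seeded so the run is repeatable; each call to count_heads is one experiment.
rng = random.Random(42)
outcomes = [count_heads(3, rng) for _ in range(10)]
print(outcomes)  # ten values, each somewhere from 0 to 3
```

Each entry of `outcomes` is one observed value of the random variable "number of heads in three tosses."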
Variables can be either numeric or categorical. Numeric variables can be either continuous or discrete. Categorical variables can be either ordinal or nominal. See the graphic at: https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables
Examples of the different types of variables:
continuous numeric variable: a variable that takes real number values like 3.54, e.g., GPA
discrete numeric variable: a variable that takes separate, countable values (typically integers) like 17, 9, 30, e.g., day of the month
ordinal categorical variable: a variable that takes categorical values like small, medium, large that can be logically ordered, e.g., pizza size
nominal categorical variable: a variable that takes categorical values like black, brown, blue, green that cannot be logically ordered, e.g., eye color
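A small Python sketch of the four variable types, using the example values above. The variable names and the explicit `PIZZA_SIZE_ORDER` ranking are illustrative assumptions. For an ordinal variable the order must be supplied explicitly, since Python cannot infer that "medium" sits between "small" and "large". For a nominal variable like eye color, only equality checks and counting are meaningful.

```python
# Numeric variables
gpa = 3.54          # continuous: any real value in a range is possible
day_of_month = 17   # discrete: only separate, countable values (1..31)

# Ordinal categorical: the categories have a logical order, which we encode.
PIZZA_SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}
orders = ["large", "small", "medium", "small"]
sorted_orders = sorted(orders, key=PIZZA_SIZE_ORDER.__getitem__)
print(sorted_orders)  # ['small', 'small', 'medium', 'large']

# Nominal categorical: no logical order, so we only compare and count.
eye_colors = ["brown", "blue", "brown", "green"]
brown_count = eye_colors.count("brown")
print(brown_count)  # 2
```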
Data collected for a numeric variable are quantitative data.
Data collected for a categorical variable are qualitative data.
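mean: the mean is the arithmetic average of the values in a data set, i.e., their sum divided by the number of values.
median: the median is the middle value of the ordered data set; when the number of values is even, it is the average of the two middle values.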
mode: the mode is the most common value in a data set.
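The three measures can be checked with Python's standard `statistics` module on a small, made-up data set (the values are illustrative, not from the source). Because the data set has an even number of values, the median is the average of the two middle values.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # illustrative values, already in order

mean_value = sum(data) / len(data)       # (2+3+3+5+7+10) / 6 = 30 / 6
median_value = statistics.median(data)   # average of the middle pair 3 and 5
mode_value = statistics.mode(data)       # 3 occurs twice, every other value once

print(mean_value, median_value, mode_value)  # 5.0 4.0 3
```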
Minimum (min): the smallest value in a (rank) ordered data set.
Median (aka Second Quartile Q2): the middle value in a (rank) ordered data set.
Maximum (max): the largest value in a (rank) ordered data set.
First Quartile (Q1): the median of the lower half of a (rank) ordered data set, i.e., of the values between the minimum and the median.
Third Quartile (Q3): the median of the upper half of a (rank) ordered data set, i.e., of the values between the median and the maximum.
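A minimal sketch of the five-number summary, assuming Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when the count is odd. This is one common convention; other software (including `statistics.quantiles` in Python's standard library) may use different interpolation rules and return slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when the count is odd, the middle value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))  # (1, 3, 7, 11, 13)
```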
range: maximum value - minimum value
interquartile range (IQR): spread of the middle fifty percent of the data (Q3 - Q1)
standard deviation: (from Stat Trek) a numerical value that indicates how widely the individual values in a group vary. If individual values vary greatly from the group mean, the standard deviation is large; if they stay close to the mean, it is small.
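The measures of spread can be sketched in Python on an illustrative data set. Two assumptions apply. First, the quartiles use the halves-of-the-data convention described for Q1 and Q3 (the helper `quartiles` is hypothetical). Second, because the definition above does not say whether it means a population or a sample, both `statistics.pstdev` (divides the summed squared deviations by n) and `statistics.stdev` (divides by n - 1) are shown.

```python
from statistics import median, pstdev, stdev

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (odd-count median excluded)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; mean is 40 / 8 = 5

data_range = max(data) - min(data)  # 9 - 2 = 7
q1, q3 = quartiles(data)            # 4.0 and 6.0
iqr = q3 - q1                       # spread of the middle half: 2.0

print(data_range, iqr)  # 7 2.0
print(pstdev(data))     # 2.0 (population: sqrt(32 / 8))
print(stdev(data))      # about 2.138 (sample: sqrt(32 / 7))
```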
Bar Chart: A bar chart is made up of columns plotted on a graph.
Each column is positioned over a label that names one category of a categorical variable.
The height of the column indicates the size of the group defined by the column label.
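A bar chart can be sketched in plain text with `collections.Counter`: each category becomes a label, and the length of its bar is the size of that group. The eye-color data and the `text_bar_chart` helper are hypothetical, and the bars are drawn horizontally rather than as vertical columns.

```python
from collections import Counter

def text_bar_chart(categories):
    """Return one text line per category; the bar length is the group's count."""
    counts = Counter(categories)  # categories keep their first-seen order
    if not counts:
        return []
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {'#' * n} ({n})" for label, n in counts.items()]

eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
lines = text_bar_chart(eye_colors)
for line in lines:
    print(line)
# brown | ### (3)
# blue  | ## (2)
# green | # (1)
```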
Histogram: A histogram is made up of columns plotted on a graph. Usually, there is no space between adjacent columns.
The columns are positioned over a label that represents a continuous, quantitative variable.
The column label can be a single value or a range of values.
The height of the column indicates the size of the group defined by the column label.
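For a histogram, continuous values are first grouped into adjacent ranges (bins). The sketch below assumes equal-width bins beginning at a chosen starting value. The GPA values, the start of 2.0, and the bin width of 0.5 are illustrative choices, and the helper name `histogram_bins` is hypothetical. Empty bins are kept with a count of zero so the columns stay side by side with no gaps.

```python
import math

def histogram_bins(values, start, bin_width):
    """Group values into adjacent, equal-width bins [low, high) beginning at start.

    Returns (low, high, count) for every bin up to the one holding the largest
    value, including empty bins, so the columns sit next to each other.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return []
    indices = [math.floor((v - start) / bin_width) for v in values]
    if min(indices) < 0:
        raise ValueError("all values must be >= start")
    counts = [0] * (max(indices) + 1)
    for i in indices:
        counts[i] += 1
    return [(start + i * bin_width, start + (i + 1) * bin_width, c)
            for i, c in enumerate(counts)]

gpas = [2.1, 2.7, 2.9, 3.0, 3.2, 3.5, 3.54, 3.9]
bins = histogram_bins(gpas, start=2.0, bin_width=0.5)
for low, high, count in bins:
    print(f"[{low:.1f}, {high:.1f}) {'#' * count}")
# [2.0, 2.5) #
# [2.5, 3.0) ##
# [3.0, 3.5) ##
# [3.5, 4.0) ###
```

Here each column label is a range of values; using a bin width small enough that each bin holds a single possible value would give single-value labels instead.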