Linear regression models are statistical techniques used to determine the apparent relationship between two variables, such as marketing expenditure and sales revenue or seasonal impacts on the demand for certain goods and services. The DP Business Management syllabus specifies the following three techniques for simple linear regression analysis as part of the Toolkit. Each step is progressive and builds on the previous technique.
Scatter diagrams
Line of best fit, and
Correlation / Extrapolation
Being able to determine correlation helps to improve business decision making and strategic planning.
1. Scatter diagrams
A scatter diagram is a visual statistical tool used to show the relationship or correlation between two variables, such as marketing expenditure and sales revenue or consumer income and household expenditure levels. This is done by plotting the values of each variable on a different axis, such as the value of marketing expenditure on the horizontal axis (the x-axis or independent axis), and the value of sales revenue on the vertical axis (the y-axis or dependent axis).
In theory, the more money a person earns, the more they spend per time period. Market research can be used to determine the extent to which there is a strong correlation between income and expenditure levels. In the scatter diagram below, each dot represents a respondent, with their income shown along the x-axis and their expenditure plotted on the y-axis, per time period.
The graph can then help to determine the degree to which one variable impacts the other. A correlation exists if the scatter diagram shows the two variables being measured are related.
A positive correlation exists if the two variables move in the same direction, such as an increase in advertising expenditure being followed by a rise in a firm's sales revenue.
The opposite is true for a negative correlation, such as the demand for warm clothing items falling during hotter periods of the year.
No correlation exists if the data set suggests there is no clear or obvious relationship between the two variables being measured or shown in a scatter diagram. This means the relationship between the two variables is inconclusive or the variables are unrelated, such as the demand for coffee and the price of umbrellas.
The main advantage of using scatter diagrams is that they can show patterns and therefore correlations in a visual way. The main limitation, however, is that the tool does not reveal causation (the reason or reasons behind the relationship between the two variables under investigation).
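The direction of a correlation can also be checked numerically rather than by eye. The sketch below is illustrative only: it uses Pearson's correlation coefficient (a measure not named in the syllabus text above), made-up income and expenditure figures, and an arbitrary cut-off of 0.3 for deciding when a relationship counts as "clear".

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists.

    Assumes both lists contain some variation (otherwise the
    denominator would be zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far the two variables move together around their means
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)

def describe(r, threshold=0.3):
    """Label the direction of a correlation.

    The threshold is an arbitrary illustrative cut-off, not an official value.
    """
    if r >= threshold:
        return "positive correlation"
    if r <= -threshold:
        return "negative correlation"
    return "no clear correlation"

# Hypothetical survey data: monthly income and expenditure (in $000s)
income = [2, 3, 4, 5, 6, 7]
expenditure = [1.5, 2.4, 2.9, 4.1, 4.6, 5.8]

r = pearson_r(income, expenditure)
print(round(r, 3), describe(r))
```

A value of r close to +1 or -1 corresponds to dots that cluster tightly around a straight line, while a value near 0 corresponds to a scatter diagram with no obvious pattern. As with the diagram itself, the coefficient says nothing about causation.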
2. Line of best fit
A line of best fit is a statistical technique used to show the relationship or correlation between variables on a scatter diagram. The line of best fit (also known as the regression line) is drawn through the data points plotted on a scatter diagram in a way that evens out variations in the data set.
The line of best fit is used to identify any underlying patterns or relationship between the variables being investigated. However, a line of best fit can only be established if there is positive or negative correlation between the two variables in a scatter diagram.
A strong correlation exists if all of the data points are very close to the line of best fit. This can be a strong positive or a strong negative correlation.
A positive correlation means the two variables being measured move in the same direction, but the data points are not necessarily close to the line of best fit.
A negative correlation exists if the two variables move in the opposite direction, but the data points are not necessarily close to the line of best fit.
There is said to be no correlation between the two variables being measured if a line of best fit cannot be determined.
Hence, a line of best fit is used to indicate the strength of the correlation.
Simple linear regression can also be used to show the trend in a data set, such as a firm's sales revenue over a prolonged period of time. However, as with all statistical techniques, it is important to consider what the line of best fit does not show, as it is only an approximation from the given data set. Any readings or interpretations from the line of best fit will be estimates.
Note to teachers:
For IB assessment purposes, HL students will not be expected to calculate the line of best fit. Although the line of best fit can be derived by using a statistical equation, only an approximation is required. You can do this by looking at the data and drawing a straight line that seems to best show the relationship between the data in the scatter graph.
For reference only, the equation to find the line of best fit is:
Y' = bX + A
where,
Y' denotes the predicted value
b denotes the slope of the line
X denotes the independent variable, and
A is the y-axis intercept.
Once the scatter points are plotted on a diagram, draw a straight line that passes through as many data points as possible. The line should be as close as possible to all points. For points not on the line of best fit, approximately half of the data points should be above the line and half below it.
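For reference only, the slope and intercept in the equation above can be estimated from data. The sketch below assumes the ordinary least squares method, which is one common way of deriving a line of best fit (the text above does not name a specific method). The function name and the marketing and sales figures are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (b, A) for the line Y' = bX + A using ordinary least squares.

    Assumes the X values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much Y changes, on average, for each unit change in X
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the line always passes through the point (mean_x, mean_y)
    A = mean_y - b * mean_x
    return b, A

# Hypothetical data: marketing expenditure (X) and sales revenue (Y), in $000s
marketing = [10, 20, 30, 40, 50]
sales = [120, 150, 210, 240, 280]

b, A = line_of_best_fit(marketing, sales)
print(f"Y' = {b:.2f}X + {A:.2f}")
```

A line drawn by eye should land close to the calculated one, which is why an approximation is sufficient for assessment purposes.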
Top tip!
A key risk when interpreting a correlation shown by a line of best fit is assuming there is a causal link between the two variables under investigation when, in fact, one does not cause the other. Always think critically about the data being presented.
3. Correlation / Extrapolation
Correlation is used to determine the relationship between data sets. It is a statistical process of establishing a relationship (or connection) between two or more variables. For example, research from the Chinese University of Hong Kong's medical faculty has found that Internet gaming disorders and mental health problems are significantly correlated. In the world of business, correlation is widely used in financial analyses and to support strategic decision making.
Examples of correlation, which can be determined by using simple regression tools, may include:
As the weather gets warmer, the demand for ice cream increases.
The more it rains, the higher the demand for umbrellas.
The more people visit the cinema or movie theatre, the greater the spending on popcorn.
The less a firm spends on marketing, the fewer customers it will have.
Spending more on research and development (R&D) leads to more innovation.
The longer a member of staff works at an organization, the higher their chances are of being promoted to a higher rank.
The greater the spending on staff training and development, the more productive workers become.
An increase in average incomes tends to lead to a higher level of consumer expenditure.
Extrapolation is a statistical forecasting technique that makes future predictions of sales (in units or dollars) based on trends identified from using past data. It works by using a line of best fit (for a particular data set) and extending this line to make predictions, such as future sales revenue.
Example of extrapolation of sales data
Note that extrapolation is only effective if the relationship between the dependent and independent variables is linear (i.e. a clear line of best fit can be established).
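Extending a line of best fit can be sketched in a few lines of code. The example below is a minimal illustration, assuming the line is fitted with ordinary least squares and using made-up annual sales figures. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def extrapolate(slope, intercept, x):
    """Read a predicted value off the extended line of best fit."""
    return slope * x + intercept

# Hypothetical past data: sales revenue ($000s) for years 1 to 5
years = [1, 2, 3, 4, 5]
sales = [200, 220, 245, 260, 285]

slope, intercept = fit_line(years, sales)

# Extend the line beyond the known data to forecast years 6 and 7
for future_year in (6, 7):
    forecast = extrapolate(slope, intercept, future_year)
    print(f"Year {future_year}: forecast sales of about ${forecast:.0f}k")
```

The further the line is extended beyond the known data, the less reliable the forecast becomes, because the prediction assumes the past linear trend continues unchanged.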
Other examples of how managers might use simple linear regression to support problem solving and decision making include:
Analyzing survey data to understand indicators such as the degree of customer satisfaction and product preferences (see market research).
Assessing business risks to support decision making (see contingency planning and crisis management).
Building linear regression models for machine learning to support problem solving (see artificial intelligence).
Making sales estimates at different times of the year (see sales forecasting).
Predicting how changes in price are likely to impact consumer behaviour (see price elasticity of demand).
Advantages and disadvantages of simple linear regression
Advantages of simple linear regression
The potential advantages of using simple linear regression include:
Predictive analytics - These statistical tools enable businesses to predict and therefore prepare for risks and opportunities. They are often used by business analysts to make forecasts of future outcomes.
Enhances decision making - Managers and entrepreneurs rely on quantitative data and financial analyses to aid strategic decision making. Simple linear regression techniques enable such analyses to have greater levels of accuracy and trustworthiness, thereby supporting businesses in testing various hypotheses and developing more appropriate strategies.
Reveals new business opportunities - Simple linear regression analysis, such as correlation, can help to reveal new business opportunities that might not have otherwise been available or may have gone unnoticed by decision makers as the information was unavailable. Instead, the tools enable decision makers to gain insights into new business opportunities that can be put to strategic use.
Reduces errors and risk associated with business strategy - Simple linear regression techniques enable business people to test theories, strategies, and hypotheses in order to determine if they are likely to be feasible and successful. Gaining access to the right data helps to reduce errors, and so reduces risks. As a quantitative tool in the Business Management Toolkit (BMT), simple linear regression is based on evidence to support decision making, rather than decision makers relying purely on past experiences and/or their own intuition.
Improved management - Overall, simple linear regression techniques help managers and entrepreneurs to manage their businesses more efficiently, in areas such as resource allocation, employee productivity, and budgeting.
Limitations of simple linear regression
The potential limitations of using simple linear regression include:
Cause versus effect - Being able to establish a correlation between two or more variables does not necessarily enable managers to determine the causes of the relationship or connection. For example, there are numerous factors that can cause an increase in the demand for a firm's goods or services, irrespective of its advertising expenditure. Linear regression does not necessarily enable managers to know how a change in one variable causes a change in another variable. See Box 1 for some real-world examples.
Time and cost - Such statistical techniques can be both time consuming and expensive to conduct. A large and representative data set is required to generate meaningful results.
Sensitivity to outliers - Linear regression is sensitive to outliers. Outliers of a data set refer to anomalies, irregularities, and extreme values that deviate from the other data points. Outliers can drastically change a line of best fit, and any corresponding correlation may have a low degree of accuracy.
The past is not indicative of the future - Just because a correlation might be established from a data set does not mean that the trend will continue into the future. For example, the outbreak of the COVID-19 pandemic - which no one could have predicted - caused major havoc to all industries across the world. The chart below shows a positive sales revenue trend for cinemas in the UK, although the coronavirus outbreak soon put an end to that.
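Returning to the limitation about outliers listed above, a short sketch can show how a single extreme value shifts a line of best fit. It assumes the slope is calculated with ordinary least squares and uses made-up advertising and sales figures. The function name is hypothetical.

```python
def slope_of_best_fit(xs, ys):
    """Return the least-squares slope of the line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))

# Hypothetical data with a perfectly steady trend:
# each extra unit of advertising adds 2 units of sales
advertising = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 14, 16, 18, 20]
slope_clean = slope_of_best_fit(advertising, sales)

# Add one anomalous reading (for example, a one-off bulk order)
slope_with_outlier = slope_of_best_fit(advertising + [7], sales + [60])

print(f"Slope without outlier: {slope_clean:.2f}")
print(f"Slope with one outlier: {slope_with_outlier:.2f}")
```

In this made-up data set, one unusual data point is enough to make the line far steeper, so any forecast read from it would overstate the underlying trend.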