Line of best fit

The most basic type of association is a linear association. In the last chapter, we learned about correlation and how Pearson's r describes the linear association between two variables. Pearson's r is a single number. Often, we also want to find the line that best represents the relationship between the two variables, so that we can predict the score on one variable given the score on the other. A linear relationship can be represented in several ways: algebraically by an equation, numerically with actual or predicted data values, or graphically as a plotted curve. (A straight line is a special kind of curve.)

Algebraically, a linear equation typically takes the form y = mx + b, where m and b are constants, x is the independent variable, and y is the dependent variable. In statistics, a linear equation is usually written as y = a + bx, where a and b are the constants; this notation helps readers distinguish the statistical context from the algebraic one. Note that the letter b plays different roles in the two forms: it is the intercept in y = mx + b but the slope in y = a + bx. In the equation y = a + bx, the constant b that multiplies x (a coefficient) is called the slope. The slope describes the rate of change between the independent and dependent variables, that is, how much the dependent variable changes as the independent variable changes. The constant a is called the y-intercept. Graphically, the y-intercept is the y coordinate of the point where the line crosses the y-axis; at this point, x = 0.


The slope tells us how much the dependent variable (y) changes, on average, for every one-unit increase in the independent variable (x). The y-intercept gives the value of the dependent variable when the independent variable equals zero.
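As a quick check of these two interpretations, here is a minimal Python sketch that evaluates y = a + bx. The helper name `linear_value` and the constants a = 3 and b = 2 are illustrative assumptions, not values from the text.

```python
def linear_value(a: float, b: float, x: float) -> float:
    """Return y = a + b*x for intercept a and slope b (hypothetical helper)."""
    return a + b * x

a, b = 3.0, 2.0  # illustrative constants, not taken from the text

# At x = 0 the line passes through (0, a), so y equals the y-intercept.
print(linear_value(a, b, 0))  # 3.0

# Increasing x by one unit changes y by exactly the slope b.
print(linear_value(a, b, 5) - linear_value(a, b, 4))  # 2.0
```

For a line that fits every point exactly, the one-unit change is exactly b; for a line of best fit through scattered data, it is the average change described above.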


In regression, we attempt to find the line of best fit, which is the single line that best summarizes the linear relationship between the X and Y variables. Although the line of best fit is extremely important, it is still just a line. Let's make sure we understand how to graph and interpret a line by reviewing the information below.

Slope and Y-Intercept of a Linear Equation


For the linear equation y = a + bx, b is the slope and a is the y-intercept. Recall from algebra that the slope is a number that describes the steepness and direction of a line, and the y-intercept is the y coordinate of the point (0, a) where the line crosses the y-axis.
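Because the slope measures the change in y per one-unit change in x, it can be computed from any two points on a line as rise over run, b = (y2 - y1) / (x2 - x1); the intercept then follows from a = y1 - b * x1. A minimal Python sketch, assuming two known points with different x values; the helper name `line_through` and the example points are hypothetical:

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + b*x through two points (hypothetical helper)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has an undefined slope and cannot be written as y = a + bx.
        raise ValueError("x values must differ")
    b = (y2 - y1) / (x2 - x1)  # slope: rise over run
    a = y1 - b * x1            # intercept: solve y1 = a + b*x1 for a
    return a, b

a, b = line_through((1, 5), (3, 11))
print(a, b)  # 2.0 3.0
```

With the points (1, 5) and (3, 11), the sketch gives a = 2 and b = 3, so the line is y = 2 + 3x.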


These are three possible graphs of y = a + bx.

(a) If b > 0, the line slopes upward to the right. (b) If b = 0, the line is horizontal. (c) If b < 0, the line slopes downward to the right.
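The three cases depend only on the sign of the slope b; the intercept a shifts the line up or down but does not change its direction. A small Python sketch of that rule (the function name `describe_direction` is hypothetical):

```python
def describe_direction(b: float) -> str:
    """Describe the graph of y = a + b*x using only the slope b."""
    if b > 0:
        return "slopes upward to the right"
    if b == 0:
        return "horizontal"
    return "slopes downward to the right"

for slope in (2, 0, -1.5):
    print(slope, describe_direction(slope))
```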

PRACTICE 1

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time fee of $25 plus $15 per hour of tutoring.

  1. Generate a linear equation that expresses the total amount of money Svetlana earns for each session she tutors.

  2. What are the independent and dependent variables of this equation?

  3. What are the y-intercept and slope?

  4. Interpret the independent variable (IV), dependent variable (DV), y-intercept, and slope using complete sentences.

PRACTICE 2

Ethan repairs household appliances like dishwashers and refrigerators. For each visit, he charges $25 plus $20 per hour of work. A linear equation that expresses the total amount of money that Ethan makes is y = 25 + 20x.

  1. What are the independent and dependent variables?

  2. What are the y-intercept and slope?

  3. Interpret the IV, DV, y-intercept, and slope using complete sentences.

PRACTICE 3

Emma's Extreme Sports hires hang-gliding instructors and pays them a fee of $50 per class as well as $20 per student in the class. The total cost Emma pays depends on the number of students in the class.

  1. What is the equation that expresses the total cost in terms of the number of students in a class?

Note that in these practice problems, the relationship between x and y is exactly linear, so the equation y = a + bx you find is itself the line of best fit. In general, we can use the line of best fit to predict the value of y at a given value of x. This is the general idea of regression, which we will expand upon later in this chapter.

Answers

Practice 1.

  1. y = 25 + 15x.

  2. X, the number of hours Svetlana tutors, is the independent variable in this equation. Y, the amount of money Svetlana makes, is the dependent variable.

  3. The y-intercept, a, is the y coordinate of the point where the line crosses the y-axis (in other words, the point where x = 0). The y-intercept is 25. The slope, b, is 15. (Note: This is a positive slope.)

  4. Interpretations

    • y-intercept: If Svetlana were to tutor 0 hours, she would still make $25 from her one-time fee.

    • IV: Svetlana can control/manipulate the number of hours she tutors, which will affect how much money she makes.

    • DV: The amount of money Svetlana makes will depend on how many hours she tutors.

    • slope: For every additional hour Svetlana tutors, she earns an additional $15.
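To connect these answers to prediction, a minimal Python sketch can tabulate Svetlana's earnings from y = 25 + 15x for sessions of 0 to 3 hours. The function and constant names are hypothetical.

```python
# Svetlana's earnings from the Practice 1 answer: y = 25 + 15x
INTERCEPT = 25  # one-time fee in dollars (y-intercept, a)
SLOPE = 15      # dollars earned per hour of tutoring (slope, b)

def svetlana_earnings(hours: float) -> float:
    """Predicted total earnings for a session lasting `hours` hours."""
    return INTERCEPT + SLOPE * hours

for hours in range(4):
    print(hours, svetlana_earnings(hours))
```

The 0-hour row shows the $25 one-time fee (the y-intercept), and each row is $15 more than the one before it (the slope).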

Practice 2.

  1. The IV is hours of work and the DV is the amount of money Ethan makes.

  2. The y-intercept is 25, and the slope is 20.

  3. Interpretations

    1. y-intercept: If Ethan were to work 0 hours, he would still make $25.

    2. IV: Ethan can control/manipulate the number of hours he works, which will affect how much money he makes.

    3. DV: The amount of money Ethan makes will depend on how many hours he works.

    4. slope: For every additional hour Ethan works, he earns an additional $20.
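The slope also applies to fractions of a unit. A minimal Python sketch evaluating y = 25 + 20x in half-hour steps, assuming Ethan bills partial hours at the same hourly rate (the problem does not say so); the function name `ethan_charge` is hypothetical:

```python
# Ethan's charge from Practice 2: y = 25 + 20x
def ethan_charge(hours: float) -> float:
    """Predicted total charge in dollars for a visit lasting `hours` hours."""
    return 25 + 20 * hours

for hours in (0, 0.5, 1.0, 1.5, 2.0):
    print(hours, ethan_charge(hours))
```

Each half hour adds $10, half of the $20 slope, because y changes by b for every one-unit change in x.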

Practice 3.

  1. y = 50 + 20x
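A minimal Python sketch evaluating the Practice 3 equation for a few class sizes, where x is the number of students and y is Emma's total cost in dollars (the function name `emma_cost` and the class sizes are illustrative):

```python
# Emma's total cost from Practice 3: y = 50 + 20x
def emma_cost(students: int) -> int:
    """Total cost in dollars for one class with `students` students."""
    return 50 + 20 * students

for students in (0, 8, 12):
    print(students, emma_cost(students))
```

The 0-student row is the $50 per-class fee (the y-intercept), and each additional student adds $20 (the slope).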


References

  1. https://courses.lumenlearning.com/introstats1/chapter/linear-equations/

LICENSES AND ATTRIBUTIONS

CC LICENSED CONTENT, SHARED PREVIOUSLY


Data from the Centers for Disease Control and Prevention.

Data from the National Center for HIV, STD, and TB Prevention.