Understanding linear regression is fundamental to analyzing relationships between variables in the real world. This project will equip you with practical tools for analyzing data, interpreting results, and making predictions—all key skills in statistics and beyond!
In this project, you and your table group will explore the relationship between two quantitative variables using linear regression analysis. You will investigate whether a linear relationship exists, interpret key statistical measures, and present your findings in a clear, professional manner. See the example slides for a model of a finished presentation.
Develop a comprehensive understanding of linear regression.
Apply your skills to real-world data.
Effectively communicate your findings with visualizations and statistical interpretations.
Need more examples? Have a look at these presentations from previous classes.
Explore a question that investigates the relationship between two quantitative variables. You can use any reliable dataset or explore Our World in Data (OWID) for inspiration.
Examples of questions:
Does the number of hours studied (X) predict test scores (Y)?
Is there a relationship between average temperature (X) and ice cream sales (Y)?
Does the GDP of a country (X) predict life expectancy (Y)?
Approval: Run your idea by me or by your classmates before moving forward. Once approved, continue to the next step.
Source: Use a dataset with at least 30 observations. Your dataset must have two quantitative variables.
Tools: If you choose OWID, follow this tutorial video to transfer your data into Stapplet for analysis.
Documentation: Clearly describe where your data came from, how it was collected, and include a link to the source.
Univariate Analysis: Start by exploring each variable separately.
Calculate key statistics like mean, median, standard deviation, and range.
Create histograms or boxplots to visualize the distributions.
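If you want to double-check the summary statistics your software reports, the short sketch below computes them with Python's standard `statistics` module. It is only an illustration: the helper name `summarize` is hypothetical, and the data are made-up toy values (hours studied and test scores, echoing the example question) with far fewer than the 30 observations the project requires.

```python
import statistics


def summarize(values):
    """Hypothetical helper: mean, median, sample standard deviation, and range."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD: divides by n - 1
        "range": max(values) - min(values),
    }


# Hypothetical toy data (hours studied, test scores); a real project needs at least 30 observations.
hours = [1, 2, 3, 4, 5]
scores = [62, 74, 80, 74, 80]

for name, data in [("hours", hours), ("scores", scores)]:
    print(name, summarize(data))
```

Note that `statistics.stdev` is the sample standard deviation (dividing by n - 1), which is the value most statistics software reports as s.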
Correlation: Calculate and interpret the correlation coefficient (r) to determine the strength and direction of the linear relationship between your variables.
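One way to see what r measures is to compute it from z-scores: r is the sum of the products of each point's x and y z-scores, divided by n - 1. The sketch below assumes Python 3 with the standard `statistics` module; the function name `correlation` and the toy data are hypothetical and only for illustration.

```python
import statistics


def correlation(x, y):
    """Hypothetical helper: r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical toy data (hours studied, test scores).
hours = [1, 2, 3, 4, 5]
scores = [62, 74, 80, 74, 80]
r = correlation(hours, scores)
print(round(r, 3))  # 0.775: a fairly strong, positive linear association
```

Because each z-score is unitless, r has no units and always lies between -1 and 1.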
Scatterplot: Create a scatterplot to visualize the relationship between your two variables.
Least Squares Regression: Fit a least squares regression line to your data.
Report the equation of the line.
Add the regression line to your scatterplot.
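The least squares line y-hat = a + b*x has slope b = r * s_y / s_x and intercept a = y-bar - b * x-bar, so the line always passes through the point (x-bar, y-bar). The sketch below computes b from sums of deviations, which is algebraically equivalent. It assumes Python 3; the function name `least_squares_line` and the toy data are hypothetical.

```python
import statistics


def least_squares_line(x, y):
    """Hypothetical helper: return (intercept a, slope b) of y-hat = a + b*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # equivalent to r * s_y / s_x
    a = my - b * mx    # forces the line through (x-bar, y-bar)
    return a, b


# Hypothetical toy data (hours studied, test scores).
hours = [1, 2, 3, 4, 5]
scores = [62, 74, 80, 74, 80]
a, b = least_squares_line(hours, scores)
print(f"predicted score = {a:.1f} + {b:.1f} * hours")  # predicted score = 63.2 + 3.6 * hours
```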
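Residual Analysis: Create a residual plot, which graphs each residual (observed Y minus predicted Y) against X, to assess the fit of your model. Investigate any patterns in the residuals, such as a curve or a fan shape, that suggest a poor model fit.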
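The residual plot is built from one residual per data point: the observed y minus the y-hat the line predicts at that x. The sketch below lists the (x, residual) pairs you would plot. It assumes Python 3; the helper name `residuals` and the toy data are hypothetical, and for this toy data the least squares line works out to y-hat = 63.2 + 3.6x.

```python
def residuals(x, y, a, b):
    """Hypothetical helper: residual = observed y minus predicted y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data (hours studied, test scores).
hours = [1, 2, 3, 4, 5]
scores = [62, 74, 80, 74, 80]
a, b = 63.2, 3.6  # least squares intercept and slope for this toy data

res = residuals(hours, scores, a, b)
for xi, e in zip(hours, res):
    print(xi, round(e, 2))  # the (x, residual) pairs for the residual plot
print(abs(sum(res)) < 1e-9)  # True: least squares residuals sum to (essentially) zero
```

Plotted, these points should scatter randomly above and below zero with no curve or fan shape if a linear model is appropriate.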
r² Value: Report and interpret the r² value, which tells you the proportion (percentage) of the variability in Y that is accounted for by the linear relationship with X.
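For a least squares line, r² equals 1 - SSE/SST, where SST is the total squared variation of Y around its mean and SSE is the squared variation left over in the residuals. The sketch below shows that arithmetic. It assumes Python 3 and uses hypothetical toy data whose least squares line is y-hat = 63.2 + 3.6x.

```python
import statistics

# Hypothetical toy data (hours studied, test scores).
hours = [1, 2, 3, 4, 5]
scores = [62, 74, 80, 74, 80]
a, b = 63.2, 3.6  # least squares intercept and slope for this toy data

mean_y = statistics.mean(scores)
sst = sum((y - mean_y) ** 2 for y in scores)                      # total variation in Y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))  # variation left in the residuals
r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.2f}")  # r^2 = 0.60
```

Here about 60% of the variability in the toy test scores is accounted for by the linear relationship with hours studied, which matches squaring r of about 0.775.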
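Slope Interpretation: Explain what the slope of your regression line means in the context of your data, that is, the predicted change in Y for each one-unit increase in X.
Y-Intercept Interpretation: Discuss the meaning (or lack of meaning) of the y-intercept, the predicted value of Y when X = 0. If X = 0 is far outside the range of your data, the intercept may have no practical meaning.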
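Using a hypothetical toy line for hours studied (x) and test score (y), the arithmetic behind both interpretations looks like this:

```latex
\begin{aligned}
\hat{y} &= 63.2 + 3.6x \\
\hat{y}(x+1) - \hat{y}(x) &= \bigl[63.2 + 3.6(x+1)\bigr] - \bigl[63.2 + 3.6x\bigr] = 3.6 \\
\hat{y}(0) &= 63.2 + 3.6(0) = 63.2
\end{aligned}
```

So each additional hour studied is associated with a predicted increase of 3.6 points. The intercept predicts a score of 63.2 for 0 hours, but if the data only cover 1 to 5 hours, that value is an extrapolation and may not be meaningful.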
Goodness of Fit: Evaluate whether your model meets the assumptions of linear regression:
Linearity
Independence
Equal spread (homoscedasticity)
Normality of residuals
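Linearity is usually judged from the residual plot (no curve), and independence from how the data were collected. The sketch below adds two rough numeric checks to go alongside your graphs: comparing residual spread for low versus high x values (equal spread) and applying the 68-95 rule to the residuals (normality). These are informal heuristics, not formal tests, and with the tiny hypothetical toy data used here (least squares line y-hat = 63.2 + 3.6x) they mean very little; they only show the idea. Assumes Python 3.

```python
import statistics

# Hypothetical toy data (hours studied, test scores).
hours = [1, 2, 3, 4, 5]
scores = [62, 74, 80, 74, 80]
a, b = 63.2, 3.6  # least squares intercept and slope for this toy data
res = [y - (a + b * x) for x, y in zip(hours, scores)]

# Equal spread (rough heuristic): compare residual spread for low vs. high x values.
median_x = statistics.median(hours)
low = [e for x, e in zip(hours, res) if x <= median_x]
high = [e for x, e in zip(hours, res) if x > median_x]
print("residual SD, low x: ", round(statistics.stdev(low), 2))
print("residual SD, high x:", round(statistics.stdev(high), 2))

# Normality (rough heuristic): about 68% / 95% of residuals should lie within 1 / 2 s of zero,
# where s = sqrt(SSE / (n - 2)) is the standard deviation of the residuals.
n = len(res)
s = (sum(e ** 2 for e in res) / (n - 2)) ** 0.5
within_1s = sum(abs(e) <= s for e in res) / n
within_2s = sum(abs(e) <= 2 * s for e in res) / n
print(f"within 1 s: {within_1s:.0%}, within 2 s: {within_2s:.0%}")
```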
Predictive Power: Use your model to make predictions and interpret their real-world significance.
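A prediction is just y-hat = a + b*x evaluated at a new x value. Predictions are most trustworthy inside the range of x values you observed; predicting far outside that range (extrapolation) assumes the linear pattern continues, which may not be true. The sketch below computes a prediction and flags extrapolation. It assumes Python 3; the helper name `predict` and the toy line y-hat = 63.2 + 3.6x are hypothetical.

```python
def predict(x_new, a, b, x_observed):
    """Hypothetical helper: return (y-hat, extrapolating) for a new x value."""
    y_hat = a + b * x_new
    extrapolating = not (min(x_observed) <= x_new <= max(x_observed))
    return y_hat, extrapolating


hours = [1, 2, 3, 4, 5]  # hypothetical observed x values (hours studied)
a, b = 63.2, 3.6         # least squares intercept and slope for the toy data

print(predict(3.5, a, b, hours))  # about 75.8, within the observed range
print(predict(10, a, b, hours))   # about 99.2, but flagged: 10 hours is far outside 1 to 5
```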
Model Evaluation: Summarize whether your model provides a good fit for the data. Discuss its strengths, limitations, and potential issues.
Real-World Implications: Discuss the broader implications of your findings. How do they answer the research question you posed?
Future Directions: Provide three suggestions on how future studies could improve upon your analysis. For example, suggest collecting more data, using different variables, or employing more advanced statistical techniques.
Format: Use the provided Presentation Slide Template or create your own organized, professional slides.
Structure:
Introduction: Research question, data description, and variables.
Data Exploration: Univariate analysis, scatterplot, and correlation.
Regression Model: Line equation, residual plot, and interpretation.
Discussion: Model evaluation, implications, and future directions.
Graphics: Include at least three visual aids: a scatterplot, the regression line, and a residual plot.
Conclusion: Present a clear summary of your findings and the next steps for further research.
Clarity of Question: How well-defined and relevant is the research question?
Data Selection & Exploration: Quality and clarity of data exploration, including one-variable and two-variable analyses.
Regression Analysis: Accuracy and depth of interpretation regarding slope, intercept, residuals, and r² value.
Presentation Quality: Organization, clarity, and professionalism in the presentation.
Conclusion: Thoughtfulness of discussion, real-world implications, and suggestions for future research.
The height of a swimmer has an effect on their 50-meter sprint time
The grades one receives in high school can be a good predictor of how one does in college
The salary of an individual has a strong correlation with their level of happiness
The Presentation Grading Rubric lists all of the required elements for your presentation; if you prefer, a written description of these elements is also provided.