In this project you will explore the relationship between two quantitative variables using linear regression analysis. You will investigate whether a linear relationship exists, interpret key statistical measures, and present your findings in a clear, professional manner.
If you choose OWID (Our World in Data), watch this video tutorial on transferring the data to Stapplet for analysis.
CORGIS: The Collection of Really Great, Interesting, Situated Datasets
Click through the Cricket Thermometer slideshow below.
See some student presentations here.
The height of a swimmer has an effect on their 50-meter sprint time.
The grades one receives in high school can be a good predictor of how one does in college.
The salary of an individual has a strong correlation with their level of happiness.
Does the number of hours studied (X) predict test scores (Y)?
Is there a relationship between average temperature (X) and ice cream sales (Y)?
Does the GDP of a country (X) predict life expectancy (Y)?
Each part will be due in sequence, every few days. Please paste links to your data sources and your presentation here. Late work will not be accepted for credit, but every part must still be completed by the time of the presentation.
Explore a question that investigates the relationship between two quantitative variables. You can use any reliable dataset.
Approval: Run your idea by me or by your classmates before moving forward. Once approved, continue to the next step.
Source: Use a dataset with at least 30 observations. Your dataset must have two quantitative variables.
Documentation: Clearly describe where your data came from, how it was collected, and include a link to the source.
Univariate Analysis: Start by exploring each variable separately.
Calculate key statistics like mean, median, standard deviation, and range.
Create histograms or boxplots to visualize the distributions.
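If you would like to double-check the summary statistics from Stapplet in code, the following is a minimal Python sketch. The variable name `hours_studied`, the helper `summarize`, and the values are all made up for illustration; substitute one of your own variables.

```python
import statistics

# Hypothetical values, made up for illustration only (not real data).
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]

def summarize(values):
    """Return the mean, median, sample standard deviation, and range."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample SD (divides by n - 1)
        "range": max(values) - min(values),
    }

summary = summarize(hours_studied)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Run the same function once for X and once for Y so that each variable is described separately.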
Correlation: Calculate and interpret the correlation coefficient (r) to determine the strength and direction of the relationship between your variables.
Scatterplot: Create a scatterplot to visualize the relationship between your two variables.
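The correlation coefficient described above can also be computed by hand. This sketch assumes a small, made-up dataset (`hours` and `scores` are hypothetical names and values) and uses the standard formula for Pearson's r, so you can compare the result with what Stapplet reports.

```python
import math

# Hypothetical paired data, made up for illustration only.
hours = [1, 2, 3, 4, 5]        # X: hours studied
scores = [52, 60, 71, 74, 83]  # Y: test score

def pearson_r(xs, ys):
    """Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear association, and the sign gives the direction. Always read r alongside the scatterplot, since r alone cannot reveal curvature or outliers.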
Least Squares Regression: Fit a least squares regression line to your data.
Report the equation of the line.
Add the regression line to your scatterplot.
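As a rough illustration of what "least squares" computes, the sketch below fits a line to a made-up dataset using the usual formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The names `hours`, `scores`, and `least_squares` are hypothetical and not part of the assignment materials.

```python
# Hypothetical paired data, made up for illustration only.
hours = [1, 2, 3, 4, 5]        # X: hours studied
scores = [52, 60, 71, 74, 83]  # Y: test score

def least_squares(xs, ys):
    """Return (slope, intercept) of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = least_squares(hours, scores)
print(f"predicted score = {intercept:.2f} + {slope:.2f} * hours")
```

The slope and intercept printed here should match the equation Stapplet reports for the same data, up to rounding.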
Residual Analysis: Create a residual plot to assess the fit of your model. Investigate any patterns in the residuals that suggest a poor model fit.
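A residual is the observed value minus the predicted value. The sketch below computes residuals for a made-up dataset; the slope and intercept are assumed to come from a least squares fit of those same hypothetical values, and all names are illustrative.

```python
# Hypothetical paired data, made up for illustration only.
hours = [1, 2, 3, 4, 5]        # X: hours studied
scores = [52, 60, 71, 74, 83]  # Y: test score

# Assumed least squares line for the data above (illustrative values).
slope = 7.6
intercept = 45.2

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]

for x, res in zip(hours, residuals):
    print(f"hours = {x}: residual = {res:+.2f}")
```

Plotting these residuals against X gives the residual plot. Points scattered randomly around zero support a linear model, while a curve or a fan shape suggests a poor fit.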
r² Value: Report and interpret the r² value, which tells you what proportion of the variability in Y is explained by the linear relationship with X.
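To see where r² comes from, the following sketch computes it as 1 - SSE / SST for a made-up dataset. The data, the fitted line, and the variable names are hypothetical and serve only to illustrate the calculation.

```python
# Hypothetical paired data, made up for illustration only.
hours = [1, 2, 3, 4, 5]        # X: hours studied
scores = [52, 60, 71, 74, 83]  # Y: test score

# Assumed least squares line for the data above (illustrative values).
slope = 7.6
intercept = 45.2

mean_y = sum(scores) / len(scores)

# SSE: variation left over after using the line to predict Y.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, scores))
# SST: total variation of Y around its mean.
sst = sum((y - mean_y) ** 2 for y in scores)

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.3f}")
```

For a least squares line with one explanatory variable, this value equals the square of the correlation coefficient r.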
Slope Interpretation: Explain what the slope of your regression line means in the context of your data.
Y-Intercept Interpretation: Discuss the meaning (or lack of meaning) of the y-intercept.
Goodness of Fit: Evaluate whether your model meets the assumptions of linear regression:
Linearity
Independence
Equal spread (homoscedasticity)
Normality of residuals
Predictive Power: Use your model to make predictions and interpret their real-world significance.
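When making predictions, it helps to flag any X value that falls outside the range of the observed data, since extrapolated predictions are less trustworthy. The sketch below assumes a hypothetical fitted line and a hypothetical observed X range; `predict` is an illustrative helper, not part of any required tool.

```python
# Assumed model from a hypothetical fit (illustrative values only).
slope = 7.6
intercept = 45.2
x_min, x_max = 1, 5  # range of X values in the made-up dataset

def predict(x):
    """Return (predicted y, True if x lies outside the observed X range)."""
    y_hat = intercept + slope * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

for x in (3.5, 10):
    y_hat, extrapolated = predict(x)
    note = " (extrapolation: interpret with caution)" if extrapolated else ""
    print(f"x = {x}: predicted y = {y_hat:.1f}{note}")
```

In your presentation, state each prediction in context and say whether it is an interpolation or an extrapolation.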
Model Evaluation: Summarize whether your model provides a good fit for the data. Discuss the strengths, limitations, and potential issues of your model.
Real-World Implications: Discuss the broader implications of your findings. How do they answer the research question you posed?
Future Directions: Provide three suggestions on how future studies could improve upon your analysis. For example, suggest collecting more data, using different variables, or employing more advanced statistical techniques.
Format: Use the provided Presentation Slide Template or create your own organized, professional slides.
Structure:
Introduction: Research question, data description, and variables.
Data Exploration: Univariate analysis, scatterplot, and correlation.
Regression Model: Line equation, residual plot, and interpretation.
Discussion: Model evaluation, implications, and future directions.
Graphics: Include at least three visual aids: a scatterplot, a regression line, and a residual plot.
Conclusion: Present a clear summary of your findings and the next steps for further research.
Here is a list of all the required elements for your presentation: Presentation Grading Rubric. If you prefer, a written description of these elements appears below.