Say what you are going to investigate. Remember that at the beginning you don't know whether there is a relationship, so this needs to be implied in the question, e.g. "I wonder if there is a relationship between ..."
- Take it further with research and think about why you would do this investigation: who would be interested in it, and why? Do you already know what result you are likely to get?
- If you are comparing with another explanatory variable, you need to state this in your question, e.g. "I will investigate the relationship between wind speed (explanatory variable) and paint drying time (response variable) and compare this to the relationship between humidity (explanatory variable) and paint drying time (response variable)." NOTE: it is not essential to compare more than one variable, so only add this to your purpose if it makes sense to.
What do you notice? Positive or negative: what tells you this? (Positive: as ___ increases, so does ___. Negative: as ___ increases, ___ decreases.) Does it look linear or non-linear, and why?
Strength of relationship and what tells you this. If linear, you could put the gradient in context (the gradient is the number in front of x in the trendline equation). Anything unusual, e.g. outliers or groupings? What are possible reasons? Is the scatter weak because of another variable? Is there an outlier? Is the outlier an error or a valid piece of data? Is there any research information that backs up what you see and could deepen the discussion, e.g. by explaining the relationship or unusual features?
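To show what "putting the gradient in context" can look like, here is a minimal sketch. The wind speed and drying time values are made up for illustration, and `fit_line` is a hypothetical helper name; in practice your graphing software will give you the trendline equation.

```python
from statistics import mean

# Made-up illustrative data (not from a real study).
wind_speed = [0, 5, 10, 15, 20, 25]      # explanatory variable, km/h
drying_time = [60, 55, 49, 46, 40, 35]   # response variable, minutes

def fit_line(x, y):
    """Least-squares trendline y = gradient * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    gradient = sxy / sxx
    intercept = y_bar - gradient * x_bar
    return gradient, intercept

gradient, intercept = fit_line(wind_speed, drying_time)

# The gradient is the number in front of x in the trendline equation.
print(f"drying time = {gradient:.2f} * wind speed + {intercept:.2f}")
print(f"In context: for each extra 1 km/h of wind, "
      f"drying time changes by about {gradient:.2f} minutes on average.")
```

A negative gradient here matches a negative relationship: as wind speed increases, drying time decreases.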
Does the trendline represent the data well? E.g. are there any gaps along the trendline? Is the data scattered evenly on either side? Are outliers influencing where the trendline is placed, so that it does not represent parts of the data very well? (If they are errors then they should be removed so the trendline better represents the data; this would be done under improvements.)
- r: does it back up what you see?
- Residual analysis (only if linear): does the residual plot back up your original idea of a linear relationship? E.g. is there fairly constant scatter throughout the graph, along the x-axis, or is there some pattern to it which points toward something different?
- Is the y-intercept reasonable? Could data come close to this or continue further on? Could this trendline be used to predict far beyond the end of the data? If not, why not (research)? How would it change? Would another model/trendline be better? Should there be a piecewise graph?
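The checks above (r, residuals and the y-intercept) can be sketched in code. This is only an illustration under assumptions: the data is made up and `summarise` is a hypothetical helper, not a standard function.

```python
from statistics import mean

# Made-up illustrative data (not from a real study).
wind_speed = [0, 5, 10, 15, 20, 25]      # km/h
drying_time = [60, 55, 49, 46, 40, 35]   # minutes

def summarise(x, y):
    """Return gradient, y-intercept and correlation coefficient r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    gradient = sxy / sxx
    return gradient, y_bar - gradient * x_bar, sxy / (sxx * syy) ** 0.5

gradient, intercept, r = summarise(wind_speed, drying_time)

# Residual = actual value minus the value the trendline predicts.
residuals = [yi - (gradient * xi + intercept)
             for xi, yi in zip(wind_speed, drying_time)]

print(f"r = {r:.3f}")
print(f"y-intercept = {intercept:.1f} minutes (predicted drying time at 0 km/h)")
for xi, res in zip(wind_speed, residuals):
    print(f"wind {xi:>2} km/h: residual {res:+.2f}")
```

If the residuals switch between positive and negative with no pattern, that supports a linear model; a curve or funnel shape in the residuals points toward something different.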
We can never say that the explanatory variable causes the response to change! Research and think about lurking variables: things that are linked to the explanatory variable and are likely to be the cause of the change. What would need to be kept constant if the explanatory variable were to be said to be the cause of the change in the response variable? Is there another variable in the dataset that may be a lurking variable?
One interpolation (inside the data) and one extrapolation (outside the data): show the calculation and put it into context. How accurate do you believe your predictions are? What makes you think this? Any research? ONLY PREDICT WHAT IS REASONABLE!!
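As a sketch of the prediction step, the example below substitutes values into a trendline equation and labels each prediction. The equation, the data range and the `predict` helper are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical trendline and data range, for illustration only:
# drying time (min) = -0.99 * wind speed (km/h) + 59.86,
# assumed to be fitted on wind speeds from 0 to 25 km/h.
GRADIENT = -0.99
INTERCEPT = 59.86
X_MIN, X_MAX = 0, 25

def predict(wind_speed):
    """Return the predicted drying time and the type of prediction."""
    kind = "interpolation" if X_MIN <= wind_speed <= X_MAX else "extrapolation"
    return GRADIENT * wind_speed + INTERCEPT, kind

for wind_speed in (12, 40, 70):
    minutes, kind = predict(wind_speed)
    reasonable = minutes > 0  # a drying time cannot be zero or negative
    print(f"{wind_speed} km/h -> {minutes:.1f} min ({kind}, reasonable: {reasonable})")
```

The prediction at 70 km/h comes out negative, which is impossible for a drying time. This is why extrapolating far beyond the data needs care.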
If you want -
Compare another explanatory variable to the one you have done your analysis with. Is it better at predicting the response variable? E.g. is the strength of the relationship better? Is r closer to 1 or -1 (or r squared closer to 1)?
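A comparison of two explanatory variables might be sketched like this. All values are made up, and `correlation` is a hypothetical helper. Note that it is the size of r, ignoring the sign, that tells you about strength.

```python
from statistics import mean

# Made-up illustrative data (not from a real study); each position is one observation.
wind_speed = [0, 5, 10, 15, 20, 25]      # km/h
humidity = [70, 50, 65, 45, 55, 40]      # percent
drying_time = [60, 55, 49, 46, 40, 35]   # minutes

def correlation(x, y):
    """Correlation coefficient r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

r_wind = correlation(wind_speed, drying_time)
r_humidity = correlation(humidity, drying_time)

print(f"wind speed: r = {r_wind:.3f}, r squared = {r_wind ** 2:.3f}")
print(f"humidity:   r = {r_humidity:.3f}, r squared = {r_humidity ** 2:.3f}")

# Compare strength using the size of r; the sign only gives the direction.
stronger = "wind speed" if abs(r_wind) > abs(r_humidity) else "humidity"
print(f"Stronger linear relationship with drying time: {stronger}")
```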
Improvements to the model: would a piecewise or some other model be better in retrospect? If outliers are removed (DO NOT DO THIS UNLESS THEY ARE NOT POSSIBLE!!) does the trendline better represent the data? Use your eyes and also r. Are there groups in the data that would be good to investigate further?
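To see how much an impossible value can affect r, here is a small sketch. The data is made up, including one deliberately impossible point (a negative drying time), and `correlation` is a hypothetical helper.

```python
from statistics import mean

# Made-up illustrative data; the last point has an impossible negative drying time.
wind_speed = [0, 5, 10, 15, 20, 25, 12]       # km/h
drying_time = [60, 55, 49, 46, 40, 35, -5]    # minutes

def correlation(x, y):
    """Correlation coefficient r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

r_all = correlation(wind_speed, drying_time)

# Only remove points that cannot be real: here, drying times that are not positive.
kept = [(x, y) for x, y in zip(wind_speed, drying_time) if y > 0]
kept_wind = [x for x, _ in kept]
kept_time = [y for _, y in kept]
r_cleaned = correlation(kept_wind, kept_time)

print(f"r with all points:              {r_all:.3f}")
print(f"r without the impossible point: {r_cleaned:.3f}")
```

A valid but unusual point should stay in the data; this sketch only removes a value that could not have happened.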
Answer your original question. Summarise your findings.
- You could add in here whether these findings could be used in other areas or whether they are only relevant to the area they were collected from.
- Improvements: are there any obvious groups, e.g. boys and girls, which should have been investigated separately (e.g. two graphs, with the relationship in each investigated)? This could deepen the investigation.
- Have you made any assumptions? Is there any information about how the data was collected? Have other variables remained constant? What would be needed to make the data collected trustworthy?
- Relevance and usefulness: are the results useful to anyone? Think about the predictions: are they good or poor? Are they what you expected? Who would use them, and how widely could they be used?
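The idea of investigating obvious groups separately (see the improvements point above) can be sketched as follows. The records, the group labels and the `fit_line` helper are all hypothetical, and the x and y values are deliberately simple made-up numbers.

```python
from statistics import mean

# Made-up illustrative records: (group, x, y).
records = [
    ("boys", 1, 2), ("boys", 2, 4), ("boys", 3, 6), ("boys", 4, 8),
    ("girls", 1, 10), ("girls", 2, 9), ("girls", 3, 8), ("girls", 4, 7),
]

def fit_line(x, y):
    """Least-squares trendline y = gradient * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    gradient = sxy / sxx
    return gradient, y_bar - gradient * x_bar

# Fit one trendline per group instead of one line through everything.
gradients = {}
for group in ("boys", "girls"):
    xs = [x for g, x, _ in records if g == group]
    ys = [y for g, _, y in records if g == group]
    gradients[group], _ = fit_line(xs, ys)
    print(f"{group}: gradient = {gradients[group]:.2f}")

# One line through all the data hides the difference between the groups.
all_gradient, _ = fit_line([x for _, x, _ in records], [y for _, _, y in records])
print(f"all data together: gradient = {all_gradient:.2f}")
```

In this made-up case the two groups have opposite relationships, so a single trendline would describe neither group well.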