By the time you are done with this section, you should be able to:
Provide the general goal of bivariate data analysis.
Sketch a typical data matrix that would be used in a bivariate analysis.
Identify the two basic tools used with bivariate analyses.
Generally distinguish correlation from regression analyses.
List the assumptions regarding the type of measurement error that pertain to regression analyses.
Define the term "homoscedasticity."
State the basic assumption regarding the distribution of data values appropriate to a correlation analysis.
List which SAS procedures produce plots.
Explain the use of the UNIFORM option in PROC PLOT.
Give the three basic forms for plot requests in PROC PLOT and distinguish between them.
Describe what will happen when two symbols occupy the same print location.
Indicate what the letters A, B and C would mean on a plot in which SAS is choosing the data symbols.
Distinguish whether the first variable in a plot request corresponds to the X-axis or the Y-axis.
Show how to get a plot where the data symbols correspond to the values of a third variable and describe what will happen if the values for this variable do not have a unique first character.
Write a PROC PLOT statement that will put two different sets of data on the same plot, such as the observed and predicted values of a variable.
Give the name for the abbreviation "GLM."
Identify some of the uses of PROC GLM.
Construct a MODEL statement for a PROC GLM in which a regression relationship is being examined for HEIGHT (as the independent variable) and AGE (as the dependent variable).
Describe the role of the OUTPUT statement in PROC GLM and give an example of its use.
Sketch a linear least squares regression line that has the equation:
LENGTH = 2.0 + 0.5 DIAMETER
Explain the term "least-squares" as it applies to the type of regression line fit in PROC GLM.
State whether a linear, least-squares regression line goes through the point corresponding to the mean X and mean Y values.
Give the range of values for the product moment correlation coefficient (r).
Identify the relationship between the correlation coefficient and the coefficient of determination.
Provide the symbol that is generally used to mean the coefficient of determination.
Define the role of the PROB>F value given in PROC GLM and give an example of how it can be used.
Discuss the reason for examining the Anscombe examples carefully.
Draw an example of a specification error.
Describe what is meant by an outlier and sketch what one would look like in a bivariate data plot.
Show how a high-leverage point can influence the direction of a least-squares regression line.
Discuss what you would do if you encountered an outlier.
Identify what would likely be your first approach to correcting a specification error that clearly indicated you had a non-linear relationship.
Give an example of how you would do a log transformation of a Y variable and then use this in an analysis with PROC GLM.
Distinguish among the allometric, power and learning models.
Identify what is meant by "Occam's Razor" and describe why it is an important philosophical point related to trying to improve the fit of an equation.
Define what is meant by the term "residuals" relative to least-squares regression analysis.
List what general characteristics you are looking for in a residuals plot and sketch how they might appear in a residuals plot.
Demonstrate how to take the PROC GLM output from a fit to an exponential equation and transform it for use in the equation given in the form:
Y = a * e^(bX)
Write the general form of the quadratic equation.
Describe how to analyze a typical data matrix that has bivariate data to get a quadratic equation.
Show how you would plot the results from fitting a quadratic equation to confirm the correspondence between your data values and the predicted values.
Identify the values in a PROC GLM run for a quadratic curve that you would examine to determine if it is an improvement over a linear curve.
Take the values from a PROC GLM run and construct a proper quadratic equation.
Describe the general process of data analysis, identifying the major analysis phases.
Give some reasons that progress may become discouragingly slow during the process of data analysis.
Discuss the role of knowing the characteristics of the data you are analyzing relative to the choice of statistical analysis.
Explain how you would handle a new question that emerges during the process of analyzing a complex body of data.
Describe how to dispose of data and analysis procedures after you are done analyzing your data.
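
As a rough illustration of the exponential-fit objective above, the following sketch fits a least-squares line to ln(Y) against X and then back-transforms the result into the form Y = a * e^(bX). It is written in Python as a stand-in for the PROC GLM step and is not SAS output. The function name least_squares and the data values are hypothetical and chosen only for illustration. The key step, assuming the model was fit as ln(Y) = intercept + slope * X, is that a = e^(intercept) and b = slope.

```python
import math


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # This form of the intercept forces the line through (mean X, mean Y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data generated exactly from Y = 3 * e^(0.5 X), for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0 * math.exp(0.5 * xi) for xi in x]

# Step 1: log-transform Y, then fit a straight line to ln(Y) versus X
log_y = [math.log(yi) for yi in y]
intercept, slope = least_squares(x, log_y)

# Step 2: back-transform the intercept; the slope carries over unchanged
a = math.exp(intercept)
b = slope

print("a =", round(a, 6), "b =", round(b, 6))
```

The same function applied to untransformed data gives an ordinary linear fit, so it can also be used to check a hand-sketched line such as LENGTH = 2.0 + 0.5 DIAMETER against a few points.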