By the time you are done with this section, you should be able to:
Identify some of the major problems facing typical researchers in their use of statistics, including the choice of statistical tests, running the statistical analysis procedures, and interpreting the results.
Describe the "tool box" analogy to statistical analysis procedures.
List the major statistical analysis tools and the important data manipulation tools.
Explain the general way in which a beginner may identify the appropriate statistical tools for a particular analysis problem.
Identify the primary limitation in relying on older analyses as a way to get advice on what statistical tests should be run.
State the two general tasks of statistical analysis.
Define: measurement, variable, and categorization variable.
Give examples of data matrices that are appropriate for univariate data analysis that span the range from the most simple to the most complex.
Identify approximately the maximum number of values that would be appropriate for a categorization variable in a data matrix used with univariate analyses.
Describe the general functions of the following SAS univariate analysis procedures: MEANS, UNIVARIATE, CHART and FREQ.
Show an example of how the DATA= option (common to all analysis procedures) would have practical use.
Give an example of the use of the MAXDEC option in PROC MEANS and identify when it would likely be used.
Use the code names for the statistical variables to control what information is printed by PROC MEANS.
Explain the role of the BY statement with the univariate analysis procedures and the requirements for its use, including data matrix characteristics and other procedures.
Provide an example that shows why a VAR statement should be used with PROC MEANS.
Discuss the role of the OUTPUT OUT= statement in PROC MEANS.
Construct a PROC UNIVARIATE statement that generates information on the normality of a variable and provides a graphic portrayal of its frequency distribution.
Find the name of the variable being analyzed, its descriptive statistics, descriptors of departure from normality, and test for normality in PROC UNIVARIATE.
Identify the role of the FORMAT statement in PROC CHART.
Sketch the differences between an HBAR and VBAR plot in PROC CHART.
Give examples of when the DISCRETE option should be used and when it would be inappropriate.
Use the proper TYPE= option to produce a percentage bar chart instead of a frequency bar chart.
Show how TYPE= and SUMVAR= are used to produce a bar chart that plots mean values.
Provide an example of a data matrix that would be appropriate for a bar chart that uses the GROUP= option. Do this for the SUBGROUP= option also.
Draw two charts that clearly distinguish the difference between the GROUP= and SUBGROUP= options.
Describe whether there is any situation where both GROUP= and SUBGROUP= might be used in the same bar chart.
Identify the differences in the types of information provided with VBAR and HBAR plots.
State what information is contained on the "base axis" of a bar chart and what information is generally given on the other axis.
Discuss the role of the FORMAT statement in PROC FREQ and identify its importance.
Show example TABLES statements that are used to construct one-way frequency tables, cross tabulations and pages of cross tabulations, properly matching the order of the variables to the axes of the tables.
Provide an example of a TABLES statement for a cross tabulation with the options that exclude all cell information except the frequency values.
Use the LIST option to get a table arranged in the list format (instead of the usual table format).
Identify what information is being given in a table cell from the printout of PROC FREQ.
Locate the name of the variable and its value on a page when it is used to produce a paged cross-tabulation.
Describe what information is given along the margins (right and bottom) of a PROC FREQ table.
Demonstrate how to get names for rows or columns in a table that differ from the data values.
Diagnose the problem that has likely occurred when a relatively small data matrix has been used with PROC PRINT and it has produced many, many pages of output.
List a set of procedures that might typically be used in an examination of a univariate data matrix and identify why each is used.
Sketch a stem-and-leaf plot and describe how to interpret it.
Show how to interpret values in a stem-and-leaf plot that has a scaling factor listed.
Draw a typical box plot and identify its major characteristics.
Give the particular evidence that might indicate an outlier value.
Describe why you would examine whether the mean and median overlap in a box plot and show specifically what to examine.
Identify the SAS procedure that is used to produce the stem-and-leaf and box plots and show what instructions are needed to have them printed.
Describe the relationship between a "normal curve" and a "bell-shaped curve."
Discuss the origin of the term "parametric."
Draw a normal curve and show its various attributes, including the identification of its axes and the location of the mean value and standard deviation.
Distinguish between the Kolmogorov and Shapiro-Wilk tests of normality and be able to identify which one was used in a PROC UNIVARIATE analysis.
Give the interpretation of whether a variable is normally distributed by examining the probability associated with the normality test.
Define and provide examples of skewness and kurtosis.
State whether a bar diagram produced by PROC CHART is adequate to determine whether a variable is normally distributed.
Show an example of a data matrix (including data values) that would be appropriate for a two-category statistical test such as the t-test.
Describe the role of PROC UNIVARIATE when two sets of observations are being compared to see if they are significantly different.
Provide examples of pairs of normal curves that are (1) very similar, (2) differ in their means, but not in their variances, and (3) differ in their variances, but not their means.
Identify the role of the F-test in comparing two normal distributions, stating how the two normal curves would appear if the F-test showed that they were significantly different.
Discuss the role of the t-test and show how two normal curves would appear if the t-test showed that there was no significant difference.
Indicate the location of the values that are used to determine whether the F-test and t-test show significant differences and describe how to interpret these values, including the proper order in which these values are examined.
Construct a data matrix appropriate for analysis with PROC TTEST and give the SAS code necessary to use PROC TTEST.
Describe what additional SAS procedures would be useful with PROC TTEST and show the details of where they would most likely be included in a SAS run.
Write the conclusions that might be possible from the results of a PROC TTEST run.
Compare how data matrices would differ between analyses using PROC TTEST and the Duncan and Waller tests in PROC ANOVA.
Show how to take a data matrix that would typically be used for an analysis of multiple categories (with the Duncan-Waller tests of PROC ANOVA) and perform appropriate normality tests.
Discuss the general problem of overlapping groups in the Duncan-Waller analyses.
Give an example of a data matrix and how to use PROC ANOVA in order to perform Duncan and Waller tests on it.
Discuss what might cause a DUNCAN analysis to have a different number of groups than a WALLER analysis and how you might interpret such a situation.