The Tool Box
The general purpose of this discussion is to show you that there is a practical base on which you can build your statistical analysis skills. This should be self-evident: certainly the use of statistics should be a well-organized practice. Yet if you have had any prior experience with statistics, you have probably come to the opposite conclusion. Statistics can be very confusing. You are not alone if you feel uneasy about statistics.
Here are a few observations that are common to many biologists:
Few biologists are comfortable with the technical vocabulary of statistics.
It is rare to find an individual who can clearly choose the right statistical analysis for the job at hand, unless it is a very familiar field.
Two skilled practitioners are likely to give conflicting advice.
Few researchers can carry out the mechanical operations of a medium-sized or large analysis with any facility.
Most individuals feel that they are working on the borderline of statistical acceptability. There are statistical assumptions that should not be violated, but it is not clear what these are and how serious the penalty might be for such a transgression.
There is a good base for statistical analyses, and the current generation of analysis tools gives us access to it. Statistical analyses are performed with a set of tools. Like a plumber, the statistician has different types of tools for different jobs, and each general kind of tool comes in several varieties, each suited to a particular task. For example, a plumber has different kinds and sizes of wrenches, screwdrivers, and hammers in the tool box. A good plumber knows exactly when to use each tool.
At the start, you need to become familiar with the basic types of data analysis tools. Later, you will become skilled at choosing between similar tools (as in whether a small or medium-sized screwdriver is more appropriate for the job).
Here are some of the basic statistical analysis tools that a biologist should have at hand:
Univariate analyses
Bivariate analyses
Analysis of Variance
Multivariate analyses
Non-parametric analyses
These are commonly used names for the different analysis tools, but they are by no means standard names. Differences in names should not present any problem once you clearly know what each tool does. In practice, there is some overlap between these tools.
You should be aware that some important problems need other statistical tools, so treat this as a good starting list rather than a complete one.
There are other analysis tools that must be added to the list to round out your minimum set of data analysis capabilities. These are not traditional statistical tools, but handle equally important aspects of an analysis. These additional tools include:
Data conversion and transformation procedures
Table and report printing procedures
Data matrix subsetting and merging procedures
Sorting procedures
Plotting procedures
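To make these supporting tools concrete, here is a minimal sketch of transformation, subsetting, merging, sorting, and table printing, written in plain Python. The records, field names (`site`, `mass_g`, `habitat`), and values are invented for illustration, and a real analysis package will provide its own versions of each step. Plotting is left out because it needs a graphics library.

```python
import math

# Hypothetical field records; all names and values are invented for illustration.
records = [
    {"id": 1, "site": "A", "mass_g": 100.0},
    {"id": 2, "site": "B", "mass_g": 10.0},
    {"id": 3, "site": "A", "mass_g": 1000.0},
    {"id": 4, "site": "B", "mass_g": 1.0},
]
# Hypothetical second data set, keyed by site, to be merged into the records.
site_info = {"A": {"habitat": "forest"}, "B": {"habitat": "meadow"}}

# Transformation: add a log10-transformed copy of the mass.
transformed = [{**r, "log_mass": math.log10(r["mass_g"])} for r in records]

# Subsetting: keep only the observations from site A.
site_a = [r for r in transformed if r["site"] == "A"]

# Merging: attach the site-level information to every observation.
merged = [{**r, **site_info[r["site"]]} for r in transformed]

# Sorting: order the merged observations by mass, smallest first.
by_mass = sorted(merged, key=lambda r: r["mass_g"])

# Table printing: one aligned row per observation.
print(f'{"id":>3} {"site":<4} {"habitat":<8} {"mass_g":>8} {"log":>6}')
for r in by_mass:
    print(f'{r["id"]:>3} {r["site"]:<4} {r["habitat"]:<8} '
          f'{r["mass_g"]:>8.1f} {r["log_mass"]:>6.2f}')
```

Each step leaves the original records untouched and produces a new list, which makes it easy to go back and check what a transformation or subset actually did.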
Choosing Your Tools
Given that you have an analysis problem to be solved and a set of problem-solving tools, how do you select the correct tool for the job? This is a critical choice. Indeed, you probably should not collect any data unless you are sure that you have a proper analysis tool; this is part of good experimental design.
Eventually, you will be able to look at an analysis problem and specify what tools are required. This is just like learning to identify plants or animals. You know the identity of a set of species due to your past experience with them; other species can be identified using identification tools such as keys, field guides and the advice of experts.
Where do you begin in identifying proper analysis tools?
Now, and perhaps for a long time in your career, you will probably follow the analysis practice of previous studies that had goals similar to yours. This is not a blind shortcut approach, but a recognition that scientific advancement frequently proceeds in small, incremental steps. By emulating analysis techniques used in previous studies, you build on the experience of others and allow a comparison of their results with yours.
Get into the habit of carefully examining the analysis procedures that are common to your field of study. Apply these critically to your own problem, and only if you are convinced that it falls within the constraints of the analyses to be used.
Within this general approach, there is one word of caution. As analysis programs become more sophisticated, you are better able to be rigorous in your application of statistics. For example, if you analyzed your data with a calculator, you probably would skip over some of the minor aspects of an analysis procedure and assume that this would not affect your conclusions.
With the powerful analysis programs that you have available, it is relatively simple to check on all facets of the analysis. This will help you make sure that your conclusions are valid. In this sense, you will be doing more rigorous analyses than were practical in the past, and your analyses are likely to differ in some details from those in many published studies.
Data Analysis as a Process
The traditional approach to the analysis of data has been very goal-directed. For example, you would establish a specific goal, such as determining whether two samples had different mean values, and then work as directly as possible to test for such a difference. This is still a valid approach, but increasingly it is being combined with an exploration of the data. Now, you will be inclined to check whether your samples are normally distributed, whether there are outliers in your data, and whether the variances of the two samples are similar. These additional steps do more than confirm the validity of your larger analysis goal; they often provide important interpretive information that will help you understand what is going on in your experiment.
As a result, you should focus on the process of data analysis and become sensitive to the indicators that alert you to unusual conditions.
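The exploratory checks described above can be sketched with Python's standard `statistics` module. This is only an illustration under stated assumptions: the two samples are invented, the function names are hypothetical, the outlier rule is Tukey's fences (1.5 times the interquartile range), and skewness is used as a rough indicator of asymmetry rather than as a formal test of normality.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (k * IQR beyond the quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def skewness(values):
    """Moment-based skewness: near 0 for symmetric data.
    A rough symmetry indicator, not a formal normality test."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

def variance_ratio(a, b):
    """Larger sample variance divided by the smaller; 1.0 means equal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

# Invented measurements; sample_a contains one suspicious value (18.5).
sample_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 18.5]
sample_b = [11.5, 12.9, 12.2, 11.1, 13.0, 12.6, 11.7, 12.4]

for name, sample in (("sample_a", sample_a), ("sample_b", sample_b)):
    print(name,
          "outliers:", flag_outliers(sample),
          "skewness:", round(skewness(sample), 2))
print("variance ratio:", round(variance_ratio(sample_a, sample_b), 1))
```

A flagged value, a large skewness, or a variance ratio far from 1 is not a verdict. It is one of the indicators that should prompt you to look more closely at the data before you compare the means.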
The Two Tasks of Statistical Analysis
Statistical tools help you do two sorts of things.
Their first role is to help you build efficient descriptions of your data. What you are trying to uncover with descriptive statistical tools are the general properties that may exist within a set of observations. If you find such properties, the statistics that capture them serve as a description of your results.
The other role of statistics is to help you make decisions. Specifically, you can use statistical tests when you want to decide whether there are differences between sets of observations. This is a very precise decision-making system, with rules that ensure that anyone using the same data will come to the same conclusions.
These are distinct analysis goals, although in practice there is some overlap in their application. For example, decisions on differences generally are based on statistical descriptions of the data.
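The two tasks can be sketched side by side. In the example below, `describe` summarizes each sample, and `exact_permutation_p` makes a decision about the difference between the two means. The data, the function names, and the 0.05 threshold are assumptions made for illustration, and the exact permutation test is just one possible decision tool, chosen here because it needs nothing beyond the standard library and is fully deterministic.

```python
import itertools
import statistics

def describe(values):
    """Task 1, description: summarize a sample by its size, mean, and standard deviation."""
    return {"n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values)}

def exact_permutation_p(a, b):
    """Task 2, decision: two-sided exact permutation test on the difference in means.
    Enumerates every way of splitting the pooled data into groups of the original
    sizes, so it is practical only for small samples."""
    pooled = list(a) + list(b)
    observed = abs(statistics.mean(b) - statistics.mean(a))
    extreme = total = 0
    for chosen in itertools.combinations(range(len(pooled)), len(a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        diff = abs(statistics.mean(second) - statistics.mean(first))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total

# Invented measurements for two small groups.
group_a = [4.1, 4.5, 4.3, 4.8, 4.4]
group_b = [5.2, 5.6, 5.1, 5.9, 5.4]

print("group_a:", describe(group_a))
print("group_b:", describe(group_b))

p_value = exact_permutation_p(group_a, group_b)
alpha = 0.05  # assumed decision threshold
print("p =", round(p_value, 4), "-> means differ" if p_value < alpha else "-> no clear difference")
```

Because the test enumerates every possible split instead of sampling at random, anyone who runs it on the same data gets the same p-value and therefore reaches the same decision. Note also that the decision is built on the same means that `describe` reports, which is the overlap between the two tasks.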