STAT-C 316 (NONPARAMETRIC STATISTICS)
Chapter 01 - INTRO TO NONPARAMETRIC STATISTICS
1.1 Introduction
The typical introductory course in hypothesis testing and confidence intervals examines primarily parametric statistical procedures. A main feature of these procedures is the assumption that we are working with random samples from normal populations. These procedures are known as parametric methods because they are based on a particular parametric family of distributions – in this case, the normal.
For example, given a set of independent observations from a normal distribution, we often want to infer something about the unknown parameters. Here the t-test is usually used to determine whether the hypothesized value μ₀ for the population mean should be rejected. More usefully, we may construct a confidence interval for the ‘true’ population mean.
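As a concrete illustration, the following sketch (in Python, using the widely available numpy and scipy libraries) performs a one-sample t-test of H₀: μ = μ₀ and constructs a 95% confidence interval for μ; the sample values and the hypothesized mean μ₀ = 50 are invented purely for illustration.

    import numpy as np
    from scipy import stats

    # Hypothetical sample assumed to come from a normal population
    x = np.array([51.2, 48.7, 53.1, 49.5, 50.8, 52.3, 47.9, 51.6])
    mu0 = 50.0                                # hypothesized population mean

    # One-sample t-test of H0: mu = mu0 against a two-sided alternative
    t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)

    # 95% confidence interval for the 'true' mean, based on the t distribution
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    ci = (xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))

    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    print(f"95% CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")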
Parametric inference is sometimes inappropriate or even impossible. To assume that samples come from any specified family of distributions may be unreasonable. For example, we may not have examination marks for each candidate but know only the numbers of candidates who obtained the ordered grades A, B+, B, B–, C+, C, D and F. Given these grade distributions for two different courses, we may want to know if they indicate a difference in performance between the two courses. In this case it is inappropriate to use the traditional (parametric) method of analysis.
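One possible distribution-free analysis of such grade counts is a chi-square test of homogeneity on the two courses' frequency tables, sketched below in Python with scipy; the counts are invented for illustration, and note that this particular test treats the grades as unordered categories (rank-based procedures that exploit the ordering are another option).

    from scipy.stats import chi2_contingency

    # Hypothetical numbers of candidates at each ordered grade for two courses
    grades   = ["A", "B+", "B", "B-", "C+", "C", "D", "F"]
    course_1 = [  5,    8,  12,  10,    9,   6,   3,   2]
    course_2 = [  9,   11,  10,   7,    5,   4,   2,   1]

    # Chi-square test of homogeneity: do the two courses share the same
    # grade distribution?
    chi2, p_value, dof, expected = chi2_contingency([course_1, course_2])
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")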
In this book we describe procedures called nonparametric and distribution-free methods. Nonparametric methods provide an alternative collection of statistical methods that require few, if any, assumptions about the data. These methods are most often used to analyze data which do not meet the distributional requirements of parametric methods. In particular, skewed data are frequently analyzed by nonparametric methods, although data transformation can sometimes make the data suitable for parametric analyses. These procedures have considerable appeal. One of their advantages is that the data need not be quantitative but can be categorical (such as yes or no) or rank data.
Generally, if both parametric and nonparametric methods are applicable to a particular problem, we should use the more efficient parametric method.
1.2 Parametric and nonparametric methods
The word statistics has several meanings. It is used to describe a collection of data and also to designate operations that may be performed on primary data. The scientific discipline called statistical inference uses observed data – in this context called a sample – to make inferences about a larger observable collection of data called a population. We associate distributions with populations.
For example, if the random variable which describes a population is N(μ,σ²), then we say that the population is N(μ,σ²).
Parametric methods are often those for which we know that the population is normal, or for which we can use a normal approximation after invoking the central limit theorem. Ultimately, the classification of a method as parametric depends upon the assumptions that are made about a population. Familiar parametric methods include testing a statistical hypothesis about a population mean under two different conditions (both are illustrated in the sketch following this list):
1. when sampling is from a normally distributed population with known variance,
2. when sampling is from a normally distributed population with unknown variance.
Nonparametric methods, by contrast, are not based on these distributional assumptions and thus do not require the population’s distribution to be specified by particular parameters.
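A minimal sketch of the two parametric tests just listed, together with a nonparametric counterpart that drops the normality assumption (the Wilcoxon signed-rank test), written in Python with scipy; the sample values, μ₀ = 100 and the ‘known’ σ = 2 are invented for illustration.

    import numpy as np
    from scipy import stats

    x = np.array([102.1, 99.4, 101.8, 98.7, 100.9, 103.2, 97.5, 101.1])
    mu0 = 100.0
    n, xbar = len(x), x.mean()

    # 1. Normal population, known variance: z-test, statistic referred to N(0, 1)
    sigma = 2.0                                 # assumed known population sd
    z = (xbar - mu0) / (sigma / np.sqrt(n))
    p_z = 2 * stats.norm.sf(abs(z))             # two-sided p-value

    # 2. Normal population, unknown variance: one-sample t-test
    t_stat, p_t = stats.ttest_1samp(x, popmean=mu0)

    # Nonparametric counterpart: Wilcoxon signed-rank test on x - mu0,
    # which makes no normality assumption
    w_stat, p_w = stats.wilcoxon(x - mu0)

    print(f"z-test:   z = {z:.3f}, p = {p_z:.3f}")
    print(f"t-test:   t = {t_stat:.3f}, p = {p_t:.3f}")
    print(f"Wilcoxon: W = {w_stat:.1f}, p = {p_w:.3f}")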
1.3 Parametric versus nonparametric methods
Nonparametric methods require minimal assumptions about the form of the distribution of the population. For instance, it might be assumed that the data are from a population that has a continuous distribution, but no other assumptions are made. Or it might be assumed that the population distribution depends on location and scale parameters, but the functional form of the distribution, whether normal or otherwise, is not specified.
By contrast, parametric methods require that the form of the population distribution be completely specified except for a finite number of parameters. For instance, the familiar one-sample t-test for means assumes that observations are selected from a population that has a normal distribution, and the only values not known are the population mean and standard deviation. The simplicity of nonparametric methods, the widespread availability of such methods in statistical packages, and the desirable statistical properties of such methods make them attractive additions to the data analyst’s tool kit.
1.4 Classes of nonparametric methods
Nonparametric methods may be classified according to their function, such as two-sample tests, tests for trends, and so on. This is generally how this book is organized. However, methods may also be classified according to the statistical ideas upon which they are based. Here, we consider the ideas that underlie the methods discussed in this book.
The typical introductory course in statistics examines primarily parametric statistical procedures. Recall that these procedures include tests based on the Student’s t-distribution, analysis of variance, correlation analysis and regression analysis. A characteristic of these procedures is that their appropriateness for inference depends on certain assumptions. Inferential procedures in analysis of variance, for example, assume that samples have been drawn from normally distributed populations with equal variances.
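To make the contrast concrete, the sketch below runs a one-way analysis of variance alongside its rank-based counterpart, the Kruskal-Wallis test, in Python with scipy; the three samples are invented for illustration.

    from scipy import stats

    # Three hypothetical independent samples (e.g., scores from three groups)
    g1 = [14.2, 15.1, 13.8, 16.0, 14.9]
    g2 = [13.1, 12.8, 14.0, 13.5, 12.9]
    g3 = [15.8, 16.4, 15.1, 17.0, 16.2]

    # Parametric: one-way ANOVA assumes normal populations with equal variances
    f_stat, p_anova = stats.f_oneway(g1, g2, g3)

    # Nonparametric counterpart: Kruskal-Wallis test, based on ranks only
    h_stat, p_kw = stats.kruskal(g1, g2, g3)

    print(f"ANOVA:          F = {f_stat:.2f}, p = {p_anova:.3f}")
    print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.3f}")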
Since populations do not always meet the assumptions underlying parametric tests, we frequently need inferential procedures whose validity does not depend on rigid assumptions. Nonparametric statistical procedures fill this need in many instances, since they are valid under very general assumptions. As we shall discuss more fully later, nonparametric procedures also satisfy other needs of the researcher. By convention, two types of statistical procedures are treated as nonparametric:
(1) truly nonparametric procedures and
(2) distribution-free procedures.
Strictly speaking, nonparametric procedures are not concerned with population parameters.
For example, in this book we shall discuss tests for randomness, where we are concerned with some characteristic other than the value of a population parameter. The validity of distribution-free procedures does not depend on the functional form of the population from which the sample has been drawn. It is customary to refer to both types of procedure as nonparametric. Kendall and Sundrum (1953) discussed the differences between the terms nonparametric and distribution-free.
1.5 When to use nonparametric procedures
The following are some situations in which the use of a nonparametric procedure is appropriate.
1. The hypothesis to be tested does not involve a population parameter.
2. The data have been measured on a scale weaker than that required for the parametric procedure that would otherwise be employed. For example, the data may consist of count data or rank data, thereby precluding the use of some otherwise appropriate parametric procedure.
3. The assumptions necessary for the valid use of a parametric procedure are not met. In many instances, the design of a research project may suggest a certain parametric procedure. Examination of the data, however, may reveal that one or more assumptions underlying the test are grossly violated. In that case, a nonparametric procedure is frequently the only alternative.
4. Results are needed in a hurry and calculations must be done by hand.
1.6 Advantages of nonparametric statistics
The following are some of the advantages of the available nonparametric statistical procedures.
1. Make fewer assumptions.
Nonparametric statistical procedures generally do not need rigid parametric assumptions about the populations from which the data are taken.
2. Wider scope.
Since fewer assumptions are made about the sample being studied, nonparametric statistics are usually wider in scope than parametric statistics, which assume a specific distribution.
3. Need not involve population parameters.
Parametric tests involve specific probability distributions (e.g., the normal distribution) and require estimation of the key parameters of that distribution (e.g., the mean or a difference in means) from the sample data. Nonparametric tests, however, need not involve population parameters.
4. The chance of their being improperly used is small.
Since most nonparametric procedures depend on a minimum set of assumptions, the chance of their being improperly used is small.
5. Applicable even when data are measured on a weak measurement scale.
For interval or ratio data, a parametric test may be used, depending on the shape of the distribution. Nonparametric tests can be performed even when the data are only nominal or ordinal.
6. Easy to understand.
Researchers with minimal preparation in mathematics and statistics usually find nonparametric procedures easy to understand.
7. Computations can quickly and easily be performed.
Nonparametric tests can usually be performed quickly and easily without automated instruments (calculators and computers). They are designed for small numbers of data, including counts, classifications and ratings; a small hand-computable sign test is sketched after this list.
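To illustrate how little computation is involved, the following sketch works out a two-sided sign test for a hypothesized median using nothing beyond Python’s standard library, mirroring a calculation that could easily be done by hand; the data and the hypothesized median of 10 are invented for illustration.

    from math import comb

    # Hypothetical observations and a hypothesized median of 10
    data = [12.1, 9.4, 11.8, 10.7, 13.2, 9.9, 12.5, 11.1, 10.3, 12.8]
    median0 = 10.0

    diffs = [x - median0 for x in data if x != median0]   # drop exact ties
    n_pos = sum(1 for d in diffs if d > 0)
    n = len(diffs)

    # Under H0 the number of positive signs is Binomial(n, 1/2);
    # two-sided p-value: double the probability of the more extreme tail
    k = min(n_pos, n - n_pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    p_value = min(1.0, 2 * tail)

    print(f"{n_pos} of {n} signs positive, two-sided p = {p_value:.3f}")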
1.7 Disadvantages of nonparametric tests
Nonparametric procedures are not without disadvantages. The following are some of the more important disadvantages.
1. May waste information.
When parametric procedures are appropriate, using a nonparametric procedure instead may waste information, since the measurements are often reduced to ranks or signs. If the assumptions of the parametric methods can be met, it is generally more efficient to use them.
2. Difficult to compute by hand for large samples.
For large sample sizes, data manipulations tend to become more laborious, unless computer software is available.
3. Tables not widely available.
Often special tables of critical values are needed for the test statistic, and these values cannot always be generated by computer software. By contrast, the critical values for parametric tests are readily available and generally easy to incorporate into computer programs.
References
Armitage, P. (1971). Statistical Methods in Medical Research. Oxford and Edinburgh: Blackwell Scientific Publications.
Colton, T. (1974). Statistics in Medicine. Boston: Little, Brown.
Dunn, Olive J. (1964). Basic Statistics: A Primer for the Biomedical Sciences. New York: Wiley.
Kendall, M. G. and Sundrum (1953). Distribution-free methods and order properties. Rev. Int. Statist. Inst., 21, 124–134.
Remington, R. D. and Schork, M. A. (1970). Statistics with Applications to the Biological and Health Sciences. Englewood Cliffs, NJ: Prentice-Hall.
Savage, I. R. (1962). Bibliography of Nonparametric Statistics. Cambridge, MA: Harvard University Press.