Notes

"Many probability distributions are not a single distribution, but are in fact a family of distributions. This is due to the distribution having one or more shape parameters.

Shape parameters allow a distribution to take on a variety of shapes, depending on the value of the shape parameter. These distributions are particularly useful in modeling applications since they are flexible enough to model a variety of data sets."  (http://itl.nist.gov/div898/handbook/eda/section3/eda363.htm)

In addition to any shape parameters, a probability distribution is characterized by location and scale parameters, which are also typically used in modeling applications.

The location parameter simply shifts the graph left or right on the horizontal axis.

The scale parameter stretches or compresses the graph. The effect of a scale parameter greater than one is to stretch the pdf (the greater the value, the greater the stretching). The effect of a scale parameter less than one is to compress the pdf, and the compressed pdf approaches a spike as the scale parameter goes to zero.

(A scale parameter of exactly 1 leaves the pdf unchanged; non-positive scale parameters are not allowed.)

For the normal distribution, the location and scale parameters correspond to the mean and standard deviation, respectively. However, this is not necessarily true for other distributions. In fact, it is not true for most distributions.
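A minimal sketch of the shift/stretch behavior (assuming Python with numpy and scipy available; scipy.stats exposes loc and scale for its continuous distributions, and for norm these are the mean and standard deviation):

    # Sketch: effect of loc (shift) and scale (stretch) on a pdf.
    import numpy as np
    from scipy import stats

    x = np.linspace(-5, 5, 11)
    standard  = stats.norm(loc=0, scale=1)   # standard normal
    shifted   = stats.norm(loc=2, scale=1)   # shifted right by 2
    stretched = stats.norm(loc=0, scale=2)   # stretched by a factor of 2

    # Shifting: the shifted pdf at x equals the standard pdf at x - 2.
    assert np.allclose(shifted.pdf(x), standard.pdf(x - 2))
    # Stretching: pdf_scaled(x) = pdf_standard(x / s) / s for scale s.
    assert np.allclose(stretched.pdf(x), standard.pdf(x / 2) / 2)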

Kurtosis measures the peakedness of a distribution. The shape of a distribution can range from very peaked to very flat.

Leptokurtic describes a distribution that is very peaked; in this case, the data is lumped together near the center. Platykurtic describes a distribution that is very flat; in this case, the data is very spread out.
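A quick numeric sketch of the contrast (Python with numpy/scipy assumed; scipy.stats.kurtosis reports excess kurtosis, i.e., relative to the normal's value of 0):

    # Sketch: sample excess kurtosis for peaked vs. flat distributions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 100_000
    print(stats.kurtosis(rng.laplace(size=n)))   # ~ +3   leptokurtic (peaked)
    print(stats.kurtosis(rng.normal(size=n)))    # ~  0   normal reference
    print(stats.kurtosis(rng.uniform(size=n)))   # ~ -1.2 platykurtic (flat)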

Symmetric Distribution:

Light-tailed distributions:

Normal distribution:  The normal distribution is applicable to a very wide range of phenomena and is the most widely used distribution in statistics. It was originally developed as an approximation to the binomial distribution when the number of trials is large and the Bernoulli probability p is not close to 0 or 1. It is also the asymptotic form of the sum of random variables under a wide range of conditions.

The normal distribution was first described by the French mathematician de Moivre in 1733. The development of the distribution is often ascribed to Gauss, who applied the theory to the movements of heavenly bodies.

Normal distributions occur in situations where (1) many factors affect the values of the variables of interest; (2) the many factors are largely independent of each other, so the effects of the factors on the variable are additive; and (3) the factors make approximately equal contributions to the variation evidenced in the variable.

Parametric statistical tests often assume the sample under test is drawn from a normally distributed population. By making this assumption about the data, parametric tests are more powerful than their non-parametric counterparts and can detect differences with smaller sample sizes, or detect smaller differences with the same sample size.

Many parametric tests, such as the t-test and ANOVA, use the mean of the sample so some non-normality can be tolerated (due to the Central Limit Theorem). How large a sample you need depends on how skewed the sample distribution is – the more skewed the data, the larger the sample size should be – so it’s not possible to give hard and fast rules. You should first check the degree of non-normality and, only after (careful!) consideration, decide if you can safely use the test.
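Before relying on the normality assumption, it is cheap to check it. A minimal sketch (Python with scipy assumed; shapiro and skew are scipy.stats functions, and 0.05 is just the conventional significance level):

    # Sketch: checking a sample's normality before a parametric test.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.exponential(scale=2.0, size=50)   # deliberately skewed data

    stat, p = stats.shapiro(sample)
    print(f"Shapiro-Wilk p = {p:.4f}")             # small p: reject normality
    print(f"sample skewness = {stats.skew(sample):.2f}")
    if p < 0.05:
        print("Consider a larger sample or a non-parametric test.")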

Student's t distribution: The Student’s t distribution is used to test whether the difference between the means of two samples of observations is statistically significant. For example, the heights of a random sample of basketball players could be compared with the heights from a random sample of football players. The Student’s t distribution would be used to test whether the data indicated that one group was significantly taller than the other. More precisely, it would be testing the hypothesis that both samples were drawn from the same normal population. A significant value of t would cause the hypothesis to be rejected, indicating that the means were significantly different.
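A minimal sketch of the basketball/football comparison (Python with numpy/scipy assumed; the heights below are made-up numbers purely for illustration):

    # Sketch: two-sample t-test on simulated "heights" (values hypothetical).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    basketball = rng.normal(loc=200, scale=8, size=30)  # cm, made up
    football   = rng.normal(loc=188, scale=8, size=30)  # cm, made up

    t, p = stats.ttest_ind(basketball, football)
    print(f"t = {t:.2f}, p = {p:.4f}")
    # A small p-value rejects the hypothesis that both samples come from
    # the same normal population, i.e., the means differ significantly.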

Heavy-tailed distributions:

"The Cauchy distribution is a symmetric distribution with heavy tails and a single peak at the center of the distribution. The Cauchy distribution has the interesting property that collecting more data does not provide a more accurate estimate of the mean. That is, the sampling distribution of the mean is equivalent to the sampling distribution of the original data. This means that for the Cauchy distribution the mean is useless as a measure of the typical value. For this histogram, the mean of 3.7 is well above the vast majority of the data. This is caused by a few very extreme values in the tail. However, the median does provide a useful measure for the typical value.

Although the Cauchy distribution is an extreme case, it does illustrate the importance of heavy tails in measuring the mean. Extreme values in the tails distort the mean. However, these extreme values do not distort the median since the median is based on ranks. In general, for data with extreme values in the tails, the median provides a better estimate of location than does the mean." (http://itl.nist.gov/div898/handbook/eda/section3/eda351.htm)
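A minimal simulation of this failure of the mean (Python with numpy assumed; the standard Cauchy is centered at 0):

    # Sketch: the sample mean never settles down for Cauchy data,
    # while the sample median converges to the center (0 here).
    import numpy as np

    rng = np.random.default_rng(3)
    for n in (100, 10_000, 1_000_000):
        sample = rng.standard_cauchy(n)
        print(n, "mean:", round(sample.mean(), 2),
                 "median:", round(np.median(sample), 4))
    # The means bounce around erratically as n grows; the medians approach 0.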

Skewed (or Asymmetric) Distribution:

In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (possibly including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution. (Wikipedia article)

For skewed distributions, it is not at all obvious whether the mean, the median, or the mode is the more meaningful measure of the typical value. In this case, all three measures are useful.
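A minimal sketch comparing the three measures on a right-skewed sample (Python with numpy/scipy assumed; the mode is approximated by the tallest histogram bin):

    # Sketch: mean, median, and approximate mode of a right-skewed sample.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

    print("skewness:", round(stats.skew(sample), 2))    # positive
    print("mean:    ", round(sample.mean(), 3))         # pulled right
    print("median:  ", round(np.median(sample), 3))
    counts, edges = np.histogram(sample, bins=200)
    print("mode ~   ", round(edges[counts.argmax()], 3))  # near the peak
    # For positive skew: mode < median < mean.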

Light-tailed distributions:

Poisson distribution:  The Poisson distribution is applied in counting the number of rare but open-ended events. A classic example is the number of people per year who become invalids due to being kicked by horses. Another application is the number of faults in a batch of materials.

It is also used to represent the number of arrivals, say, per hour, at a service center. This number will have a Poisson distribution if the average arrival rate does not vary through time. If the interarrival times are exponentially distributed, the number of arrivals in a unit time interval is Poisson distributed. In practice, arrival rates may vary according to the time of day or year, but a Poisson model will be used for periods that are reasonably homogeneous.

Some Trivia (Olofsson): If you have a one-in-a-million chance to succeed with something and try a million times, what is the probability that you still do not succeed? If you randomly place 64 grains of rice on a chessboard, what proportion of squares can you expect to be empty? The answer to both questions is approximately 0.37.

The reason 0.37 comes up in such questions is the law of rare events. This law says that if an event is rare, unpredictable, and occurs on average once, then the probability that it does not occur at all is e^(-1).

Fix 1 of the 64 squares on the chessboard, for example, the square a1 (the corner square closest to the white player's left hand). The probability to hit a1 with a grain of rice is 1/64, which is fairly small. It is totally unpredictable when a hit will come; it can occur anytime, independently of how many times you have hit a1 before. As you try 64 times, you expect to hit once, but that is about all you can say for certain. Of course, you may still fail to hit, and the law of rare events tells you that the probability of this is e^(-1) ≈ 0.37.
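A minimal simulation of the rice-on-chessboard question (Python with numpy assumed):

    # Sketch: drop 64 grains on 64 squares; what fraction stay empty?
    import numpy as np

    rng = np.random.default_rng(5)
    trials, empties = 10_000, 0.0
    for _ in range(trials):
        squares = rng.integers(0, 64, size=64)   # square hit by each grain
        empties += 64 - np.unique(squares).size  # squares never hit
    print(empties / (trials * 64))   # ~ 0.37, i.e. (1 - 1/64)**64 ≈ e**-1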

What if the events occur on average twice, or three times, or some other number of times? And what if we instead want to ask for the probability that it happens once, or twice, or some other number of times? Actually, the law of rare events is more general than I said above, and it does cover these cases as well. Thus, suppose that we are dealing with some rare, unpredictable event that occurs on average λ times. The number of occurrences is said to follow the Poisson distribution, and the probability of k occurrences, for k = 0, 1, 2, ..., is given by the formula

P(k) = e^(-λ) * λ^k / k!
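A minimal check of the formula against simulation (Python with numpy assumed; λ = 2 is an arbitrary choice):

    # Sketch: the Poisson pmf versus simulated counts, lambda = 2.
    import math
    import numpy as np

    lam = 2.0
    def poisson_pmf(k, lam):
        return math.exp(-lam) * lam**k / math.factorial(k)

    rng = np.random.default_rng(6)
    sample = rng.poisson(lam, size=100_000)
    for k in range(5):
        print(k, round(poisson_pmf(k, lam), 4),
                 round(np.mean(sample == k), 4))   # formula vs. simulation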

Examples that are commonly given are the number of misprints on a page in a book or newspaper, the number of mutations in some section of a DNA molecule, counts of radioactive decay, the number of stars in some large volume of space, and the number of hits of some specified webpage. Wait a minute now, hits of a webpage? Yahoo, Google, Amazon? Not exactly rare, are they? Well, it depends on the time scale. You can always find a time scale that makes it rare: milliseconds or microseconds, or whatever you want. As hits are unpredictable, coming from a large number of users acting independently of each other and spread all over the world, the Poisson distribution still fits well.

The Poisson process is a simple kind of random process, which models the occurrence of random points in time or space. There are numerous ways in which processes of random points arise: some examples are presented in the first section. The Poisson process describes in a certain sense the most random way to distribute points in time or space. This is made more precise with the notions of homogeneity and independence.

Exponential distribution: This is a distribution of the time to an event when the probability of the event occurring in the next small time interval does not vary through time. It is also the distribution of the time between events when the number of events in any time interval has a Poisson distribution.
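A minimal sketch of that exponential/Poisson connection (Python with numpy assumed; a rate of 3 arrivals per hour is an arbitrary choice):

    # Sketch: exponential interarrival times imply Poisson counts per hour.
    import numpy as np

    rng = np.random.default_rng(7)
    rate = 3.0                                    # average arrivals per hour
    gaps = rng.exponential(1 / rate, size=100_000)
    arrival_times = np.cumsum(gaps)
    hours = int(arrival_times[-1])
    counts = np.histogram(arrival_times, bins=hours, range=(0, hours))[0]
    print(counts.mean(), counts.var())   # both ~ 3: a Poisson signature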

The exponential distribution has many applications. Examples include the time to decay of a radioactive atom and the time to failure of components with constant failure rates. It is used in the theory of waiting lines or queues, which are found in many situations: from the gates at the entrance to toll roads, through the time taken to answer a telephone enquiry, to the time taken for an ambulance to arrive at the scene of an accident. For exponentially distributed times, there will be many short times, fewer longer times, and occasional very long times.

The exponential distribution is a skewed, i.e., not symmetric, distribution. For skewed distributions, the mean and median are not the same. The mean will be pulled in the direction of the skewness. That is, if the right tail is heavier than the left tail, the mean will be greater than the median. Likewise, if the left tail is heavier than the right tail, the mean will be less than the median.

Weibull distribution: The Weibull variate is commonly used as a lifetime distribution in reliability applications. The two-parameter Weibull distribution can represent decreasing, constant, or increasing failure rates.
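A minimal sketch of the three failure-rate regimes (Python with numpy assumed; the Weibull hazard rate is h(t) = (k/s) * (t/s)^(k-1) for shape k and scale s):

    # Sketch: Weibull hazard rate for shape k < 1, = 1, > 1.
    import numpy as np

    def weibull_hazard(t, shape, scale=1.0):
        return (shape / scale) * (t / scale) ** (shape - 1)

    t = np.array([0.5, 1.0, 2.0])
    for k in (0.5, 1.0, 2.0):
        print(k, weibull_hazard(t, k).round(3))
    # k = 0.5: hazard decreases with t; k = 1: constant (exponential case);
    # k = 2: hazard increases with t (wear-out failures).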

Erlang distribution: Events that occur independently with some average rate are modeled with a Poisson process. The waiting time until the k-th occurrence of the event is Erlang distributed. (The related question of the number of events in a given amount of time is described by the Poisson distribution.) The Erlang distribution is the distribution of the sum of k independent, identically distributed random variables, each having an exponential distribution. The long-run rate at which events occur is the reciprocal of the expectation of X, that is, λ/k.
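A minimal sketch of the sum-of-exponentials construction (Python with numpy assumed; k = 3 and λ = 2 are arbitrary choices):

    # Sketch: an Erlang(k, lambda) variate as the sum of k exponentials.
    import numpy as np

    rng = np.random.default_rng(8)
    k, lam = 3, 2.0
    waits = rng.exponential(1 / lam, size=(100_000, k)).sum(axis=1)
    print(waits.mean())   # ~ k / lam = 1.5, so events occur at rate lam / k
    # scipy.stats.erlang(k, scale=1/lam) describes the same distribution.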

Heavy-tailed distributions:

Log-normal distribution: Assuming that files are created by modifying previous files, and that each modification can be modeled as multiplying the file size by a random factor, leads to a log-normal file-size distribution.

The lognormal distribution is applicable to random variables that are constrained by zero but have a few very large values. The resulting distribution is asymmetrical and positively skewed; the file-size example above is typical.

The application of a logarithmic transformation to the data can allow the data to be approximated by the symmetrical normal distribution, although the absence of negative values may limit the validity of this procedure.
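A minimal sketch of both the multiplicative mechanism and the log transformation (Python with numpy/scipy assumed; the starting "file size" of 1000 and 20 modification steps are purely illustrative):

    # Sketch: repeated multiplication by random factors yields a log-normal.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    factors = rng.lognormal(mean=0.0, sigma=0.3, size=(100_000, 20))
    sizes = 1000.0 * factors.prod(axis=1)   # start at 1000, modify 20 times

    print(round(stats.skew(sizes), 2))            # clearly positive skew
    print(round(stats.skew(np.log(sizes)), 2))    # ~ 0: the logs look normal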

Pareto distribution: The Pareto distribution is often described as the basis of the 80/20 rule. For example, 80% of customer complaints regarding a make of vehicle typically arise from 20% of components. Other applications include the distribution of income and the classification of stock in a warehouse on the basis of frequency of movement.

Multimodal probability distribution:

A multimodal distribution is a probability distribution having more than one mode. A bimodal distribution is a continuous probability distribution with two different modes. These appear as distinct peaks (local maxima) in the probability density function.

A bimodal distribution most commonly arises as a mixture of two different unimodal distributions (i.e. distributions having only one mode).
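A minimal sketch of such a mixture (Python with numpy assumed; the component means -2 and +2 and the 50/50 mixing weight are arbitrary choices):

    # Sketch: a bimodal sample as a 50/50 mixture of two unimodal normals.
    import numpy as np

    rng = np.random.default_rng(10)
    component = rng.random(100_000) < 0.5        # pick a component per draw
    sample = np.where(component,
                      rng.normal(-2.0, 0.5, 100_000),   # first mode near -2
                      rng.normal(+2.0, 0.5, 100_000))   # second mode near +2
    counts, edges = np.histogram(sample, bins=80)
    # Two local maxima in `counts` correspond to the two modes.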