Notes
The performance of any type of system cannot be determined without knowing the workload, that is, the requests being processed. Workload characterization consists of a description of the workload by means of quantitative parameters and functions; the objective is to derive a model able to show, capture, and reproduce the behavior of the workload and its most important features.
Workload models tended to be mathematical constructs that depended heavily on the modelers' intuitive understanding of networking reality. They typically focused entirely on analytic tractability, and empirical validation against measured data was considered superfluous (compare this to the armchair-speculation approach to astronomy before Galileo's time).
Consequences
What If Assumptions Do Not Hold?
Primary Goal is Correct and Valid Scientific Conclusions
Consequences of Invalid Assumptions
If some of the underlying assumptions do not hold, what can be done about it? What corrective actions can be taken? The positive way of approaching this is to view the testing of underlying assumptions as a framework for learning about the process. Assumption-testing promotes insight into important aspects of the process that may not have surfaced otherwise.
The primary goal is to have correct, validated, and complete scientific/engineering conclusions flowing from the analysis. This usually includes intermediate goals such as the derivation of a good-fitting model and the computation of realistic parameter estimates. It should always include the ultimate goal of an understanding and a "feel" for "what makes the process tick". There is no more powerful catalyst for discovery than the bringing together of an experienced/expert scientist/engineer and a data set ripe with intriguing "anomalies" and characteristics.
The following sections discuss in more detail the consequences of invalid assumptions:
Robustness
There are various alternatives to the mean and median for measuring location. These alternatives were developed to handle non-normal data, since the mean is an optimal estimator only if the data are in fact normal.
Tukey and Mosteller defined two types of robustness, where robustness is a lack of susceptibility to the effects of non-normality.
Robustness of validity means that the confidence intervals for the population location have a 95% chance of covering the population location regardless of what the underlying distribution is.
Robustness of efficiency refers to high effectiveness in the face of non-normal tails. That is, confidence intervals for the population location tend to be almost as narrow as the best that could be done if we knew the true shape of the distribution.
The mean is an example of an estimator that is the best we can do if the underlying distribution is normal. However, it lacks robustness of validity. That is, confidence intervals based on the mean tend not to achieve their nominal coverage if the underlying distribution is in fact not normal.
The median is an example of an estimator that tends to have robustness of validity but not robustness of efficiency.
The alternative measures of location try to balance these two concepts of robustness. That is, the confidence intervals for the case when the data are normal should be almost as narrow as the confidence intervals based on the mean. However, they should maintain their validity even if the underlying data are not normal. In particular, these alternatives address the problem of heavy-tailed distributions.
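The trade-off above can be sketched numerically. The following is a minimal illustration, not from the source: it draws a heavy-tailed sample (Student's t with 2 degrees of freedom, an assumed choice) and compares the mean, the median, and a 10%-trimmed mean, one common compromise estimator of location.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Heavy-tailed sample: Student's t with 2 degrees of freedom.
# The true population location (center of symmetry) is 0.
data = rng.standard_t(df=2, size=500)

# Three location estimators:
#   mean         - efficient under normality, sensitive to heavy tails
#   median       - robust validity, but less efficient
#   trimmed mean - discards the extreme 10% in each tail, balancing the two
print("mean        :", np.mean(data))
print("median      :", np.median(data))
print("10% trimmed :", stats.trim_mean(data, proportiontocut=0.10))
```

On heavy-tailed data like this, the median and trimmed mean typically sit closer to the true location than the mean, because the mean is pulled around by the occasional extreme observation.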
Model a univariate data set with a probability distribution
Various Methods
One common application of probability distributions is modeling univariate data with a specific probability distribution. This involves the following two steps:
Determination of the "best-fitting" distribution.
Estimation of the parameters (shape, location, and scale parameters) for that distribution.
There are various methods, both numerical and graphical, for estimating the parameters of a probability distribution.
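The two steps can be sketched with one common numerical method, maximum likelihood, combined with a goodness-of-fit statistic to pick among candidates. This is an assumed illustration (the candidate list, simulated gamma data, and the Kolmogorov-Smirnov criterion are choices made here, not prescribed by the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated univariate data set; in practice this would be measured data.
data = rng.gamma(shape=2.0, scale=3.0, size=1000)

# Step 1: determine the "best-fitting" distribution among some candidates,
# using the Kolmogorov-Smirnov statistic (smaller = better fit).
candidates = {"gamma": stats.gamma, "lognorm": stats.lognorm, "expon": stats.expon}
results = {}
for name, dist in candidates.items():
    params = dist.fit(data)                     # maximum likelihood estimates
    ks = stats.kstest(data, name, args=params)  # goodness-of-fit statistic
    results[name] = (ks.statistic, params)

best = min(results, key=lambda n: results[n][0])
print("best-fitting candidate:", best)

# Step 2: the estimated (shape, location, scale) parameters of the winner.
print("estimated parameters  :", results[best][1])
```

Graphical alternatives to this numerical approach include probability plots and probability-plot correlation coefficient (PPCC) plots, which let the eye judge fit and pick shape parameters.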