15: Workload Modeling

"The only man I know who behaves sensibly is my tailor; he takes my measurements anew each time he sees me. The rest go on with their old measurements and expect me to fit them." - George Bernard Shaw.

"The essence of modeling, as opposed to just observing and recording, is one of abstraction. This means two things: generalization and simplification." - Dror Feitelson.

"Always assume that your assumption is invalid." — Robert F. Tatman.

Lecture outline: How can we develop a repeatable workload model that is representative of a real-user environment?

Preamble: Importance of workloads in NPE

1. Workload and system modeling

Data and models

Dynamic workload attributes (e.g., interarrival times)

Workload models: i) traces; ii) empirical distribution; iii) theoretical distribution.

Artificial workload generation models.

Benefits of synthetic workload models.
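The three workload model types listed above (trace, empirical distribution, theoretical distribution) can be contrasted with a small sketch. It is only an illustration under stated assumptions: the interarrival times in `trace` are made up, the function names are hypothetical, and the exponential distribution is chosen purely as an example of a theoretical model.

```python
import random

def replay_trace(trace):
    """Trace-driven model: replay the recorded interarrival times verbatim."""
    return list(trace)

def sample_empirical(trace, n, rng):
    """Empirical-distribution model: resample observed values with replacement."""
    return [rng.choice(trace) for _ in range(n)]

def sample_exponential(trace, n, rng):
    """Theoretical-distribution model: fit an exponential (rate = 1 / mean)
    to the trace and draw n new interarrival times from it."""
    rate = len(trace) / sum(trace)
    return [rng.expovariate(rate) for _ in range(n)]

# Hypothetical recorded interarrival times in seconds (invented for illustration).
trace = [0.8, 1.5, 0.3, 2.1, 0.9, 0.4, 1.2, 3.0, 0.6, 1.1]

# A fixed seed makes the synthetic workload repeatable across experiments.
rng = random.Random(42)

synthetic_empirical = sample_empirical(trace, 1000, rng)
synthetic_theoretical = sample_exponential(trace, 1000, rng)
```

The trace can only be replayed at its recorded length, whereas both distribution-based models can generate a workload of any length. The theoretical model can also produce values that never appeared in the trace, which is one of the usual arguments for synthetic workload models.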

2. Distribution fitting

Graphical estimate of input distribution

Empirical distributions vs. theoretical distributions

Parameter estimation
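As a minimal sketch of the distribution-fitting steps above, the following code estimates the parameter of an exponential distribution by maximum likelihood and prints the empirical CDF next to the fitted CDF, a textual stand-in for the graphical comparison. The sample values are hypothetical, and the exponential family is an assumption made for illustration only.

```python
import math

def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(samples) / sum(samples)

def empirical_cdf(samples, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

def exponential_cdf(rate, x):
    """Theoretical CDF F(x) = 1 - exp(-rate * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

# Hypothetical interarrival times in seconds (invented for illustration).
samples = [0.8, 1.5, 0.3, 2.1, 0.9, 0.4, 1.2, 3.0, 0.6, 1.1]
rate = fit_exponential(samples)

# Side-by-side comparison of the empirical and the fitted theoretical CDF.
for x in sorted(samples):
    print(f"x={x:4.1f}  empirical={empirical_cdf(samples, x):.2f}  "
          f"fitted={exponential_cdf(rate, x):.2f}")
```

Large gaps between the two columns would suggest that the chosen theoretical distribution is a poor description of the data.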

3. Goodness-of-fit test

Pearson Chi-Square test

Chi-Square distribution

Example of distribution fitting: soccer goals in the FIFA World Cups of 1990, 1994, 1998, and 2002. (The inter-goal time is exponentially distributed; the number of goals per match is Poisson distributed.)
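The Pearson chi-square test for a Poisson fit of goals per match can be sketched as follows. The counts in `observed` are invented for illustration and are not actual World Cup data; the function names are hypothetical, and the critical values are the standard upper 5% points of the chi-square distribution.

```python
import math

# Upper 5% critical values of the chi-square distribution by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def chi_square_poisson(observed):
    """observed[k] = number of matches with k goals; the last bin means
    'k or more goals'. Returns (estimated mean, test statistic, dof)."""
    n = sum(observed)
    # Estimate of the Poisson mean. Approximation: matches in the last
    # ('k or more') bin are counted as having exactly k goals.
    lam = sum(k * count for k, count in enumerate(observed)) / n
    probs = [poisson_pmf(k, lam) for k in range(len(observed) - 1)]
    probs.append(1.0 - sum(probs))  # tail probability for the last bin
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - 1  # bins - 1 - one estimated parameter
    return lam, stat, dof

# Hypothetical counts of matches with 0, 1, 2, 3, 4, and 5+ goals.
observed = [8, 20, 27, 22, 14, 9]
lam, stat, dof = chi_square_poisson(observed)
reject = stat > CHI2_CRIT_05[dof]
print(f"mean={lam:.2f} chi2={stat:.3f} dof={dof} reject_at_5_percent={reject}")
```

If the statistic stays below the critical value, the Poisson hypothesis is not rejected at the 5% level. This does not prove that the data are Poisson distributed; it only means the test found no significant evidence against it.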

Primary reference for this lecture:

1. "The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling" by Raj Jain; Chapter 6, "Workload Characterization Techniques".

2. "Workload Modeling for Computer Systems Performance Evaluation" by Dror G. Feitelson [downloadable online].

Secondary references for this lecture:

"Generating representative web workloads for network and server performance evaluation" by Crovella et al.