Calculating Load

The basic problem we’d like to solve is to be able to tell customers how many machines they need to set up. We can simplify this to calculating how many users can go on a single machine and then dividing to get hardware requirements.

There are two limiting factors in calculating this “users per box” number: response time and throughput. We’ll need to know both before we can make any predictions about how much hardware our expected load will require.

Response time refers to how long a particular user will have to wait for a request to come back. This is obviously somewhat random, as some requests will take longer than others due to database contention, network congestion, thread scheduling, and a billion other factors. Since it’s random, it will not in general be possible to guarantee that 100% of requests will be below a certain time – one in a million of them will take a really long time. Instead, response time requirements should be described probabilistically, as in, “Ninety-five percent of requests should complete in less than four seconds.”

If we assume that the distribution of response times for a given load on the system is roughly Gaussian (i.e. a bell curve) then it is useful to describe this in terms of standard deviations from the mean, rather than in terms of percent. You may have heard the term sigma used in this context; sigma (σ) is the Greek character used to represent one standard deviation.

Two sigmas represent roughly 95% of the distribution. In other words, we can expect 95% of all values to fall within two standard deviations of the average. At a distance of 3σ, the figure rises to roughly 99.7%, making both numbers useful metrics of response time characteristics.
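
To make this concrete, here is a small Python sketch (the sample data is invented for illustration) that turns a list of measured response times into the mean and the sigma-based bounds described above:

    import statistics

    # Invented sample of measured response times, in seconds.
    samples = [0.8, 1.1, 0.9, 1.4, 2.2, 1.0, 1.3, 0.7, 1.8, 1.2]

    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # one standard deviation

    # Under the Gaussian assumption, roughly 95% of response times fall
    # within two sigma of the mean, and roughly 99.7% within three.
    print(f"mean = {mean:.2f}s, sigma = {sigma:.2f}s")
    print(f"2-sigma bound: {mean + 2 * sigma:.2f}s (~95% of requests)")
    print(f"3-sigma bound: {mean + 3 * sigma:.2f}s (~99.7% of requests)")

With real measurements it is worth eyeballing the distribution first: response times are often skewed, with exactly the long tail mentioned earlier, in which case these Gaussian bounds are only a rough guide.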

The other metric we care about when analyzing a system is throughput. This is simply the number of requests the system can handle per unit time, across all users. Throughput is a function of load – as more and more concurrent demands are made on a system, it will initially be able to keep up, but will eventually bog down switching back and forth between tasks. A typical throughput curve (smoothed to emphasize the effect) is shown in Figure 1.

Figure 1 - Typical throughput curve

So far, so good: deriving these numbers is easy enough. In fact, most load testing frameworks will do exactly this. A simplified excerpt of a typical report is shown below in Figure 2.

Note that I haven’t labeled the horizontal axis. This particular data was generated using a custom load-testing framework that spins up several threads, with each thread doing as much work as it can before the test ends. These threads do not represent users – each thread likely represents work equivalent to many, many users of the real system. Do not make the mistake of confusing the number of threads at peak throughput with the number of users the system can support. The important thing is simply to see how the system behaves as you increase the load.

Figure 2 - Throughput and Response Time
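
I won’t reproduce the framework here, but a toy version of the idea – several threads, each doing as much work as it can until time runs out – might look like the following sketch, where do_request is just a stand-in for whatever the real system does:

    import random
    import statistics
    import threading
    import time

    def do_request():
        """Stand-in for one request against the real system."""
        time.sleep(random.uniform(0.1, 0.5))  # invented latency

    def run_load_test(num_threads, duration_s):
        """Run num_threads workers flat out for duration_s seconds;
        report throughput and a two-sigma response-time bound."""
        latencies = []
        lock = threading.Lock()
        deadline = time.monotonic() + duration_s

        def worker():
            while time.monotonic() < deadline:
                start = time.monotonic()
                do_request()
                with lock:
                    latencies.append(time.monotonic() - start)

        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        throughput = len(latencies) / duration_s
        bound = statistics.mean(latencies) + 2 * statistics.stdev(latencies)
        return throughput, bound

    # Step up the load; against a real system the throughput
    # curve will eventually flatten, as in Figure 1.
    for n in (1, 2, 4, 8, 16):
        tput, bound = run_load_test(n, duration_s=10)
        print(f"{n:2d} threads: {tput:6.2f} req/s, ~95% under {bound:.2f}s")

Note that with the sleep-based stand-in above, throughput simply scales with the thread count; the knee in the curve only appears once do_request is hitting a real, contended system.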

So how does one translate these numbers to an actual number of users per machine? Fortunately, to a first order of approximation, it’s really quite easy. A few simple definitions will help.

The capacity of the system is how much load it can handle, expressed in requests per second. This we can simply look up – it’s the throughput listed in Figure 2, for example 1.57 requests per second.

The service level is the maximum acceptable response time at some particular level of confidence. For example, we might say, “Requests must be serviced within 10 seconds in 99% of cases.” We might use sigmas instead of percentage – as discussed previously, they’re equivalent. This number will be supplied by the customer.

The demand on the system is how much load we expect the users to generate, expressed in requests per second. This number we get by multiplying the number of users by their individual request rate. For example, if there are 100 users each making a request every 10 seconds, the demand on the system across all users is 10 requests per second.

Determining how many machines are needed is simply a matter of ensuring that capacity can meet demand at a given service level. Here’s the process:

    1. Gather service level requirements.

    2. Use Figure 2 (or equivalent for the system in question) to find the highest throughput that still meets the requirements from step 1.

    3. Gather demand requirements.

    4. The throughput from step 2 is the capacity for an individual machine. Divide the demand from step 3 by this number and round up to get the number of machines needed. (A code sketch of this calculation follows the example below.)

    5. Add some fudge factors – this isn’t an exact science.

An example might help:

    1. Requests must be processed within 8 seconds 95% of the time.

    2. Based on Figure 2, the throughput a single box can handle while still maintaining an 8-second response time at 95% confidence (two sigma) is about 1.64 requests per second. I’ve highlighted the appropriate line.

    3. The demand we expect is based on the fact that the customer has 100 users that they expect to make a request (on average) every two minutes, for a total rate of 0.83 requests per second.

    4. Since the demand is less than the capacity for a single machine, we only need one machine!

    5. There’s enough of a difference between demand and capacity that we’re probably okay, but a second machine might be necessary if the users were making requests every minute, as demand would then be 1.67 requests per second.
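
In code, the arithmetic in steps 3 and 4 boils down to a multiplication and a division. Here is a minimal sketch using the example’s numbers (the function and variable names are my own):

    import math

    def machines_needed(users, requests_per_user_per_s, capacity_per_machine):
        """Smallest number of machines whose combined capacity meets demand."""
        demand = users * requests_per_user_per_s  # requests/second, all users
        return math.ceil(demand / capacity_per_machine)

    # 100 users, one request every two minutes, 1.64 req/s per box.
    print(machines_needed(100, 1 / 120, 1.64))  # -> 1

Step 5’s fudge factor might be applied by discounting capacity_per_machine before the division, rather than by adjusting the result afterwards.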

The process for calculating how many users can fit on a single machine is similar.

    1. Gather service level requirements.

    2. Use Figure 2 (or equivalent for the system in question) to find the highest throughput that still meets the requirements from step 1.

    3. Gather the average request rate for a single user.

    4. The throughput from step 2 is the capacity of an individual machine in requests per second. Divide it by the per-user request rate from step 3 to get the number of users a single box can support. (Again, a sketch follows the example below.)

    5. Add some fudge factors – this isn’t an exact science.

Again, an example might be illustrative.

    1. Requests must be processed within 8 seconds 95% of the time.

    2. Based on Figure 2, the throughput a single box can handle while still maintaining an 8-second response time at 95% confidence (two sigma) is about 1.64 requests per second. I’ve highlighted the appropriate line.

    3. The customer expects users to average one request per five minutes, for a rate of 0.0033 requests per second.

    4. Given these numbers, each box can support 492 users.

    5. Tell the customer some number lower than 492.
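
The users-per-box version is the same division turned around (again, the function name is my own):

    def users_per_machine(capacity_per_machine, requests_per_user_per_s):
        """How many users one box can support at the given service level."""
        return capacity_per_machine / requests_per_user_per_s

    # Capacity of 1.64 req/s; one request per user every five minutes.
    print(f"{users_per_machine(1.64, 1 / 300):.0f} users")  # -> 492 users

Per step 5, the number actually quoted to the customer should then be rounded down from there, not up.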

A few observations:

    • If the required response time is lower than the time a single request takes on an otherwise idle machine, no number of machines will meet the requirement – adding hardware increases throughput, not the speed of an individual request.

    • We’ve assumed that doubling the number of machines doubles throughput. Because database contention is a major factor in scalability, this will almost certainly not be true.

    • We’ve assumed that adding a second machine has no impact on response time. Because database contention is a major factor in response time, this will almost certainly not be true.

There are other factors, too, that will cause reality and this model to diverge. However, having at least some theoretical basis for estimation may be helpful. If nothing else, it puts the onus on the client to provide numbers for demand and service level, which makes them an active participant in the capacity planning process.