Thursday, May 20, 2010

Load Testing Think Time Distributions

One of my gripes about some commercial load testing tools is that they only provide a think time distribution (Z) that is equivalent to uniform variates in the client script. If you want some other distribution, you have to code it and debug it yourself. Load test generators are essentially very expensive workload simulators, especially when you take into account the cost of the SUT (system under test) platform. At those prices, a selection of distributions should be provided as a standard library, as they are in event-based simulators.

To make this point a bit clearer, I used the very convenient variate-generation functions in R to compare some of the distributions that I think should be included in such a library for the convenience of workload-test designers and performance engineers. The statistical mean (i.e., the average think delay) is the same in all these plots and is shown as the red vertical line, but pay particular attention to the spread around the mean on the x-axis.

Uniform: The first plot (upper left) shows the default uniform distribution with a mean Z = 10 seconds and a range between 5 and 15 seconds. This is what a standard random number generator produces once it is scaled to that range. Each call in the script will produce an explicit think delay somewhere between 5 and 15 seconds, with every value in that range equally likely. The typical frequency of occurrence for each variate is shown on the y-axis. I'm using seconds here as the nominal time base for think delay.
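
For reference, the uniform case can be sketched in a few lines of Python. This is only an illustration of scaling a standard random number generator onto the 5 to 15 second range; the function name `uniform_think_time` and its parameters are made up for this sketch and do not come from any load testing tool.

```python
import random

def uniform_think_time(mean=10.0, half_width=5.0, rng=random):
    """Return a think delay (seconds) drawn uniformly from
    [mean - half_width, mean + half_width]. Hypothetical helper."""
    return rng.uniform(mean - half_width, mean + half_width)

# Seeded generator so the run is repeatable.
rng = random.Random(2010)
samples = [uniform_think_time(10.0, 5.0, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)

print("min  = %.2f s" % min(samples))
print("max  = %.2f s" % max(samples))
print("mean = %.2f s" % sample_mean)
```

No delay ever falls outside the 5 to 15 second window, which is exactly the limited spread the other distributions relax.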

Exponential: One of the most common alternative delay distributions is the exponential distribution. There are two reasons you might want to use this distribution:
  • It increases the likelihood of queueing and therefore detecting buffer overflows in the SUT
  • It makes test results easily comparable to a PDQ model, which always assumes an exponential Z distribution
The exponential distribution is associated with a Poisson process. A Poisson process describes the kind of randomness you hear in the clicks of a Geiger counter (e.g., in the movies): the number of clicks in a fixed interval is Poisson distributed, and the time between those clicks is exponentially distributed. The asymmetry of the distribution about the mean makes it more useful than a uniform distribution for load testing.
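
If your tool only hands you uniform variates, an exponential think time can be coded by hand with the inverse-transform method. The following is a minimal sketch in Python, assuming a mean Z = 10 seconds as in the plots; `exponential_think_time` is a hypothetical name, not a function from any load testing product.

```python
import math
import random

def exponential_think_time(mean=10.0, rng=random):
    """Inverse-transform sampling: if U is uniform on [0, 1),
    then -mean * ln(1 - U) is exponential with the given mean."""
    u = rng.random()                 # uniform variate in [0.0, 1.0)
    return -mean * math.log(1.0 - u)  # 1 - u is in (0, 1], so log is defined

rng = random.Random(2010)
samples = [exponential_think_time(10.0, rng) for _ in range(20000)]

sample_mean = sum(samples) / len(samples)
frac_short = sum(s < 5.0 for s in samples) / len(samples)
frac_long = sum(s > 15.0 for s in samples) / len(samples)

print("mean            = %.2f s" % sample_mean)
print("fraction < 5 s  = %.3f" % frac_short)
print("fraction > 15 s = %.3f" % frac_long)
```

For an exponential distribution with a 10 second mean, theory says about 39% of the delays are shorter than 5 seconds and about 22% are longer than 15 seconds. A uniform distribution on 5 to 15 seconds produces none of either, which is why the exponential case is more likely to provoke queueing.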

Gamma: Often there is considerable gnashing of teeth over the exponential distribution not being realistic. Quite apart from the usual academic technicalities, it's a better choice than uniform. That said, a suitable generalization, which introduces more correlations into the arrivals, is the gamma distribution. Whereas the exponential distribution is defined in terms of a single arrival-rate parameter (λ), the gamma distribution is defined by two parameters: the shape (α) and scale (β). Setting α = 1 and β = 1/λ produces the exponential distribution (equivalently, a rate parameter equal to λ).
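
Here is a rough Python sketch of gamma think times that keeps the mean fixed at Z = 10 seconds while the shape varies. It relies on `random.gammavariate(alpha, beta)` from the standard library, where the second argument is a scale, so the mean is alpha times beta. The helper names `gamma_think_time` and `coefficient_of_variation` are my own for this illustration.

```python
import random
import statistics

def gamma_think_time(mean=10.0, shape=4.0, rng=random):
    """Gamma variate with the requested mean.
    The mean of a gamma distribution is shape * scale,
    so the scale is chosen as mean / shape."""
    scale = mean / shape
    return rng.gammavariate(shape, scale)

def coefficient_of_variation(values):
    """Standard deviation divided by mean: a unit-free measure of spread."""
    return statistics.pstdev(values) / statistics.fmean(values)

rng = random.Random(2010)
mean_by_shape = {}
cv_by_shape = {}
for shape in (1.0, 4.0, 16.0):
    samples = [gamma_think_time(10.0, shape, rng) for _ in range(20000)]
    mean_by_shape[shape] = statistics.fmean(samples)
    cv_by_shape[shape] = coefficient_of_variation(samples)
    print("shape = %4.1f  mean = %5.2f s  CV = %.2f"
          % (shape, mean_by_shape[shape], cv_by_shape[shape]))
```

The coefficient of variation of a gamma distribution is 1/sqrt(shape), so shape = 1 reproduces the exponential spread, and larger shapes squeeze the delays in toward the mean.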

Pareto: Finally, the Pareto distribution is suitable for simulating highly correlated arrivals, such as has been discussed ad nauseam in the context of Internet packets. The Pareto distribution emulates heavy-tailed or self-similar traffic. Since Pareto is a hyperbolic-class (power-law) function, it corresponds to infinite variance effects or, in practice, almost constant variance over many decades.
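
A Pareto think time with a chosen mean can be sketched as follows. Python's `random.paretovariate(alpha)` returns values of at least 1 with shape `alpha`, so the sketch rescales by a minimum delay `x_min` picked to give the mean Z = 10 seconds. The shape value 1.5 is just an assumption for illustration (any shape of 2 or less has infinite variance), and `pareto_think_time` is a hypothetical name.

```python
import random
import statistics

def pareto_think_time(mean=10.0, alpha=1.5, rng=random):
    """Pareto variate with the requested mean.
    A Pareto distribution with minimum x_min and shape alpha > 1
    has mean alpha * x_min / (alpha - 1), so solve for x_min."""
    if alpha <= 1.0:
        raise ValueError("the mean is only finite for alpha > 1")
    x_min = mean * (alpha - 1.0) / alpha
    return x_min * rng.paretovariate(alpha)

rng = random.Random(2010)
alpha = 1.5
x_min = 10.0 * (alpha - 1.0) / alpha
samples = [pareto_think_time(10.0, alpha, rng) for _ in range(20000)]
sample_median = statistics.median(samples)

print("x_min  = %.2f s" % x_min)
print("median = %.2f s" % sample_median)
print("max    = %.1f s" % max(samples))
```

Most delays land well below the 10 second mean, while a handful of very long ones drag the average up. With infinite variance the sample mean converges slowly, so the median is the steadier number to check against theory (x_min times 2 to the power 1/alpha).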

I'll have more to say about all this in our upcoming Guerrilla Data Analysis Techniques class in August.


Tom said...

I've used lognormal for think times. It has the shape of exponential but has far fewer values close to 0. It is definitely used for equipment repair times in logistics modeling. Gamma might have parameter values that result in this too. Of course, tools don't support this distribution either. In our work with LoadRunner, we build data tables with the desired distribution values and let the uniform distribution provide the index. It's extra work. 8-(
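
The table-lookup workaround described in this comment might look roughly like the Python sketch below. It is an assumption about how such a table could be built, not the commenter's actual LoadRunner setup: the table holds evenly spaced quantiles of a lognormal distribution with mean 10 seconds, and a uniform random index picks one entry per think delay. The names `lognormal_table` and `table_think_time`, the sigma of 0.5, and the table size are all illustrative choices.

```python
import math
import random
from statistics import NormalDist

def lognormal_table(mean=10.0, sigma=0.5, size=1000):
    """Precompute `size` lognormal quantiles at evenly spaced probabilities.
    A lognormal has mean exp(mu + sigma**2 / 2), so mu is solved from the mean."""
    mu = math.log(mean) - sigma ** 2 / 2.0
    std_normal = NormalDist()  # standard normal: mean 0, std dev 1
    table = []
    for i in range(size):
        p = (i + 0.5) / size   # midpoint of the i-th probability slice
        table.append(math.exp(mu + sigma * std_normal.inv_cdf(p)))
    return table

def table_think_time(table, rng=random):
    """Let a uniform random index select the delay, as a tool limited
    to uniform variates could do."""
    return table[rng.randrange(len(table))]

table = lognormal_table()
rng = random.Random(2010)
samples = [table_think_time(table, rng) for _ in range(20000)]

print("table mean  = %.2f s" % (sum(table) / len(table)))
print("table min   = %.2f s" % min(table))
print("sample mean = %.2f s" % (sum(samples) / len(samples)))
```

Because the table is generated once offline, the script itself only needs the uniform generator the tool already provides.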

Mohan Radhakrishnan said...

Is it a good idea to generate values using R and use them as delays in a Java client program?

Is this an effective alternative to coding random number generation based on statistical formulas directly in Java?