To make this point a bit clearer, I used the very convenient variate-generation functions in R to compare some of the distributions that I think should be included in such a library for the convenience of workload-test designers and performance engineers. The statistical mean (i.e., the average think delay) is the same in all these plots and is shown as the red vertical line, but pay particular attention to the spread around the mean on the x-axis.
Uniform: The first plot (upper left) shows the default uniform distribution with a mean Z = 10 seconds and a range between 5 and 15 seconds. This is what a standard random number generator produces. Each call in the script will produce an explicit think delay somewhere between 5 and 15 seconds, with every value in that range equally likely. The typical frequency of occurrence for each variate is shown on the y-axis. I'm using seconds here as the nominal time base for think delay.
Exponential: One of the most common alternative delay distributions is the exponential distribution. There are two reasons you might want to use this distribution:
- It increases the likelihood of queueing and therefore detecting buffer overflows in the SUT
- It makes test results easily comparable to a PDQ model, which always assumes an exponential Z distribution
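As a rough illustration of the difference in spread, here is a minimal Python sketch (not the R code behind the plots) that draws uniform and exponential think delays with the same mean Z = 10 seconds. The function names and the sample size are my own choices for illustration; only the standard library `random` module is assumed.

```python
import random
import statistics


def uniform_think_times(low, high, n, seed=None):
    """Return n uniformly distributed think delays (seconds) in [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def exponential_think_times(mean_z, n, seed=None):
    """Return n exponentially distributed think delays (seconds) with mean mean_z."""
    rng = random.Random(seed)
    # expovariate() takes the rate parameter, i.e. lambda = 1 / mean.
    return [rng.expovariate(1.0 / mean_z) for _ in range(n)]


if __name__ == "__main__":
    unif = uniform_think_times(5.0, 15.0, 100_000, seed=1)
    expo = exponential_think_times(10.0, 100_000, seed=1)
    # Same mean, very different spread around it.
    print("uniform     mean, sd:", statistics.mean(unif), statistics.stdev(unif))
    print("exponential mean, sd:", statistics.mean(expo), statistics.stdev(expo))
```

Both samples average about 10 seconds, but the exponential one has a standard deviation close to its mean, while the uniform one stays within 5 seconds of it. That wider spread is what produces the bursts of closely spaced requests that make queueing more likely.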
Gamma: Often there is considerable gnashing of teeth over the exponential distribution not being realistic. Quite apart from the usual academic technicalities, it's a better choice than uniform. That said, a suitable generalization, which introduces more correlations into the arrivals, is the gamma distribution. Whereas the exponential distribution is defined in terms of a single arrival-rate parameter (λ), the gamma distribution is defined by two parameters: the shape (α) and scale (β). Its mean is αβ, so setting α = 1 and β = 1/λ produces the exponential distribution with mean Z = 1/λ.
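To see how the shape parameter changes the spread while the mean stays fixed, here is a minimal Python sketch. It assumes the shape/scale parameterization used by the standard library's `random.gammavariate`, where the mean is shape times scale; the function name and the chosen shape values are illustrative only.

```python
import random
import statistics


def gamma_think_times(mean_z, shape, n, seed=None):
    """Return n gamma-distributed think delays (seconds) with mean mean_z."""
    rng = random.Random(seed)
    # For shape/scale parameters the mean is shape * scale,
    # so the scale is chosen to hold the mean at mean_z.
    scale = mean_z / shape
    return [rng.gammavariate(shape, scale) for _ in range(n)]


if __name__ == "__main__":
    for shape in (0.5, 1.0, 4.0):
        sample = gamma_think_times(10.0, shape, 100_000, seed=1)
        # The standard deviation is mean / sqrt(shape):
        # shape = 1 matches the exponential case.
        print(shape, statistics.mean(sample), statistics.stdev(sample))
```

With a shape of 1 the delays behave like the exponential case with the same mean. Shapes below 1 give a wider spread and shapes above 1 a narrower one.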
Pareto: Finally, the Pareto distribution is suitable for simulating highly correlated arrivals, such as has been discussed ad nauseam in the context of Internet packets. The Pareto distribution emulates heavy-tailed or self-similar traffic. Since Pareto is a hyperbolic-class function, it corresponds to infinite variance effects or almost constant variance over many decades, in practice.
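A heavy-tailed think delay with a prescribed mean could be sketched as follows. This assumes the classical (type I) Pareto form with shape α and minimum value x_m, whose mean is α·x_m/(α − 1) for α > 1 and whose variance is infinite for α ≤ 2. The function name and the shape value of 1.5 are my own illustrative choices, not parameters taken from the plots.

```python
import random
import statistics


def pareto_think_times(mean_z, shape, n, seed=None):
    """Return n Pareto (type I) think delays (seconds) with theoretical mean mean_z."""
    if shape <= 1.0:
        raise ValueError("the Pareto mean is only finite for shape > 1")
    rng = random.Random(seed)
    # mean = shape * x_min / (shape - 1), solved for the minimum delay x_min.
    x_min = mean_z * (shape - 1.0) / shape
    # paretovariate() returns values >= 1, so scale them by x_min.
    return [x_min * rng.paretovariate(shape) for _ in range(n)]


if __name__ == "__main__":
    sample = pareto_think_times(10.0, 1.5, 100_000, seed=1)
    # Most delays are short, but a few are enormous; with shape <= 2 the
    # sample mean and standard deviation do not settle down as n grows.
    print("median:", statistics.median(sample))
    print("mean:  ", statistics.mean(sample))
    print("max:   ", max(sample))
```

With a shape of 1.5, most delays fall well below the nominal 10-second mean and a handful of very long delays make up the difference. That is the heavy-tailed behavior described above.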
I'll have more to say about all this in our upcoming Guerrilla Data Analysis Techniques class in August.
I've used lognormal for think times. It has the shape of exponential but has far fewer values close to 0. It is definitely used for equipment repair times in logistics modeling. Gamma might have parameter values that result in this shape too. Of course, tools don't support this distribution either. In our work with LoadRunner, we build data tables with the desired distribution values and let the uniform distribution provide the index. It's extra work. 8-(
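The table-lookup workaround described in this comment can be sketched in a few lines of Python. This is only an illustration of the idea, not LoadRunner code: the function names, the table size of 1000, and the lognormal sigma of 0.5 are all assumptions. The table holds lognormal quantiles at evenly spaced probabilities, and a uniform random index selects one of them.

```python
import math
import random
from statistics import NormalDist


def build_lognormal_table(mean_z, sigma, size=1000):
    """Precompute a sorted table of lognormal think delays (seconds) with mean mean_z."""
    # A lognormal has mean exp(mu + sigma^2 / 2); solve for mu.
    mu = math.log(mean_z) - sigma ** 2 / 2.0
    std_normal = NormalDist()
    # Evaluate the inverse CDF at the midpoint of each of `size` equal-probability bins.
    return [
        math.exp(mu + sigma * std_normal.inv_cdf((i + 0.5) / size))
        for i in range(size)
    ]


def draw_think_time(table, rng):
    """Pick one delay from the table using a uniformly distributed index."""
    return table[rng.randrange(len(table))]


if __name__ == "__main__":
    table = build_lognormal_table(10.0, 0.5)
    rng = random.Random(1)
    delays = [draw_think_time(table, rng) for _ in range(10)]
    print([round(d, 2) for d in delays])
    # Smallest tabulated delay: unlike the exponential, nothing sits near zero.
    print("min delay in table:", round(table[0], 2))
```

Because every table entry is equally likely to be chosen, the uniform generator built into the tool is enough to reproduce the tabulated distribution. The cost is the extra step of generating and loading the table.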
Is it a good idea to generate values using R and use them as delays in a Java client program?
Is this an effective alternative to coding random number generation based on statistical formulas directly in Java?