*Discussion* on the LinkedIn CPPE group, which is accessible to Members Only. It's written in the style of a journal reviewer. The original Discussion topic was based on a link to a blog-post. I've been asked to make my LinkedIn review more widely available so, here tiz...

The blog-post Capacity Planning on a Cocktail Napkin is a really good example of a really bad explanation. There are so many things that are misleading, at best, and flat-out wrong, at worst, that it's hard to know where to begin (or where to stop). Nevertheless, I'll try to keep it brief [Cocktail-napkin discussions are only as useful as their accuracy. Otherwise, I can think of better uses for napkins. I failed in that endeavor. — njg]. The author applies the equation:

\begin{equation} E = λ h \end{equation}

Why? What is that equation? We don't know because the author does *not* say yet what *all* the symbols mean. It's all very well to impress people by slinging equations around, but it's more important to say what the equations actually mean. After all, the author might have chosen the wrong one.

Eventually, we learn that E stands for Erlangs, which is actually a teletraffic measure of capacity. The author possibly got it from here (or similar) which states, quite clearly, what the symbols mean, viz., "average call-holding time, h" and later, "...average holding time/service time, h". It is *very* important to understand that 'h' in eqn. (1) is a *service* time (I'll call it 'S' hereafter) that does *not* include buffering or wait time (W). The connection between the S and W metrics is the *residence* time (R) where:

\begin{equation} R = S + W \end{equation}

Why is this important? Because the author is using only S [or h in eqn. (1)], which is typically a shorter time than R in (2), and that means the author is possibly *underestimating* capacity.
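To see how large that underestimate can be, here is a quick numeric sketch. The numbers are illustrative assumptions of mine, not taken from the "Napkin" post:

```python
lam = 1.0   # arrival rate, requests per second (illustrative)
S = 0.5     # service time in seconds: the 'h' of eqn. (1)
W = 1.5     # buffer/wait time in seconds, invisible to eqn. (1)
R = S + W   # residence time, eqn. (2)

E = lam * S  # average number in service (Erlangs)
N = lam * R  # average number in residence (Little's law)
print(E, N)  # 0.5 2.0 -- sizing from E alone misses a factor of 4 here
```

With any nonzero wait time W, the Erlang figure E is strictly smaller than N, so capacity sized from E alone is optimistic.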

Little's Law

Equation (1) is actually a version of Little's law, which I can rewrite slightly as:

\begin{equation} ρ = λ S \end{equation}

In this definition, E corresponds to the utilization (ρ) of the server or switch. The more general case of Little's law, however, includes buffering time (W) and is therefore expressed in terms of R as:

\begin{equation} N = λ R \end{equation}

where N is the level of concurrency or requests in the entire system, not just the occupancy at the telephone switch or service process.

Little's law in eqn.(4) applied to this load average data tells us that the average number resident in the system during the 20 minute window (indicated by the width of the black box) is N = 3.845 processes. This result can be seen by adding up all the blocks belonging to the 20 brown columns (1-minute load average data) between 21:30 and 21:50. That total brown area is 3.845 and the equivalent rectangular area (Little's law) is obtained by making the black box 0.1923 units high—the average load during that period. Clearly, some columns are above the top of the black box and some are below. In this picture, the column heights correspond to the instantaneous number of processes in the system during each minute. We can also think of those heights as representing the instantaneous arrival rate during each minute. The average arrival rate is λ = 0.1923 processes per minute and R = 20 minutes. Hence, N = λ R, which yields N = 3.845 processes.
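The arithmetic in this load-average example can be checked directly (the numbers are the ones quoted above):

```python
total_area = 3.845         # sum of the 20 one-minute column heights (brown area)
window = 20.0              # width of the black box, in minutes

lam = total_area / window  # average rate: 0.19225, quoted as 0.1923 per minute
R = window                 # residence time for the 20-minute window

N = lam * R                # Little's law: N = lambda * R
print(round(N, 3))  # 3.845
```

The black-box construction is just Little's law in pictures: the rectangle's height is λ, its width is R, and its area is N.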

To summarize, so far: eqns. (1) or (3) tell us the average number of requests in *service*, whereas eqn. (4) tells us the average number of requests in *residence*. The latter includes those not yet being serviced but waiting for service. Clearly, therefore: N ≥ E.

Exponentials et al.

The other really confusing part of the "Napkin" post is the way the author's explanation jumps between Pareto, Poisson, Exponential and Normal dsns. (Pity the poor student.)

- If the arrivals are *Pareto distributed* (i.e., highly correlated), as the author assumes so as to get his 0.93 users/sec number, then the *Poisson/Exponential* assumption (i.e., statistical independence) cannot apply. But the author inexplicably decides to use the Exponential dsn anyway.
- It's a property of the (continuous) *Exponential dsn* for interarrival *times* that the mean == standard deviation == √var^{*} (cf. the discrete Poisson dsn for *event* occurrences, where the mean == variance). These properties do *not* apply to the Normal dsn.
- The author claims that the *Normal dsn* approximates the *Exponential dsn*, to wit: "an arrivals rate of about 0.93 per second—then we can approximate with a normal distribution, where the mean is 0.93 per second, and the variance is also 0.93 seconds." That's news to me!

Referring to the above plot, the Normal or Gaussian dsn is an exponential function with an exponent that is *quadratic* in the random variable (X), i.e., has the form exp(−X^{2}). The Exponential dsn, on the other hand, has an exponent that is *linear* in the positive random variable (X ≥ 0), i.e., has the form exp(−X). Because of the X^{2} in the exponent of the Normal dsn, it is *symmetric* about its mean (μ = 1 in the above plot). Conversely, the Exponential dsn is highly *asymmetric* about its mean (λ = 1 in the above plot). It is therefore groundless and confusing to suggest that the Exponential dsn can be approximated by a Normal dsn without a significant loss of accuracy^{†}.

Apparently unaware of such details, the author proceeds to use a Normal approximation with "Mean is λ, variance σ^{2} = λ". However, for an Exponential dsn with rate λ (i.e., inverse time), the mean is 1/λ and the variance is (1/λ)^{2}. That's because 1/λ is the interarrival *time*: the average time *between* the arrival of requests.

This is a very important distinction because bursts of arrivals will be statistically much greater with Exp-distributed traffic (due to the above-mentioned *asymmetry*), and that could be very significant for any capacity planning assumptions^{‡ §}. To see this, think *percentiles*. The 95th percentile is approximately 2σ. Whether you use his (incorrect) identification σ^{2} = λ or σ^{2} = (1/λ)^{2}—it's hard to know what to use here because everything is going wrong by this stage of his exposition—you get values near 2. If instead you apply the Exponential dsn, the 95th percentile is approximately 3 times the mean, or a value around 3.25 (in the above plot); a very big difference. Therefore, applying the author's (incorrect) Normal approximation will generally incur a large underestimation error.
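The percentile comparison is easy to check numerically. A minimal sketch of my own, taking λ = 1 as in the plot:

```python
import math

lam = 1.0  # rate parameter, as in the plot (mean interarrival time = 1/lam)

# Exponential 95th percentile from the inverse CDF: -ln(1 - p)/lam
p95_exp = -math.log(1 - 0.95) / lam
print(round(p95_exp, 2))  # 3.0, i.e., about 3x the mean of 1/lam

# A Normal with mean 1 and sigma 1 puts its one-sided 95th percentile
# near mean + 1.645*sigma -- well short of the Exponential value.
p95_norm = 1.0 + 1.645 * 1.0
print(p95_norm)  # 2.645
```

The gap between the two tails is exactly the underestimation error being described: the Normal approximation clips the bursts that the Exponential tail predicts.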

**Related posts:** Load Testing Think Time Distributions

^{*}Erlang (the engineer) reported in his 1909 paper that his measurements of arriving telephone calls were indeed Poisson distributed. The same is true today for Help Center calls and so on.

^{†}See Perl::PDQ Section 3.5 for more about using the Gamma dsn as an approximation to the Exponential dsn.

^{‡}I find it astounding that FOSS load-test tools like JMeter and, even worse, very expensive tools like LoadRunner do not implement the Exponential dsn as an option. Keep in mind that JM and LR are performance **modeling** tools: they are *workload simulators*. As I point out in my GCaP classes, the Exponential dsn is important for several reasons:

- It offers a higher degree of variation (fluctuations) from the client request-load than either Uniform or Gaussian dsns, thereby promoting the detection of unanticipated failures, like buffer overflow.
- It offers a convenient opportunity to compare with other performance tools, like PDQ http://goo.gl/B4RzY, which also makes use of Exponential arrival patterns.
- It is **the** correct load profile if you are modeling a Help Center, teletraffic system or similar.
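For what it's worth, generating Exp-distributed delays takes only a couple of lines. A sketch in Python, using the 0.93/s rate quoted from the "Napkin" post, which also exhibits the mean == standard deviation property noted above:

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is reproducible
rate = 0.93               # arrivals per second (the post's number)

# Exponentially distributed interarrival delays with mean 1/rate.
delays = [rng.expovariate(rate) for _ in range(100_000)]

mean = sum(delays) / len(delays)
var = sum((d - mean) ** 2 for d in delays) / len(delays)

# Hallmark of the Exponential dsn: mean == standard deviation (== 1/rate).
print(round(mean, 2), round(var ** 0.5, 2))  # both close to 1/0.93 ~ 1.08
```

Sleeping for each sampled delay between requests is all a load generator needs to do to produce Poisson traffic.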

^{§}Bursts of Exponential arrivals should not be confused with Pareto-type correlations (as the "Napkin" author does). In my GCaP classes, I show a **movie** of Exponential arrivals where clusters of arrivals are clearly visible. That suggests the presence of *correlations* in what is otherwise supposed to be *statistically independent* arrivals. This paradox is resolved by watching the movie for a longer time and noting the appearance of gaps (i.e., no arrivals) of the same or similar length as the previously observed clusters. In other words, the **bursts** are balanced out by **gaps** so that, overall, the arrivals are indeed statistically independent.
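That bursts-versus-gaps balance can also be seen numerically. A small sketch of my own construction: build a Poisson arrival stream from Exponential interarrival times and check that the counts per window are neither over- nor under-dispersed.

```python
import random

rng = random.Random(7)
rate = 1.0          # arrivals per unit time
horizon = 100_000   # number of unit-width counting windows

# Poisson arrival stream generated from Exponential interarrival times.
counts = [0] * horizon
t = rng.expovariate(rate)
while t < horizon:
    counts[int(t)] += 1
    t += rng.expovariate(rate)

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)

# Index of dispersion: ~1 for statistically independent (Poisson) arrivals,
# despite the visible clusters and gaps in any short stretch of the stream.
print(round(var / mean, 1))  # ~1.0
```

Correlated (Pareto-type) traffic would push this ratio well above 1; the Exponential stream's bursts and gaps cancel out, just as the movie shows.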

## 8 comments:

Added a new plot to explain the connection between *Little's law* (N, the average number in the system) and Unix/Linux *load-average data* represented by the usual time series one sees coming from modern performance monitoring tools.

Neil, great reminder! Independence between the lambda and the h is a very often overlooked assumption indeed.

Hi Dr Gunther, I agree with your comment on load tools that do not implement the exponential distribution to generate the workload. Do you think it is possible to emulate an exponential distribution of requests by combining several runs of a tool in parallel, using another distribution (e.g. uniform) with different averages and different quantities? I was thinking of something like:

to emulate 10 requests per second

run 1: 10 events with delay 1 sec

run 2: 18 events with delay 2 sec

run 3: 25 events with delay 3 sec

etc.

Thanks

Regards

MatteoP

Hi Matteo, Interesting idea, but I suspect it will be thwarted by a well-known theorem in probability theory, which states, roughly, that adding the resulting collection of metrics (random variables with whatever underlying dsn: uniform in your suggestion) will merely tend to emulate a Normal dsn. And that is arguably *less* effective than a uniform dsn. For more details about how to generate *exponentially distributed* delays, see my latest blog post.

Hi Dr Gunther,

I've seen your new post; it seems it should be easy for the tools to include such a feature if they wanted to. Meanwhile, I tried to see if my original idea was feasible with a simulation. The central limit theorem should apply when the summed random variables have the same average. In this case I tried to do something like this, http://tinyurl.com/79bd88z, rotated by 90 degrees clockwise. I loaded the data (x, y) from an exp dsn (generated with Excel) into a table and ran a script to generate, for each x, a set of random uniform numbers (to fill each sub-rectangle area as per the picture) with a range from 0 to y. I then summed the occurrences for each x, and the graph resembles the original exponential curve.

This is the PL/SQL code I used:

```sql
declare
  type tab is table of number index by binary_integer;
  l_t tab;
  l_n number;
begin
  for a in (select x,
                   round((exp_dsn - nvl(lag(exp_dsn) over (order by x desc), 0)) * x * 1000) ex
              from mat_exp
             order by x desc) loop
    for i in 1 .. a.ex loop
      select abs(mod(dbms_random.random, a.x)) into l_n from dual;
      if l_t.exists(l_n) then
        l_t(l_n) := l_t(l_n) + 1;
      else
        l_t(l_n) := 1;  -- count the first occurrence as 1, not 0
      end if;
    end loop;
  end loop;

  for j in (select x from mat_exp order by x) loop
    if l_t.exists(j.x) then  -- guard against buckets that were never hit
      dbms_output.put_line(to_char(j.x) || '=' || l_t(j.x));
    end if;
  end loop;
end;
```

Sorry for not using R.

How much wrong is this approach? :-)

Thanks

Regards

MatteoP

errata corrige:

...with a range from 0 to y...

should be

...with a range from 0 to x...

Matteo, I'm not about to debug your code here but, if you are starting with a table of Exp values and by "rotation" you actually mean pivoting about the *diagonal* of the xy-axes, then reading off using a uniform dsn is the same as coin flipping and should produce Exp-distd variates. In other words, it sounds like you are implementing the inverse transform, described in my blog post, as a lookup table.
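For readers who want the lookup-table idea spelled out, here is a minimal sketch of that inverse transform (the function name is mine):

```python
import math

def exp_inverse_cdf(u, lam=1.0):
    """Map a uniform variate u in [0, 1) through the inverse
    Exponential CDF: F^{-1}(u) = -ln(1 - u)/lam."""
    return -math.log(1.0 - u) / lam

# Round trip: applying the CDF F(x) = 1 - exp(-lam*x) recovers u,
# which is exactly what reading the table "about the diagonal" does.
for u in (0.1, 0.5, 0.9):
    x = exp_inverse_cdf(u)
    assert abs((1.0 - math.exp(-x)) - u) < 1e-12
print("round trip ok")
```

Feeding uniform variates through this one-liner, or through a discretized table of it, yields Exp-distributed delays directly; no summing of parallel runs required.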

Post a Comment