*statistical mean* or average. Mean **input** values include such queueing metrics as service times and arrival rates. These could be sample means. Mean **output** values include such queueing metrics as waiting time and queue length. These are computed means based on a known distribution; I'll say more shortly about exactly which distribution. Sometimes you might also want to report measures of **dispersion** about those mean values, e.g., the 90th or 95th percentiles.

### Percentile Rules of Thumb

In *The Practical Performance Analyst* (1998, 2000) and *Analyzing Computer System Performance with Perl::PDQ* (2011), I offer the following Guerrilla rules of thumb for percentiles, based on a mean residence time R:

- 80th percentile: p80 ≃ 5R/3
- 90th percentile: p90 ≃ 7R/3
- 95th percentile: p95 ≃ 9R/3

I could also add the 50th percentile or median: p50 ≃ 2R/3, which I hadn't thought of until I was putting this blog post together.
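As a quick sanity check (my own aside, not in the original text), the rules of thumb can be compared against the exact exponential quantiles in *R*:

```r
# Compare the Guerrilla rules of thumb with the exact exponential
# quantiles -R*log(1 - q), taking the mean R = 1 for convenience
q     <- c(0.50, 0.80, 0.90, 0.95)
exact <- qexp(q, rate = 1)              # 0.693 1.609 2.303 2.996
thumb <- c(2/3, 5/3, 7/3, 9/3)          # 0.667 1.667 2.333 3.000
round(100 * abs(thumb - exact) / exact, 1)
# [1] 3.8 3.6 1.3 0.1
```

The worst-case error is under 4%, at the median, which is plenty good enough for Guerrilla-style estimates.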

### Example: Cellphone TTFF

As an example of how the above rules of thumb might be applied, an article in *GPS World* discusses how to calculate the *time-to-first-fix*, or TTFF, for cellphones.

> It can be shown that the distribution of the acquisition time of a satellite, at a given starting time, can be approximated by an exponential distribution. This distribution explains the non-linearity of the relationship between the TTFF and the probability of fix. In our example, the 50-percent probability of fix was about 1.2 seconds. Moving the requirement to 90 percent made it about 2 seconds, and 95 percent about 2.5 seconds.

In other words:

- 50th percentile: p50 = 1.2 seconds
- 90th percentile: p90 = 2.0 seconds
- 95th percentile: p95 = 2.5 seconds

I can assess these values *Guerrilla-style* by applying the above rules of thumb using the *R* language:

```r
pTTFF <- function(R) {
  return(c(2*R/3, 5*R/3, 7*R/3, 9*R/3))
}

# Set R = 1 to check the rules of thumb:
> pTTFF(1)
[1] 0.6666667 1.6666667 2.3333333 3.0000000

# Now choose R = 0.8333 (maybe from 1/1.2 ???) for the cellphone case:
> pTTFF(0.8333)
[1] 0.5555333 1.3888333 1.9443667 2.4999000
```

Something is out of whack! The p90 and p95 values agree, well enough, but p50 does not. It could be a misprint in the article, my choice for the R parameter might be wrong, etc. Whatever the source of the discrepancy, it has to be **explained** and ultimately resolved. That's why being able to go Guerrilla is **important**. Even having wrong expectations is better than having no expectations.
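One way to quantify the mismatch (a cross-check of my own, not from the original article) is to invert each rule of thumb and see what mean R the reported TTFF percentiles imply:

```r
# Invert the rules of thumb p50 = 2R/3, p90 = 7R/3, p95 = 9R/3
# to back out the mean R implied by each reported TTFF percentile
c(p50 = 1.2 * 3/2, p90 = 2.0 * 3/7, p95 = 2.5 * 3/9)
#       p50       p90       p95
# 1.8000000 0.8571429 0.8333333
```

The p90 and p95 values both point at a mean around R ≈ 0.84 seconds, while the quoted 1.2-second median implies R = 1.8 seconds: the p50 figure is clearly the odd one out.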

### Quantiles in R

The Guerrilla rules of thumb follow from the assumption that the underlying statistics are exponentially distributed. The exponential PDF and corresponding exponential CDF are shown in Fig. 1, where the mean value, R = 1 (red line), is chosen for convenience.

Figure 1. PDF and CDF of the exponential distribution

The CDF gives the probabilities and therefore is bounded between 0 and 1 on the y-axis. The corresponding percentiles can be read off directly from the appropriate horizontal dashed line and its corresponding vertical arrow. The exact values can be determined using the `qexp` function in the *R* language.

```r
> qexp(c(0.50, 0.80, 0.90, 0.95))
[1] 0.6931472 1.6094379 2.3025851 2.9957323
```

which can be compared with the locations on the x-axis in Fig. 1 where the arrowheads are pointing.
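For the record (a small aside), `qexp` with unit rate is just the closed-form inverse of the exponential CDF:

```r
# The exponential CDF is q = 1 - exp(-x/R); solving for x with R = 1
# gives x = -log(1 - q), which is exactly what qexp computes
q <- c(0.50, 0.80, 0.90, 0.95)
all.equal(qexp(q, rate = 1), -log(1 - q))  # TRUE
```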

### Example: PDQ with Exact Percentiles

The rules of thumb and the exponential assumption are certainly valid for M/M/1 queues in any PDQ model. However, rather than clutter up the standard PDQ Report with all these percentiles, it is preferable to select the PDQ output metrics of interest and add their corresponding percentiles in a custom format. For example:

```r
library(pdq)

arrivalRate <- 8.8
serviceTime <- 1/10

Init("M/M/1 queue")                        # initialize PDQ
CreateOpen("Calls", arrivalRate)           # open network
CreateNode("Switch", CEN, FCFS)            # single server in FIFO order
SetDemand("Switch", "Calls", serviceTime)
Solve(CANON)                               # solve the model
#Report()

pdqR <- GetResidenceTime("Switch", "Calls", TRANS)
cat(sprintf("Mean R: %2.4f seconds\n", pdqR))
cat(sprintf("p50 R: %2.4f seconds\n", qexp(p=0.50, rate=1/pdqR)))
cat(sprintf("p80 R: %2.4f seconds\n", qexp(p=0.80, rate=1/pdqR)))
cat(sprintf("p90 R: %2.4f seconds\n", qexp(p=0.90, rate=1/pdqR)))
cat(sprintf("p95 R: %2.4f seconds\n", qexp(p=0.95, rate=1/pdqR)))
```

which computes the following PDQ outputs:

```
Mean R: 0.8333 seconds
p50 R: 0.5776 seconds
p80 R: 1.3412 seconds
p90 R: 1.9188 seconds
p95 R: 2.4964 seconds
```

A similar approach can be extended to multi-server queues created with the PDQ `CreateMultiNode` function, but `qexp` has to be replaced by:
\begin{equation}
p_{m}(q) = \dfrac{S}{m(1-\rho)} \log \bigg[ \dfrac{C(m,m\rho)}{1-q} \bigg] \label{eqn:waitpc}
\end{equation}
for the waiting time percentile, where $C$ is the Erlang C-function, $\rho$ is the per-server utilization and $q$ is the desired decimal quantile. If enough interest is expressed, I can add such a function to a future release of PDQ. I'll say more in the upcoming Guerrilla data analysis class.
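To make the equation concrete, here is a minimal *R* sketch of how such a function might look. It is not part of PDQ; `erlangC` and `waitPercentile` are hypothetical names of my own.

```r
# Erlang C function: probability that an arrival must queue in M/M/m,
# with offered load a = m * rho Erlangs
erlangC <- function(m, a) {
  top  <- a^m / factorial(m)
  tail <- sum(a^(0:(m - 1)) / factorial(0:(m - 1)))
  top / ((1 - a/m) * tail + top)
}

# Waiting-time percentile p_m(q) from the equation above, where S is the
# service time, rho the per-server utilization, and q the decimal quantile.
# Only meaningful for q > 1 - C(m, m*rho); smaller quantiles fall in the
# probability mass at zero wait.
waitPercentile <- function(q, m, rho, S) {
  (S / (m * (1 - rho))) * log(erlangC(m, m * rho) / (1 - q))
}

# With m = 1, C(1, rho) = rho and this reduces to the familiar M/M/1
# waiting-time percentile, e.g. using the parameters of the model above:
waitPercentile(0.95, m = 1, rho = 0.88, S = 0.1)  # about 2.39 seconds
```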

**Update:** See the newer post, Response Time Percentiles for Multi-server Applications, for more detailed developments.

## 2 comments:

Neil, I’m new to your blog, having come here from a search for PM and CM reporting for telecom. I have some questions that are not on topic to this post; is there another route I could take, or should I just ask them here? The questions have to deal with using a “busy hour” approach to network reporting; I’m of the opinion that BH is the correct approach but have others in my company that are pushing something else.

BTW: I’m real interested in your insights about using harmonic mean for aggregating “rates”. It was completely new to me, and after some thought it makes a lot of sense.

Thanks…

Best option for general performance and capacity questions is to join the GCaP google group. You don't need to be an alumnus.

Glad you're getting something out of the harmonic mean post. Not done yet. One more round to go. Watch that space.
