**response time**(RT) percentiles (or more accurately, residence time in queueing theory parlance), viz., 80th percentile: $R_{80}$, 90th percentile: $R_{90}$ and 95th percentile: $R_{95}$ to a cellphone application and found that the performance measurements were not completely consistent. Since the relevant data only appeared in a journal blog, I didn't have enough information to resolve the discrepancy; which is ok. The first job of the performance analyst is to flag performance anomalies but most probably let others resolve them—after all, I didn't build the system or collect the measurements.

More importantly, that analysis was for a **single server** application (viz., time-to-first-fix latency). At the end of my post, I hinted at adding percentiles to PDQ for **multi-server applications**. Here, I present the corresponding rules-of-thumb for the more ubiquitous multi-server or multi-core case.

### Single-server Percentiles

First, let's summarize the Guerrilla rules-of-thumb for single-server percentiles (M/M/1 in queueing parlance): \begin{align} R_{1,80} &\simeq \dfrac{5}{3} \, R_{1} \label{eqn:mm1r80}\\ R_{1,90} &\simeq \dfrac{7}{3} \, R_{1}\\ R_{1,95} &\simeq \dfrac{9}{3} \, R_{1} \label{eqn:mm1r95} \end{align} where $R_{1}$ is the statistical mean of the measured or calculated RT and $\simeq$ denotes*approximately equal*. A useful mnemonic device is to notice the numerical pattern for the fractions. All denominators are 3 and the numerators are successive

**odd**numbers starting with 5.

### Multi-server Percentiles

A multi-server analysis (M/M/m in queueing parlance) can be applied to such things as database servers and application servers, like Weblogic or Websphere. The corresponding percentile approximations are a bit more complicated in that they involve**two terms**and require knowing not just $R_{m}$, the mean RT, but also the

**standard deviation**, $R_{sd}$, about the mean. \begin{align} R_{m,80} &\simeq R_{m} + \dfrac{2}{3} R_{sd} \label{eqn:mm8r80}\\ R_{m,90} &\simeq R_{m} + \dfrac{4}{3} R_{sd}\\ R_{m,95} &\simeq R_{m} + \dfrac{6}{3} R_{sd} \label{eqn:mm8r95} \end{align} The fractions now appear in the standard deviation term, rather than the mean. Their numerical pattern is also slightly different from the single-server case in that the numerators are successive

**even**numbers starting with 2. This difference arises from the fact that equations $\eqref{eqn:mm1r80}$—$\eqref{eqn:mm1r95}$ can be rewritten as: \begin{align} R_{1,80} &\simeq \dfrac{3}{3} \, R_{1} + \dfrac{2}{3} R_{sd} \nonumber \\ R_{1,90} &\simeq \dfrac{3}{3} \, R_{1} + \dfrac{4}{3} R_{sd} \nonumber \\ R_{1,95} &\simeq \dfrac{3}{3} \, R_{1} + \dfrac{6}{3} R_{sd} \nonumber \end{align}

In the case of an M/M/1 queue, however, the standard deviation of the RT is identical to the mean of the RT (i.e., $R_{sd} = R_{1}$), so these equations reduce to $\eqref{eqn:mm1r80}$—$\eqref{eqn:mm1r95}$. For subtle technical reasons, this simplification is not possible for an M/M/m queue.

### Example: 8-Way Server

To confirm the validity of equations $\eqref{eqn:mm8r80}$—$\eqref{eqn:mm8r95}$, I built a simulation of an 8-way server, i.e., M/M/8, that assumes exponential arrival and service periods. The throughput was set to 40 requests per second with a service rate of 6 requests per second. That makes each server 83.33% busy. The simulation was run for a million time-steps with the RT logged for each request. This produced more than 650,000 raw data samples which were then read into the R environment using the following code:

```
# Fast read of near 1-million log file
require(data.table)
system.time(dt <- fread("/Users/.../logfile.data"))
df <- as.data.frame(dt)
```

The initial data frame has the following format:

```
> head(df,10)
JOBSTATE TIMESTAMP JOB_ID
1 Arrived 0.09270234 0
2 Arrived 0.10338975 1
3 Arrived 0.16829303 2
4 Arrived 0.17764726 3
5 Arrived 0.18633516 4
6 Departed 0.19268013 1
7 Arrived 0.20406345 5
8 Departed 0.24620865 3
9 Departed 0.24739371 2
10 Arrived 0.27077745 6
```

From these raw data, we need to determine the time spent in the multi-server by taking the difference between the arrival time and the departure time. In other words, we need to find each pair of times belonging to the same request id. The simplest logic in R goes something like this:

```
deltaR <- 0 # residence time per request
for(j in 0:maxpair) {
rowz <- which(df$JOB_ID == j)
deltaR[j+1] <- df$TIMESTAMP[rowz[2]] - df$TIMESTAMP[rowz[1]]
}
```

One drawback of this approach is you need to determine the maximum request ID that has a corresponding pair (i.e., `maxpair`). Even worse, because of the way `which()` has to scan the data frame, this code takes more than 15 minutes on a quad-core Macbook Air with 8 GB of RAM. However, I was able to gain a 5x speedup by first sorting the data using the request ID to produce adjacent pairs

```
# sort job IDs into pairs
df <- df[order(df$JOB_ID),]
# re-index sorted rows
dfs <- data.frame(df$JOBSTATE,df$TIMESTAMP,df$JOB_ID)
```

so that the sorted data frame has the following format:

```
> head(dfs,10)
df.JOBSTATE df.TIMESTAMP df.JOB_ID
1 Arrived 0.09270234 0
2 Departed 0.33074388 0
3 Arrived 0.10338975 1
4 Departed 0.19268013 1
5 Arrived 0.16829303 2
6 Departed 0.24739371 2
7 Arrived 0.17764726 3
8 Departed 0.24620865 3
9 Arrived 0.18633516 4
10 Departed 0.30368344 4
```

Now, it is simply a matter of alternating over these adjacent pairs:

```
deltaR <- 0 # residence time per request
i <- 1 # sample counter
time1 <- TRUE
for(row in 1:dim(df)[1]) {
if(time1) {
t1 <- df$TIMESTAMP[row]
id1 <- df$JOB_ID[row]
time1 <- FALSE
} else {
id2 <- df$JOB_ID[row]
if(id2 != id1) next # near EOF so skip it
t2 <- df$TIMESTAMP[row]
deltaR[i] <- t2 - t1
i <- i + 1
time1 <- TRUE
}
}
```

Another benefit of this approach is that I didn't need to examine the log file to see which requests started but didn't complete, i.e., didn't form a pair.

Since we have computed the difference of two-thirds of a million pairs, there are one-third of a million instantaneous RT samples shown in Figure 1. We can then determine the sample mean and standard deviation of these individual response times.

```
(Rmu <- mean(deltaR))
[1] 0.2341104
(Rsd <- sd(deltaR))
[1] 0.1999741
```

These values can now be substituted directly into equations $\eqref{eqn:mm8r80}$—$\eqref{eqn:mm8r95}$ to determine the multi-server percentiles.

```
(R80 <- Rmu + 2*Rsd/3)
[1] 0.3674265
(R90 <- Rmu + 4*Rsd/3)
[1] 0.5007425
(R95 <- Rmu + 6*Rsd/3)
[1] 0.6340586
```

We can check these values against the `quantile` function in R:

```
quantile(deltaR,c(0.80,0.90,0.95))
80% 90% 95%
0.3707738 0.5002468 0.6268308
```

The following table summarizes and compares the response-time percentiles evaluated in three different ways to four significant digits of precision.

Simulation | Analytic | ||
---|---|---|---|

Percentile | Empirical | Equations | Equations |

R_{80} | 0.3708 | 0.3674 | 0.3666 |

R_{90} | 0.5002 | 0.5007 | 0.4999 |

R_{95} | 0.6268 | 0.6341 | 0.6332 |

- The
*Empirical*column shows the results of calculating the percentiles manually or using a function like`quantile()`applied to the raw response-time data. - The
*Equations*column, in the*Simulation*section, shows the results of first calculating the mean and standard deviation for the raw data and then applying equations $\eqref{eqn:mm8r80}$—$\eqref{eqn:mm8r95}$. - The
*Equations*column, in the*Analytic*section, shows the results of first calculating the mean and standard deviation analytically for an M/M/8 queuing model and then applying equations $\eqref{eqn:mm8r80}$—$\eqref{eqn:mm8r95}$.

Although there is no closed-form analytic expression for multi-server response-time percentiles (as there is for a single-server), the Guerrilla approximations $\eqref{eqn:mm8r80}$—$\eqref{eqn:mm8r95}$ are just as utilitarian as $\eqref{eqn:mm1r80}$—$\eqref{eqn:mm1r95}$.

### Percentiles in PDQ

Although I haven't implemented these percentiles in PDQ yet, it's now clear to me that rather than trying to include them in the generic PDQ Report (which could really explode its size for a model with many PDQ nodes), it would better to have a separate function that can be called ad hoc. I'll be discussing these results in more detail in the 2014 Guerrilla classes. In the meantime ...
*Merry Xmas!
*

## 3 comments:

The R function

reshape(part of the base packagestats) can be used for a much faster conversion from long format (each observation as one row) into the wide format (associated observations in the same row):reshape(data=df, v.names="TIMESTAMP", timevar="JOBSTATE", idvar="JOB_ID", direction="wide")

Nice suggestion. The reshape-d data has a comparable running time to my loop over ordered pairs and also simplifies the loop logic.

Thanks for the clarification. It's very useful, along with stretch factor, for a quick estimation of what's going on.

Post a Comment