## Friday, June 6, 2014

### The Visual Connection Between Capacity And Performance

Whether or not computer system performance and capacity are related is a question that comes up from time to time, especially from those with little experience in either discipline. Most recently, it appeared in a LinkedIn discussion group:
"...the topic was raised about the notion that we are Capacity Management not Performance Management. It made me think about whether performance is indeed a facet of Capacity, or if it belongs completely separate."

As a matter of course, I address this question in my Guerrilla training classes. There, I like to appeal to a simple example, a multiserver queue, to exhibit how the performance characteristics are intimately related to system capacity. Not only are they related but, as the multiserver queue illustrates, the relationship is nonlinear. In daily operations you may choose to focus on one aspect more than the other, but the two remain related.

The multiserver queue describes everyday situations such as purchasing tickets at a movie theater, visiting your bank, waiting for a customer support center and checking your baggage at an airline counter. In the computer context, the multiserver queue could describe a multicore processor or a load-balanced cluster. The feature common to all these queueing systems is that they have a single waiting line (or queue) and multiple agents (or servers) to service that queue. The operational characteristics are determined by:

1. capacity: the number of agents ($m$)
2. load: the utilization of the agents ($\rho$)
3. performance: the time you spend in the system ($R$)

The nonlinear relationship between these metrics is best viewed as a 3D plot.

The nearest axis ($0 \leq \rho \leq 1$) is the load on the servers due to the arrival rate ($\lambda$) of customers. The connection follows from Little's law, $\rho = \lambda S / m$, where $S$ is the mean service time per customer; for a single agent ($m = 1$) this reduces to $\rho = \lambda S$. The vertical axis is the normalized residence time ($R/S$). The axis to the upper-right is the number of agents ($1 \leq m \leq 6$ in this case) servicing the waiting line. The nonlinear relationship between these three metrics is represented by the 3D surface.
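As a small illustration of how the load axis relates to the arrival rate, the following sketch computes the per-server utilization, $\lambda S / m$, which equals $\lambda S$ when $m = 1$. The function name and the numbers are hypothetical, chosen only for the example.

```python
def utilization(arrival_rate, service_time, servers):
    """Per-server utilization: rho = lambda * S / m."""
    if servers < 1:
        raise ValueError("need at least one server")
    return arrival_rate * service_time / servers


# Hypothetical workload: 4 customers per minute, 1 minute of service each.
arrival_rate = 4.0   # lambda, customers per minute (assumed value)
service_time = 1.0   # S, minutes per customer (assumed value)

for m in (5, 6, 8):
    rho = utilization(arrival_rate, service_time, m)
    print(f"m = {m}  rho = {rho:.2f}")
```

For the same arrival rate, adding agents lowers $\rho$, which moves the operating point toward the low-load end of the nearest axis.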

The front side of the surface shows that when there is only a single agent available the waiting line grows in a nonlinear way. In particular:

1. at 50% busy ($\rho = 0.50$) the residence time is two service periods ($R=2\,S$)
2. at 75% busy ($\rho = 0.75$) the residence time is four service periods ($R=4\,S$)
3. at 90% busy ($\rho = 0.90$) the residence time is ten service periods ($R=10\,S$)

Although the growth is clearly nonlinear, it is not exponential, as is commonly and incorrectly asserted.
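The three values listed above are consistent with the standard single-server result $R/S = 1/(1-\rho)$, assuming Poisson arrivals and exponentially distributed service times (an M/M/1 queue, which is my assumption here). That curve is a hyperbola: it diverges at the finite load $\rho = 1$, whereas an exponential stays finite at any finite load. A minimal sketch, with a hypothetical function name:

```python
def mm1_residence_ratio(rho):
    """Normalized residence time R/S for an M/M/1 queue (assumed model)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return 1.0 / (1.0 - rho)


for rho in (0.50, 0.75, 0.90):
    print(f"rho = {rho:.2f}  R/S = {mm1_residence_ratio(rho):.1f}")
```

Running this reproduces the residence times of two, four and ten service periods.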

Less obvious is the fact that, when there are more agents, the residence time remains close to the service time (i.e., $R/S = 1$), even under heavy loads. With more servers, a greater customer load is required for the waiting line to begin to grow. That's why the surface appears "flatter" at the back of the plot than at the front. Precisely how the surface flattens out is a matter of considerable mathematical subtlety. Elsewhere, I explained more about that in terms of a numbers game.
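One way to see the flattening numerically is to evaluate $R/S$ for an M/M/$m$ queue using the Erlang C formula. This is a sketch under the assumption of Poisson arrivals and exponential service times; it is not necessarily how the surface in the plot was generated, and the function names are hypothetical.

```python
def erlang_c(m, rho):
    """Probability that an arrival must wait in an M/M/m queue."""
    a = m * rho  # offered load in Erlangs
    # Erlang B computed with the standard numerically stable recursion.
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    # Convert Erlang B to Erlang C.
    return b / (1.0 - rho * (1.0 - b))


def mmm_residence_ratio(m, rho):
    """Normalized residence time R/S for an M/M/m queue (assumed model)."""
    if m < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need m >= 1 and 0 <= rho < 1")
    return 1.0 + erlang_c(m, rho) / (m * (1.0 - rho))


# Same per-server load, increasing capacity.
for m in range(1, 7):
    print(f"m = {m}  R/S at 90% busy = {mmm_residence_ratio(m, 0.90):.2f}")
```

At the same 90% utilization, the ratio drops steadily toward 1 as $m$ grows, which is the flattening seen toward the back of the surface.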

A similar argument can be applied to a grocery store where the capacity is expressed by the number of parallel checkout lanes. There, the customer load is split on entry at the tail of each parallel queue rather than at the head of the queue, as it is in the multiserver case. The queueing characteristics are subtly different.
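To make the difference concrete, consider just two agents. If arrivals are split evenly at random between two separate lanes and nobody switches lanes (both are simplifying assumptions of mine), each lane behaves like an independent M/M/1 queue with $R/S = 1/(1-\rho)$. A single shared line feeding two agents is an M/M/2 queue, for which $R/S = 1/(1-\rho^2)$. The function names below are hypothetical.

```python
def parallel_lanes_ratio(rho):
    """R/S when each lane is its own M/M/1 queue (random split, no lane switching)."""
    return 1.0 / (1.0 - rho)


def shared_line_two_agents_ratio(rho):
    """R/S for one shared waiting line served by two agents (M/M/2)."""
    return 1.0 / (1.0 - rho ** 2)


for rho in (0.50, 0.75, 0.90):
    lanes = parallel_lanes_ratio(rho)
    shared = shared_line_two_agents_ratio(rho)
    print(f"rho = {rho:.2f}  separate lanes R/S = {lanes:.2f}  shared line R/S = {shared:.2f}")
```

Under these assumptions the shared line always gives the shorter residence time at the same utilization, because an idle agent can never sit next to a waiting customer.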