Wednesday, July 30, 2014

A Little Triplet

Little's law appears in various guises in performance analysis. It was known to Agner Erlang (the father of queueing theory) in 1909 to be intuitively correct but was not proven mathematically until 1961 by John Little. In the subsequent discussion, I'll show you that there is actually a triplet of such laws. Quite apart from the common equational form, each of them has a less than obvious interpretation that is handy to know.

To see the Little's law triplet, consider the line of customers at the grocery store checkout lane shown in Figure 1. Following the usual queueing theory convention, the queue includes not only the customers waiting but also the customer currently in service.

Figure 1. Checkout lane decomposed into its space and time components

As an aside, it is useful to keep in mind that there are only three types of performance metric:

  1. Time $T$ (the fundamental performance metric), e.g., minutes
  2. Count or a number $N$ (no formal dimensions), e.g., transactions
  3. Rate $N/T$ (inverse time dimension), e.g., transactions per minute
From this standpoint, Little's law has the general form \begin{equation} N = \dfrac{N}{T} ~\times ~T \label{eqn:dims} \end{equation} which says a rate (type- C) multiplied by time (type- A) produces a number (type- B). The three types of metric should not be confused with the three forms of Little's law being discussed here. Good things often come in threes.

Shoppers in Figure 1 arrive from the left into the checkout lane with arrival rate $\lambda$ customers per minute, for example. $\lambda$ is a type-C metric. The total time they spend at the checkout is the sum of their waiting time, $W$ (due to the customers already ahead of them), and their service time, $S$ (the average time it takes to ring up their groceries). This time is called their residence time \begin{equation} R = W + S \label{eqn:Rtime} \end{equation} $R$ is a type-A metric. It gives us the temporal view of the queue. After paying the cashier each customer departs to the right.

Example

For the purpose of simple numerical calculations, let's assume the following average values for the queueing parameters: \begin{align} \lambda &= 0.5 ~\text{customer per minute} \label{eqn:arate}\\ S &= 1.0 ~\text{minute} \label{eqn:Smins}\\ W &= 1.0 ~\text{minute} \label{eqn:Wmins} \end{align} From eqn.\eqref{eqn:Rtime} it follows that \begin{equation} R = 1 + 1 = 2 ~\text{minutes} \label{eqn:Rmins} \end{equation} The residence time tells us that the average time a customer can expect to spend getting through the checkout lane is two minutes.

Little's Law Version 1

The corresponding average queue length, $Q$, in Figure 1 tells us how much space customers are expected to occupy and it can be determined from this version of Little's law \begin{equation} Q = \lambda R \label{eqn:LLQ} \end{equation} Clearly, \eqref{eqn:LLQ} has the form \eqref{eqn:dims}: a rate multiplied by a time. $Q$ is a number (an average value), so it's a type-B metric. Equation \eqref{eqn:LLQ} is probably the most common form of Little's law arising in performance calculations.

Example

Using the values in \eqref{eqn:arate} and \eqref{eqn:Rmins} it follows that \begin{equation} Q = \dfrac{1}{2} \times 2 = 1 ~\text{customer} \label{eqn:LLQval} \end{equation} The average queue length is 1 customer (i.e., both waiting and in service). How a single customer can be simultaneously waiting and in service will become clearer shortly. In the meantime, that's count one for Little.

Little's Law Version 2

Figure 1 reveals that there's another way to express the queue length, $Q$. Like the residence time, we can also decompose the queue length into the length of the waiting line, $L$, and the average number of customers in service \begin{equation} Q = L + m \rho \label{eqn:Qdecomp} \end{equation} This gives us the spatial view of the queue, where $m$ is the number of servers and $m\rho$ is the average occupancy of the servers. In Figure 1 $m = 1$, since there is only a single cashier, and $0 \leq \rho < 1$ is the fraction of time for which the cashier is busy. The value $\rho = 1$ would correspond to 100% busy. Clearly, the cashier can't be busier than 100% of the time.

Substituting \eqref{eqn:Qdecomp} and \eqref{eqn:Rtime} respectively into the left side and right side of \eqref{eqn:LLQ} produces \begin{equation} L + m \rho = \lambda W + \lambda S \label{eqn:decompAll} \end{equation} The first terms on each side of \eqref{eqn:decompAll} can be equated separately as \begin{equation} L = \lambda W \label{eqn:LLW} \end{equation} Once again \eqref{eqn:LLW} has the form \eqref{eqn:dims}. Equation \eqref{eqn:LLW} tells us that the waiting line length, $L$, is directly proportional to the waiting time, $W$. You can think of it as an estimate of buffer size, if you prefer. Remarkably, this is the form of Little's law that appears in the title of his 1961 paper.

Example

Using the values in \eqref{eqn:arate} and \eqref{eqn:Wmins} it follows that \begin{equation} L = \dfrac{1}{2} \times 1 = \dfrac{1}{2} ~\text{customer} \label{eqn:LLLval} \end{equation} Half a customer may look strange but remember, these are average values. That's count two for Little.

Little's Law Version 3

Repeating the previous step for the corresponding second terms in \eqref{eqn:decompAll}, we have \begin{equation} \rho = \dfrac{\lambda S}{m} \label{eqn:LLS} \end{equation} Again, \eqref{eqn:LLS} has the form \eqref{eqn:dims}: division by the constant $m$ doesn't alter anything. Equation \eqref{eqn:LLS} tells us that the per-server utilization (a number and therefore a type-B metric) is directly proportional to the service time ($S$). In Figure 1, $m=1$ since there is a single server (the cashier).

This version of Little's law is often referred to as the Utilization law, but that doesn't help to make useful connections between performance metrics. It is common practice to use a notation like $U = m \rho$ to represent the total utilization of $m$-servers such that $0 < U < m$.

Example

Using the values in \eqref{eqn:arate} and \eqref{eqn:Smins} it follows that the total utilization is \begin{equation} U = \dfrac{1}{2} \times 1 = 50\% \end{equation} Notice also that \begin{equation} Q = L + U = \dfrac{1}{2} + \dfrac{1}{2} = 1 ~\text{customer} \end{equation} in agreement with the Little's law calculation in \eqref{eqn:LLQval}. And that's count three for Little.

Triplet of Interpretations

The Little's law triplet can be summarized as:

\begin{align*} Q &= \lambda R: \; \text{average number of customers in residence}\\ L &= \lambda W: \; \text{average number of customers in waiting}\\ U &= \lambda S: \; \text{average number of customers in service} \end{align*} each of which is consistent with \eqref{eqn:dims}. Knowing these word interpretations might be the key to solving an otherwise obscure performance analysis problem.

Sunday, July 20, 2014

Continuous Integration Gets Performance Testing Radar

As companies embrace continuous integration (CI) and fast release cycles, a serious problem has emerged in the development pipeline: Conventional performance testing is the new bottleneck. Every load testing environment is actually a highly complex simulation assumed to be a model of the intended production environment. System performance testing is so complex that the cost of modifying test scripts and hardware has become a liability for meeting CI schedules. One reaction to this situation is to treat performance testing as a checkbox item, but that exposes the new application to unknown performance idiosyncrasies in production.

In this webinar, Neil Gunther (Performance Dynamics Company) and Sai Subramanian (Cognizant Technology Solutions) will present a new type of model that is not a simulation, but instead acts like continuous radar that warns developers of potential performance and scalability issues during the CI process. This radar model corresponds to a virtual testing framework that precludes the need for developing performance test scripts or setting up a separate load testing environment. Far from being a mere idea, radar methodology is based on a strong analytic foundation that will be demonstrated by examining a successful case study.

Broadcast Date and Time: Tuesday, July 22, 2014, at 11 am Pacific

Thursday, July 17, 2014

Restaurant Performance Sunk by Selfies

An interesting story appeared last weekend about a popular NYC restaurant realizing that, although the number of customers they served on a daily basis is about the same today as it was ten years ago, the overall service had slowed significantly. Naturally, this situation has led to poor online reviews to the point where the restaurant actually hired a firm to investigate the problem. The analysis of surveillance tapes led to a surprising conclusion. The unexpected culprit behind the slowdown was neither the kitchen staff nor the waiters, but customers taking photos and otherwise playing around with their smartphones.

Using the data supplied in the story, I wanted to see how the restaurant performance would look when expressed as a PDQ model. First, I created a summary data frame in R, based on the observed times:


> df
           obs.2004 obs.2014
wifi.data         0        5
menu.data         8        8
menu.pix          0       13
order.data        6        6
eat.mins         46       43
eat.pix           0       20
paymt.data        5        5
paymt.pix         0       15
total.mins       65      115

The 2004 situation can be represented schematically by the following queueing network

Referring to Figure 1:

Thursday, July 3, 2014

How to Remember the Poisson Distribution

The Poisson cumulative distribution function (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a probability function, it cannot have a value greater than 1. In R, the CDF is given by the function ppois(). For example, with α = 4 the first 16 values are

> ppois(0:15,4)
 [1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
 [9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511
As the number of events increases from 0 to 15 the CDF approaches 1. See Figure.

Thursday, June 26, 2014

Load Average in FreeBSD

In my Perl PDQ book there is a chapter, entitled Linux Load Average, where I dissect how the load average metric (or metrics, since there are three reported numbers) is computed in shell commands like uptime, viz.,
[njg]~/Desktop% uptime
16:18  up 9 days, 15 mins, 4 users, load averages: 2.11 1.99 1.99

For the book, I used Linux 2.6 source because it was accessible on the web with convenient hyperlinks to navigate the code. Somewhere in the kernel scheduler, the following C code appeared:


#define FSHIFT    11          /* nr of bits of precision */ 
#define FIXED_1   (1<<FSHIFT) /* 1.0 as fixed-point */ 
#define LOAD_FREQ (5*HZ)      /* 5 sec intervals */ 
#define EXP_1     1884        /* 1/exp(5sec/1min) fixed-pt */ 
#define EXP_5     2014        /* 1/exp(5sec/5min) */ 
#define EXP_15    2037        /* 1/exp(5sec/15min) */ 

#define CALC_LOAD(load,exp,n) \
 load *= exp; \
 load += n*(FIXED_1-exp); \
 load >>= FSHIFT;

where the C macro CALC_LOAD computes the following equation

Friday, June 6, 2014

The Visual Connection Between Capacity And Performance

Whether or not computer system performance and capacity are related is a question that comes up from time to time, especially from those with little experience in either discipline. Most recently, it appeared on a Linked-in discussion group:
"...the topic was raised about the notion that we are Capacity Management not Performance Management. It made me think about whether performance is indeed a facet of Capacity, or if it belongs completely separate."

As a matter of course, I address this question in my Guerrilla training classes. There, I like to appeal to a simple example—a multiserver queue—to exhibit how the performance characteristics are intimately related to system capacity. Not only are they related but, as the multiserver queue illustrates, the relationship is nonlinear. In terms of daily operations, however, you may choose to focus on one aspect more than the other, but they are still related nonetheless.

Thursday, June 5, 2014

Importing an Excel Workbook into R

The usual route for importing data from spreadsheet applications like Excel or OpenOffice into R involves first exporting the data in CSV format. A newer and more efficient CRAN package, called XLConnect (c. 2011), facilitates reading an entire Excel workbook and manipulating worksheets and cells programmatically from within R.

XLConnect doesn't require a running installation of Microsoft Excel or any other special drivers to be able to read and write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE). Moreover, XLConnect can handle older .xls (BIFF) as well as the newer .xlsx (Office XML) file formats. Internally, XLConnect uses Apache POI (Poor Obfuscation Implementation) to manipulate Microsoft Office documents.

As a simple demonstration, the following worksheet, from a Guerrilla Capacity Planning workbook, will be displayed in R.

First, the Excel workbook is loaded as an R object: