Sunday, July 10, 2011

The Multiserver Numbers Game

In a previous post, I explained why \begin{equation} R_m \neq \dfrac{R_1}{m} \label{eqn:badest} \end{equation} and therefore doesn't work as an estimator of the mean residence time in an M/M/m multi-server queue. Although we expect the extra server capacity with m-servers to produce a shorter residence time ($R_m$), it is not m-times smaller than the residence time ($R_1$) for a single-server (i.e., $m = 1$) queue.

M/M/m multiserver queue

The problem is that eqn. \eqref{eqn:badest} grossly underestimates $R_m$, which is precisely the wrong direction for capacity planning scenarios. For that purpose, it's generally better to overestimate performance metrics. That's too bad because it would be a handy Guerrilla-style formula if it did work. You would be able do the calculation in your head and impress everyone on your team (not to mention performing it as a party trick).

Given that eqn. \eqref{eqn:badest} is a poor estimator, you might wonder if there's a better one, and if you'd been working for Thomas Edison he would have told you: "There's a better wsy. Find it!" Easy for him to say. But if you did decide to take up Edison's challenge, how would you even begin to search for such a thing?

One way to proceed in situations like this is to play around with numbers and look for any interesting patterns that might emerge. In that vein, we're going to play a kind of multiserver numbers game and see if it reveals anything to us about queueing times. You may be thinking that playing games is a foolish thing to do when it comes to a serious subject like queueing theory, but believe me, even the mathematical heavies like Euler and Gauss were not above playing numbers games just like the one I'm about to show you. And, as you may have heard, they made a lot of important discoveries that way.

A Numbers Game

We know that R1/m manifestly underestimates the Rm residence time. Another way to say that is: the estimator is too small because the divisor m is too big. Could we design a divisor that is smaller than m? That looks like an idiotic question because m is just a number—the number of servers needed to service the waiting line so as to meet the service level objective, Rm. You can choose any m but it has to be a whole number (an integer). So, what could it possibly mean to "redesign" that number? Well, this is where thinking like a child can have real benefits.

For example, with childlike innocence I could write $m = 2$ as $m = 1 + 1$. Numerically, it's the same thing. But here comes the really cute trick. I know that $m = 1 + 1$ is too big in eqn. \eqref{eqn:badest} so, suppose I forget about what m really means in the real world of CaP, and replace it with something smaller, e.g., $m = 1 + 1/2$. That expression may look weird but it's certainly smaller than $m = 2$. Besides, it's just a game where all "adult" criticism is hereby disabled. To test out this kind of number play, it's important to be systematic. So, let's tabulate our results as we go.

The M/M/2 Game

For simplicity I'll assume R1 = 2 minutes per customer at a grocery checkout with 1 cashier. The exact Erlang formula for m = 2 cashiers predicts a reduced residence time of R2 = 1.33 mins. How does our new estimator stack up against that exact calculation? Here's a tabulation of the results.

Exact $R_2 = 1.33$
Row Series Ratio Estimated $R_2$
1 $1 + 1$ $\dfrac{2}{1 + 1}$ $1.00$
2 $1 + 1/2$ $\dfrac{2}{1 + 1/2}$ $1.33$

The 1st column is the row number of the numerical quantities. The 2nd column (Series) shows the numerical series we are using for m. The 3rd column (Ratio) shows how the series is used in the estimator—both old and new—and the 4th column (Estimated) shows the numerical value of each type of estimator.

The 1st row of values corresponds to eqn. \eqref{eqn:badest} used in the previous post. In that case, we would have arrived at R1/2 = 1 minute, which underestimates the exact value by 0.33 minutes. However, in the 2nd row of values, with that crazy 1/2, we are spot on. Wow!

The M/M/3 Game

Encouraged by this foolishness, let's consider the m = 3 server case. The next question we have to address is, what to chose for the third number in the series for m? If 1/2 worked as the second term for the m-series in the m = 2 case, that suggests 1/3 might be a good choice for the third term of this m-series. See 1st column, 3rd row. In that case, the rule that seems to be emerging is: count the term in the m-series and invert that count. Since the new term is a 3rd term in the series, add 1/3. Choosing 1/3 for the third term certainly moves things in the right direction (see 3rd column, 3rd row) when compared with just using m = 3, but there's still a significant gap. Can we make the gap smaller?

Exact R3 = 1.16
Row Series Ratio Estimated R3
1 $1 + 1 + 1$ $\dfrac{2}{1 + 1 + 1}$ $0.67$
2 $1 + 1/2 + 1/3$ $\dfrac{2}{1 + 1/2 + 1/3}$ $1.09$
3 $1 + 1/2 + 1/4$ $\dfrac{2}{1 + 1/2 + 1/4}$ $1.14$

If, instead of using the inverse of the term, we halve each successive term then we get $(1/2)/2 = 1/4$ for the third term. Certainly, 1/4 is smaller than 1/3. Using 1/4 in the 3rd row of the above table shows a vast improvement, although it's not spot on, like the m = 2 case. But that's fine. Let's not lose our heads in all the excitement. We are only seeking a better estimator, not a wholesale replacement for the very complex Erlang formula. If things were that simple it would have been done 100 years ago (or earlier) and we would never have heard of Agner Erlang.

The M/M/4 Game

Let's check one more time with m = 4 servers.

Exact R4 = 1.09
Row Series Ratio Estimated R4
1 $1 + 1 + 1 + 1$ $\dfrac{2}{1 + 1 + 1 + 1}$ $0.50$
2 $1 + 1/2 + 1/3 + 1/4$ $\dfrac{2}{1 + 1/2 + 1/3 + 1/4}$ $0.96$
3 $1 + 1/2 + 1/4 + 1/8$ $\dfrac{2}{1 + 1/2 + 1/4 + 1/8}$ $1.07$

Once again, we see in the 3rd row that the halving rule seems to work best. In fact, it will work for any arbitrary value of m.

What about other $R_1$ values besides 2 minutes—the numerator in the Ratio column? Unfortunately, the answer there is negatory. The halving rule only works for $R_1 = 2$ minutes. Any other value and the ratio will give incorrect results. Oh oh! What's gone wrong?

The General Game

Nothing has gone wrong, I just chose a special case for the M/M/1 residence time in order to reveal simple number patterns. If we played around with other values of R1, we'd eventually find the general rule. It does help to know what you're doing so, I'll just tell you the result. Or, feel free to keep playing the game and come back later to compare your results with the rest of this post.

In the above tables I used $R_1 = 2$ minutes, and that choice contains two assumptions which I deliberately didn't tell you about. I assumed something about the service time, viz., $S = 1$ minute and an arrival rate of $\lambda = 0.5$ per minute. The server utilization is then given by $\rho = \lambda \, S = 0.5$ from Little's law. The halving rule has emerged because I have been assuming a server utilization of $\rho = 1/2$. At 50% busy, the single-server residence time is \begin{equation*} R_1 = \dfrac{1}{1-\frac{1}{2}} = 2 ~\text{minutes} \end{equation*} in the above tables. That's where the 2 came from. It made the fractional series progressions more intuitive, but it's also a special case and therefore not generalizable.

On the other hand, if I had chosen some arbitrary utilization value, the corresponding fractions in the series would have been more accurate but they would also have obscured the pattern in the progressions. Not to worry because I can already see how to generalize those nice simple number patterns to form a general rule. Before I explain that, however, a few additional remarks are in order.

  • I didn't choose $R_1 = 2$ minutes or $\rho = 1/2$ to trick you. It's actually the sanest thing to do when you're fooling around and not sure of the outcome. Always choose numbers where the patterns will be most obvious. A simple pattern is also easier to validate. When acting the fool, there's plenty of room to fool yourself.
  • Because the following is often a point of confusion, I need to emphasize that I'm comparing all M/M/m servers operating at the same per-server $\rho$ value. That means the arriving traffic has already been adjusted in each configuration to make $\rho$ the same across each m configuration. Even though this may not be how things are done in practice, it is the appropriate assumption for the comparisons being made here. Looking at the diagram of the M/M/m queue (top of this post) you can see that the traffic gets split m-ways. If nothing else is done, then the per-server utilization will fall with each additional server. Therefore, in order to keep things fair and balanced for comparison purposes, the total traffic arriving into the system has to be increased in proportion to the number of m servers. Since we're not using $\lambda$ explicitly in the tables, this step is hidden in the value of $\rho$ by virtue of Little's law.
  • For M/M/1, I have the rule-of-thumb that if the server utilization is 50%, then the residence time will be 2 service periods or $R_1 = 2$, no matter the actual service time or the actual arrival rate.

Returning to the business of generalizing the halving rule, the series in the 2nd column has to be expressed in terms of the per-server utilization $\rho$ with whatever value that happens to be, but not a fixed value like $1/2$. It looks like this \begin{equation} 1 + \rho + \rho^2 + \rho^3 + \cdots + \rho^m \label{eqn:goodseries} \end{equation} and the corresponding Ratio in the above tables becomes \begin{equation} \dfrac{R_1}{1 + \rho + \rho^2 + \rho^3 + \cdots + \rho^m} \label{eqn:goodest} \end{equation} So, instead of multiplying by $1/2$ to get each successive term in the series, we multiply by the utilization $\rho$ to produce: $1 + \rho + (\rho \times \rho) + (\rho \times \rho) \times \rho + \cdots$ and so on, up to $m+1$ terms in eqn. \eqref{eqn:goodseries}. The series starts with 1 because the first term is $\rho^0 = 1$. I deliberately chose $\rho = 1/2$ in the above tables to make the numeric patterns more obvious. If you substitute that special value into eqn. \eqref{eqn:goodest}, you should recover the tabulated results.

This generalized estimator in eqn. \eqref{eqn:goodest} will always be superior to the naive estimator in eqn. \eqref{eqn:badest}. Try it, you'll like it.

Finally, I should note that although I started out looking for a way to express a reduced m-divisor, e.g., I wrote $m = 1 + 1/2 + 1/4$, the ultimate series in ρ does not represent m, per se. It's formally some other function—I call it the "morphing function" but that's whole 'nother story. However, if all m servers are running full tilt at 100% busy (and ignoring unbounded queueing effects), the $\rho$-series above becomes $1 + 1 + \cdots + 1$, up to m terms because the number 1 raised to any power is still 1. In that way we do get back to my starting point, which was to "foolishly" write: $m = 1 + 1 + \cdots + 1$.

What's interesting for me is that although I already knew this result (see Section 2.7 of my Perl::PDQ book), I derived it there using the calculus of a finite geometric series rather than purely numerically, as I've done here. The numerical approach described here arose out of some very interesting student comments and questions during the May 2011 Guerrilla classes. Your next opportunity to help me learn new things will be the upcoming GDAT class in August 2011.

Postscript 2016

In my PPDQ book, I alluded to the derivation based on a finite geometric series as the morphing approximation for an M/M/m queue. The quest to find the connection between my morphing approximation and Erlang's exact solution has now been resolved. See the sketch proof in Erlang Redux Resolved!


Baron said...

Nice rule of thumb. It would be good to have some graphs to show the various rules of thumb compared to the Erlang formula. It would probably need to be a pseudo-3-dimensional graph to show variations in both rho and m.

Neil Gunther said...

Done. :)


3d-plot of errors can be found on p. 64 of The PPA book

and a density plot can be seen on p. 83 of my Perl::PDQ book

Maybe I should resurrect them and hoist them into another blog post. Good suggestion. Thanks.