Saturday, January 12, 2008

Erlang Explained

During the GCaP class, last November, I mentioned to my students that I was working on a more intuitive derivation of the Erlang B and C formulae ( the English wiki link to Erlang-C is wrong) associated with the multi-server or M/M/m queue. This queue is very important because it provides a simple performance model of a multiprocessor or multicore systems, amongst other things. When I say simple here, I mean the queueing model is "simple" relative to the complexities of a real multiprocessor (e.g., it does not include the effects of computational overhead). On the other hand, the mathematics of this particular queueing model is not simple. Not only do we need to resort to using some kind of computational aid to solve it (some people have built careers around finding efficient algorithms), the formulae themselves are completely unintuitive. During that class I thought I was pretty close to changing all that; to the point where I thought I might be in a position to present it before the class finished. I was wrong. It took a little bit longer than a week. :-)

In the GCaP course notes, the Erlang equations occur in the section on a single common waiting line with m servers. Erlang, who was employed by the Danish Telephone Company, first formulated and solved this queueing model in 1917 for doing capacity planning of trunk calls; each trunk line being a separate server. The trouble is, even his original paper offers no insight into what is going on physically, nor does it explain why the M/M/m queue is so much more complicated mathematically than the M/M/1 queue. This summary by my colleague Mike Tanner gives you an idea what "complicated" means in this context. Consequently, you really need to program a calculator or computer application (e.g., Excel) a to solve for the M/M/m performance metrics e.g., the expected queue length (Q) or mean waiting time (W).

The complexity arises from the mathematical expression for the probability that an incoming request (e.g., a phone call) finds all m trunk switches occupied (busy) and therefore must wait ("Call waiting" in modern parlance). That probability is given by the Erlang-C function, which is expressed in terms of the number of servers (m) and how busy each server is (Greek letter: rho). Just to give you an idea, here's what the Erlang B and C functions look like as we progress from m = 1 to m = 6 servers:

What do these complicated rational functions mean physically? The fact that they can be derived from a bunch of inscrutable probability equations associated with Markov Chains is mere mathematics, not a physical explanation. Queues are real and therefore all those rho's and m's must be physically identifiable with what's going on in the queue at the bank or post office or help desk or multiprocessor or multicores. As far as I know, that level of explanation has never been given until now.

In the spirit of "going Guerrilla", I have always avoided such complications in my classes by using a simpler but approximate formula for the residence (or response) time, viz., $R = S/(1 - \rho^m)$ where S is the mean service time and rho is the per-server utilization. This simpler formula can easily go into a single Excel cell, for example. Remember, $R = S + W$. In Chapter 2 of my Perl::PDQ book, I derived this approximate formula using a finite geometric series. The question still remained, could I use my approximation to get to the exact formula? The idea I had was to adjust the approximation by adding in the appropriate corrections. That turns out to be rather fruitless because the corrections are as inscrutable as the exact expressions one is trying to derive. Moreover, any physical insight (the key objective) remains obscured behind a morass of polynomials, like those in the above table. However, now that I've solved it via a completely different route, I can look back with the benefit of hindsight and see the connection between my previous approximation and my new exact derivation.

Using a few simple diagrams, not only can I derive the Erlang-C formula (the most complicated one), but remarkably it forces out the waiting-time equation as part of the process. They are intimately linked together. This approach also provides a completely different and IMHO more comprehensible view of how the M/M/m queue relates to M/M/1. At no point do I invoke Markov Chains or any probably theory beyond the Poisson distribution, which you must grasp anyway to properly understand M/M/1 or any queueing concepts.

The next task is to write this out in a more comprehensible way than my current writing-pad scribbles and Mathematica code and see if I still believe it. ;-)

Postscript (July 2016)

The preceding did not pan out quite as unexpected. It was another in a sequence of several misfires. The purely diagrammatic approach did not work out correctly. However, the ultimate goal of this quest has now been achieved, albeit not quite in the "simple" form I had originally envisaged. Please refer to the post entitled, Erlang Redux Resolved! (This time for real) of July 28, 2016, which also shows an animation of the morphing approximation model of the M/M/m queue.


metasoft said...

happy new year and congratulations on a improved derivation of Erlang C. pls post it when you get a chance. hope to see you in aug GDAT class.

Anonymous said...

Did you ever write this out in full?

Neil Gunther said...

I haven't, but thanks for asking. It did make number 4 of my 2011 list but still needs to be written up. It hasn't had a high priority because I wasn't aware that anyone was really that interested, besides me. Your question might have changed that. :)

Unknown said...

Hi I am really interested in how you managed to derive the Erlang C formula in an easier way than usual. Did you by any chance manage to accomplish it?

Neil Gunther said...

Hi Luca and thanks for asking. Please see the postscript that I just added to the above post and then check out the animation in my July 2016 blog post Feel free to ask any additional questions.