The Pith of Performance: Erlang

Showing posts with label Erlang. Show all posts

Wednesday, March 15, 2017

Morphing M/M/m: A New View of an Old Queue

The following abstract has been accepted for presentation at the 21st Conference of the International Federation of Operational Research Societies — IFORS 2017, Quebec City, Canada.

Update July 31, 2017: Here are my IFORS slides
Update June 08, 2018: In response to an audience question at my IFORS 2017 session, I have now demonstrated that there is an upper bound for the error in the morphing approximation. See footnotes below.
Update August 18, 2020: This new paper has the complete exact solution method that I had been seeking all along.

This year is the centenary of A. K. Erlang's paper [1] on the determination of waiting times in an M/D/m queue with $m$ telephone lines.^* Today, M/M/m queues are used to model such systems as, call centers [3], multicore computers [4,5] and the Internet [6,7]. Unfortunately, those who should be using M/M/m models often do not have sufficient background in applied probability theory. Our remedy defines a morphing approximation^† to the exact M/M/m queue [3] that is accurate to within 10% for typical applications^‡. The morphing formula for the residence-time, $R(m,\rho)$, is both simpler and more intuitive than the exact solution involving the Erlang-C function. We have also developed an animation of this morphing process. An outstanding challenge, however, has been to elucidate the nature of the corrections that transform the approximate morphing solutions into the exact Erlang solutions. In this presentation, we show:

The morphing solutions correspond to the $m$-roots of unity in the complex $z$-plane.
The exact solutions can be expressed as a rational function, $R(m,z)$.
The poles of $R(m,z)$ lie inside the unit disk, $|z| < 1$, and converge around the Szego; curve [8] as $m$ is increased.
The correction factor for the morphing model is defined by the deflated polynomial belonging to $R(m,z)$.
The pattern of poles in the $z$-plane provides a convenient visualization of how the morphing solutions differ from the exact solutions.

* Originally, Erlang assumed the call holding time, or mean service time $S$, was deterministic with unit period, $S=1$ [1,2]. The generalization to exponentially distributed service periods came later. Ironically, the exponential case is easier to solve than the apparently simpler deterministic case. That's why the M/D/1 queue is never the first example discussed in queueing theory textbooks.
† The derivation of the morphing model is presented in Section 2.6.6 of the 2005 edition of [4], although the word "morphing" is not used there. The reason is, I didn't know how to produce the exact result from it, and emphasizing it would likely have drawn unwarranted attention from Springer-Verlag editors. By the time I was writing the 2011 edition of [4], I was certain the approximate formula did reflect the morphing concept in its own right, even though I still didn't know how to connect it to the exact result. Hence, the verb "morphs" timidly makes its first and only appearance in the boxed text following equation 4.61.
‡ The relative error peaks at 10% for $m \sim 20$ and $\rho \sim 90\%$, then peaks again at 20% for $m \sim 2000$ and the servers running 99.9% busy. However, the rate of increase in peak error attenuates such that the maximum error is less than 25%, even as $m \rightarrow \infty$ and $\rho \rightarrow 100\%$. A plot of the corresponding curves gives a clearer picture. This behavior is not at all obvious. Prior to this result, it could have been that the relative error climbed to 100% with increasing $m$ and $\rho$.

References

A. K. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Electroteknikeren, v. 13, p. 5, 1917.
A. K. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, vol 20, 1909.
E. Chromy, T. Misuth, M. Kavacky, "Erlang C Formula and Its Use In The Call Centers," Advances in Electrical and Electronic Engineering, Vol. 9, No. 1, March 2011.
N. J. Gunther, Analyzing Computer System Performance with Perl::PDQ, Springer-Verlag, 2005 and 2011.
N. J. Gunther, S. Subramanyam, and S. Parvu, "A Methodology for Optimizing Multithreaded System Performance on Multicore Platforms," in Programming Multicore and Many-core Computing Systems, eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing, February 2017.
N. J. Gunther, "Numerical Investigations of Physical Power-law Models of Internet Traffic Using the Renormalization Group," IFORS 2005, Honolulu, Hawaii, July 11—15.
T. Bonald, J. W. Roberts, "Internet and the Erlang formula," ACM SIGCOMM Computer Communication Review, Volume 42, Number 1, January 2012.
C. Diaz Mendoza and R. Orive, "The Szegő curve and Laguerre polynomials with large negative parameters," Journal of Mathematical Analysis and Applications, Volume 379, Issue 1, Pages 305—315, 1 July 2011.

Saturday, October 8, 2016

Crib Sheet for Emulating Web Traffic

Our paper entitled, How to Emulate Web Traffic Using Standard Load Testing Tools (PDF) is now available online and will be presented at the upcoming CMG conference in November.

Presenter: James Brady (co-author: Neil Gunther)
Session Number: 436
Subject Area: APM
Session Date: Wed, November 9, 2016
Session Time: 1:00 PM - 2:00 PM
Session Room: PortofinoB

Erlang Redux Resolved! (This time for real)

As I show in my Perl::PDQ book, the residence time at an M/M/1 queue is trivial to derive and (unlike most queueing theory texts) does not require any probability theory arguments. Great for Guerrillas! However, by simply adding another server (i.e., M/M/2), that same Guerrilla approach falls apart. This situation has always bothered me profoundly and on several occasions I thought I saw how to get to the exact formula—the Erlang C formula—Guerrilla style. But, on later review, I always found something wrong.

Although I've certainly had correct pieces of the puzzle, at various times, I could never get everything to fit in a completely consistent way. No matter how creative I got, I always found a fly in the ointment. The best I had been able to come up with is what I call the "morphing model" approximation where you start out with $m$ parallel queues at low loads and it morphs into a single $m$-times faster M/M/1 queue at high loads.

That model is also exact for $m = 2$ servers—which is some kind of progress, but not much. Consequently, despite a few misplaced enthusiastic announcements in the past, I've never been able to publish the fully corrected morphing model.

Performance of N+1 Redundancy

How can you determine the performance impact on SLAs after an N+1 redundant hosting configuration fails over? This question came up in the Guerrilla Capacity Planning class, this week. It can be addressed by referring to a multi-server queueing model.

N+1 = 4 Redundancy

We begin by considering a small-N configuration of four hosts where the load is distributed equally to each of the hosts. For simplicity, the load distribution is assumed to be performed by some kind of load balancer with a buffer. The idea of N+1 redundancy is that the load balancer ensures all four hosts are equally utilized prior to any failover.

The idea is that none of the hosts should use more than 75% of their available capacity: the blue areas on the left side of Fig. 1. The total consumed capacity is assumed to be $4 \times 3/4 = 3$ or 300% of the total host configuration (rather than all 4 hosts or 400% capacity). Then, when any single host fails, its lost capacity is compensated by redistributing that same load across the remaining three available hosts (each running 100% busy after failover). As we shall show in the next section, this is a misconception.

Figure 1. N+1 redundant hosts before (left) and after failover (right)

The circles in Fig. 1 represent hosts and rectangles represent incoming requests buffered at the load-balancer. The blue area in the circles signifies the available capacity of a host, whereas white signifies unavailable capacity. When one of the hosts fails, its load must be redistributed across the remaining three hosts. What Fig. 1 doesn't show is the performance impact of this capacity redistribution.

Response Time Percentiles for Multi-server Applications

In a previous post, I applied my rules-of-thumb for response time (RT) percentiles (or more accurately, residence time in queueing theory parlance), viz., 80th percentile: $R_{80}$, 90th percentile: $R_{90}$ and 95th percentile: $R_{95}$ to a cellphone application and found that the performance measurements were not completely consistent. Since the relevant data only appeared in a journal blog, I didn't have enough information to resolve the discrepancy; which is ok. The first job of the performance analyst is to flag performance anomalies but most probably let others resolve them—after all, I didn't build the system or collect the measurements.

More importantly, that analysis was for a single server application (viz., time-to-first-fix latency). At the end of my post, I hinted at adding percentiles to PDQ for multi-server applications. Here, I present the corresponding rules-of-thumb for the more ubiquitous multi-server or multi-core case.

Single-server Percentiles

First, let's summarize the Guerrilla rules-of-thumb for single-server percentiles (M/M/1 in queueing parlance): \begin{align} R_{1,80} &\simeq \dfrac{5}{3} \, R_{1} \label{eqn:mm1r80}\\ R_{1,90} &\simeq \dfrac{7}{3} \, R_{1}\\ R_{1,95} &\simeq \dfrac{9}{3} \, R_{1} \label{eqn:mm1r95} \end{align} where $R_{1}$ is the statistical mean of the measured or calculated RT and $\simeq$ denotes approximately equal. A useful mnemonic device is to notice the numerical pattern for the fractions. All denominators are 3 and the numerators are successive odd numbers starting with 5.

Book Review: Botched Erlang B and C Functions

As a consequence of looking into a question on the GCaP google group about the telecom performance metric known as the busy hour, I came across this book.

A new copy comes with a $267.44 price tag!!! Here is my review; slightly enhanced from the version I wrote on Amazon.

On the Accuracy of Exponentials and Expositions

The following is a slightly edited version of my response to a Discussion on the Linkedin CPPE group, which is accessible to Members Only. It's written in the style of a journal reviewer. The original Discussion topic was based on a link to a blog-post. I've been asked to make my Linkedin review more widely available so, here tiz...

The blog-post Capacity Planning on a Cocktail Napkin is a really good example of a really bad explanation. There are so many things that are misleading, at best, and flat-out wrong, at worst, it's hard to know where to begin (or where to stop). Nevertheless, I'll try to keep it brief [I failed in that endeavor. — njg].
The author applies the equation:
\begin{equation} E = λ \end{equation}
Why? What is that equation? We don't know because the author does not say yet what all the symbols mean. It's all very well to impress people by slinging equations around, but it's more important to say what the equations actually mean. After all, the author might have chosen the wrong one.

My Year in Review 2011

Some days I wonder if I ever actually accomplish anything anymore. Maybe it's time to just pack it in and become a greeter at Walmart. I know a bit about how queues work, so that should put me a few notches ahead of the competition. And I would expect the competition to be fierce because it's a pretty cushy job; but not every day, apparently.

Before taking the big leap, I decided it might be a good idea to note down some of the technical projects I've worked on this year (over and above the daily grind):

The Multiserver Numbers Game

In a previous post, I explained why \begin{equation} R_m \neq \dfrac{R_1}{m} \label{eqn:badest} \end{equation} and therefore doesn't work as an estimator of the mean residence time in an M/M/m multi-server queue. Although we expect the extra server capacity with m-servers to produce a shorter residence time ($R_m$), it is not m-times smaller than the residence time ($R_1$) for a single-server (i.e., $m = 1$) queue.

M/M/m multiserver queue

The problem is that eqn. \eqref{eqn:badest} grossly underestimates $R_m$, which is precisely the wrong direction for capacity planning scenarios. For that purpose, it's generally better to overestimate performance metrics. That's too bad because it would be a handy Guerrilla-style formula if it did work. You would be able do the calculation in your head and impress everyone on your team (not to mention performing it as a party trick).

Given that eqn. \eqref{eqn:badest} is a poor estimator, you might wonder if there's a better one, and if you'd been working for Thomas Edison he would have told you: "There's a better wsy. Find it!" Easy for him to say. But if you did decide to take up Edison's challenge, how would you even begin to search for such a thing?

Hotsos 2011: Mine the GAPP

It's that time of year again so, here I am in Dallas to present "Brooks, Cooks, and Response Time Scalability," where I will be showing how my universal scalability law (USL) can be applied to quantifying response-time scaling; as opposed to the more typical throughput scaling.

Jackson's Theorem for the Cloud

Queueing theory, as a distinct discipline, just turned 100 last year. Compared with mathematics and physics, it's a relative youngster. Some seminal results include: Erlang's original solution for the M/D/1 queue (1909), his solutions for a multiserver queue without a waiting line M/M/m/m and with a waiting line M/M/m/∞; AKA "call waiting" (1917), the Pollaczek–Khinchine formula for the M/G/1 queue (1930) and Little's proof (1961). These results were established in the context of individual queueing facilities.

Response Time Knees and Queues

How do you determine where the response-time "knee" occurs? This is a question one commonly hears with reference to characterizing the performance of an application. Calculating where the response time suddenly begins to climb dramatically is considered, by many, to be an important determinant for such things as load testing, scalability analysis, and setting application service targets.

In a previous blog post, I pointed out that such a "knee" is actually an optical illusion. Nonetheless, this same question arose in last month's CMG MeasureIT, as a kind of survey entitled "Does the Knee in a Queuing Curve Exist or is it just a Myth?" Although that author concludes (correctly) that the existence of a "knee" (as it is usually meant) is bogus, the panoply of responses was quite astounding—especially coming from professionals who ought to know better. In this month's MeasureIT, I examine the same question in a rigorous but unconventional way under the title "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ."

Wednesday, July 29, 2009

Remembering Mr. Erlang as a Unit

Not to be confused with Frank Zappa's daughter or a Coneheads spousal unit, is the Erlang unit. The number of Erlangs (E) is defined as:

E = λ S,

(1)

and is a well-known in the teletraffic industry, where it was first used as a measure of call capacity. For this reason, it's also called the traffic intensity.

Amazon and Google Discover Erlang

Erlang the language, that is.

Watch Your Knees and Queues

Beware of optical illusions!

The above plot, showing the normalized response times (R/S) for an M/M/m queue (i.e., a single waiting line with m servers), popped up several times at Hotsos 2008. The M/M/m queue can be employed to model the performance of multiple Oracle processes. Here, the curves correspond to m = 1 (black), 2, 3, 9, 16 (blue) plotted against average server utilization.

Erlang Explained

During the GCaP class, last November, I mentioned to my students that I was working on a more intuitive derivation of the Erlang B and C formulae ( the English wiki link to Erlang-C is wrong) associated with the multi-server or M/M/m queue. This queue is very important because it provides a simple performance model of a multiprocessor or multicore systems, amongst other things. When I say simple here, I mean the queueing model is "simple" relative to the complexities of a real multiprocessor (e.g., it does not include the effects of computational overhead). On the other hand, the mathematics of this particular queueing model is not simple. Not only do we need to resort to using some kind of computational aid to solve it (some people have built careers around finding efficient algorithms), the formulae themselves are completely unintuitive. During that class I thought I was pretty close to changing all that; to the point where I thought I might be in a position to present it before the class finished. I was wrong. It took a little bit longer than a week. :-)

Hotsos Oracle Symposium 2008

I've been invited to speak again at the Hotsos Oracle Symposium, March 2-6 in Dallas, Texas. I'm more than happy to do this because I found last year's symposium to be a first class operation with plenty of great speakers and an attentive audience who were very interested in performance analysis and capacity planning in general, in addition to it's applicability for ORACLE.

Just as an aside, if you look at the Hotsos company logo at the top of their web pages, you'll see some equations or bits of equations. The first of these is the denominator of the famous Erlang-C function (A. Erlang, 1917). More on that in an upcoming blog entry.

Sunday, October 28, 2007

Erlang's Collected Papers

In 1948, the collected papers of Agner Erlang (AKA the father of queueing theory) were translated from the original Danish and published in the Transactions of the Danish Academy of Technical Sciences. They were reissued as a book by Acta Polytechnica Scandinavica in 1960, but due its underwhelming popularity, that book is now out of print. However, I just discovered that the chapters of the book are now available on the web. Kudos to the Academy!

Greek for Geeks

When we come to discuss queueing models in my training classes, I emphasize the fact that my approach to the subject comes from wanting to avoid the blizzard of Greek (mathematical notation) that usually makes the whole subject so obscure to the very people who could use it most; performance analysts.

Wednesday, March 15, 2017

References

Saturday, October 8, 2016

Thursday, July 28, 2016

Thursday, March 13, 2014

N+1 = 4 Redundancy

Wednesday, December 25, 2013

Single-server Percentiles

Friday, April 26, 2013

Friday, February 24, 2012

Sunday, January 1, 2012

Sunday, July 10, 2011

Tuesday, March 8, 2011

Saturday, May 22, 2010

Tuesday, August 18, 2009

Wednesday, July 29, 2009

Wednesday, July 9, 2008

Saturday, March 8, 2008

Saturday, January 12, 2008

Sunday, October 28, 2007

Wednesday, May 23, 2007