
Wednesday, June 20, 2018

Chargeback in the Cloud - The Movie

If you were unable to attend the live presentation on cost-effective defenses against chargeback in the cloud, or simply can't get enough performance and capacity analysis for the AWS cloud (which is completely understandable), here's a direct link to the video recording on CMG's YouTube channel.

The details concerning how you can do this kind of cost-benefit analysis for your cloud applications will be discussed in the upcoming GCAP class and the PDQW workshop. Check the corresponding class registration pages for dates and pricing.

Saturday, April 21, 2018

Virtual cloudXchange 2018 Conference

Our abstract has been accepted for presentation at the FREE cloudXchange online event to be held by CMG on June 19th at 10am Pacific (5pm UTC). [Extended slides]

Exposing the Cost of Performance Hidden in the Cloud


Neil Gunther
Performance Dynamics, Castro Valley, California

Mohit Chawla
Independent Systems Engineer, Hamburg, Germany

10am Pacific Time on June 19, 2018

Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also reintroduces an old mainframe concept—chargeback—which rejuvenates the need for performance analysis and capacity planning. Combining production JMX data with an appropriate performance model, we show how to assess fee-based EC2 configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of various EC2 Auto Scaling policies.

Wednesday, March 15, 2017

Morphing M/M/m: A New View of an Old Queue

The following abstract has been accepted for presentation at the 21st Conference of the International Federation of Operational Research Societies — IFORS 2017, Quebec City, Canada.

  • Update July 31, 2017: Here are my IFORS slides
  • Update June 08, 2018: In response to an audience question at my IFORS 2017 session, I have now demonstrated that there is an upper bound for the error in the morphing approximation. See footnotes below.
  • Update August 18, 2020: This new paper has the complete exact solution method that I had been seeking all along.

This year is the centenary of A. K. Erlang's paper [1] on the determination of waiting times in an M/D/m queue with $m$ telephone lines.* Today, M/M/m queues are used to model systems such as call centers [3], multicore computers [4,5] and the Internet [6,7]. Unfortunately, those who should be using M/M/m models often do not have sufficient background in applied probability theory. Our remedy defines a morphing approximation† to the exact M/M/m queue [3] that is accurate to within 10%‡ for typical applications. The morphing formula for the residence time, $R(m,\rho)$, is both simpler and more intuitive than the exact solution involving the Erlang-C function. We have also developed an animation of this morphing process. An outstanding challenge, however, has been to elucidate the nature of the corrections that transform the approximate morphing solutions into the exact Erlang solutions. In this presentation, we show:
  • The morphing solutions correspond to the $m$-roots of unity in the complex $z$-plane.
  • The exact solutions can be expressed as a rational function, $R(m,z)$.
  • The poles of $R(m,z)$ lie inside the unit disk, $|z| < 1$, and converge around the Szegő curve [8] as $m$ is increased.
  • The correction factor for the morphing model is defined by the deflated polynomial belonging to $R(m,z)$.
  • The pattern of poles in the $z$-plane provides a convenient visualization of how the morphing solutions differ from the exact solutions.

* Originally, Erlang assumed the call holding time, or mean service time $S$, was deterministic with unit period, $S=1$ [1,2]. The generalization to exponentially distributed service periods came later. Ironically, the exponential case is easier to solve than the apparently simpler deterministic case. That's why the M/D/1 queue is never the first example discussed in queueing theory textbooks.
† The derivation of the morphing model is presented in Section 2.6.6 of the 2005 edition of [4], although the word "morphing" is not used there. The reason is, I didn't know how to produce the exact result from it, and emphasizing it would likely have drawn unwarranted attention from Springer-Verlag editors. By the time I was writing the 2011 edition of [4], I was certain the approximate formula did reflect the morphing concept in its own right, even though I still didn't know how to connect it to the exact result. Hence, the verb "morphs" timidly makes its first and only appearance in the boxed text following equation 4.61.
‡ The relative error peaks at 10% for $m \sim 20$ and $\rho \sim 90\%$, then peaks again at 20% for $m \sim 2000$ and the servers running 99.9% busy. However, the rate of increase in peak error attenuates such that the maximum error is less than 25%, even as $m \rightarrow \infty$ and $\rho \rightarrow 100\%$. A plot of the corresponding curves gives a clearer picture. This behavior is not at all obvious. Prior to this result, it could have been that the relative error climbed to 100% with increasing $m$ and $\rho$.
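
The error figures quoted in the last footnote can be checked numerically with a short R script. The following is a minimal sketch of my own (not part of the presentation): it takes the morphing formula to be $R(m,\rho) = S/(1-\rho^m)$, per Section 2.6.6 of [4], and computes the exact M/M/m residence time from the Erlang C function; all the function names are mine.

# Minimal sketch: exact Erlang-C residence time vs. the morphing approximation.
erlang_c <- function(m, rho) {
  a   <- m * rho                         # offered load in Erlangs
  k   <- 0:(m - 1)
  num <- a^m / factorial(m)
  num / ((1 - rho) * sum(a^k / factorial(k)) + num)   # P(arrival must wait)
}
R_exact <- function(m, rho, S = 1) S + erlang_c(m, rho) * S / (m * (1 - rho))
R_morph <- function(m, rho, S = 1) S / (1 - rho^m)

rho <- seq(0.05, 0.90, by = 0.05)
err <- sapply(rho, function(r) abs(R_morph(20, r) - R_exact(20, r)) / R_exact(20, r))
round(100 * max(err), 1)   # peaks near the 10% quoted above for m = 20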

References

  1. A. K. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Elektroteknikeren, vol. 13, p. 5, 1917.
  2. A. K. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, vol. 20, 1909.
  3. E. Chromy, T. Misuth, M. Kavacky, "Erlang C Formula and Its Use In The Call Centers," Advances in Electrical and Electronic Engineering, Vol. 9, No. 1, March 2011.
  4. N. J. Gunther, Analyzing Computer System Performance with Perl::PDQ, Springer-Verlag, 2005 and 2011.
  5. N. J. Gunther, S. Subramanyam, and S. Parvu, "A Methodology for Optimizing Multithreaded System Performance on Multicore Platforms," in Programming Multicore and Many-core Computing Systems, eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing, February 2017.
  6. N. J. Gunther, "Numerical Investigations of Physical Power-law Models of Internet Traffic Using the Renormalization Group," IFORS 2005, Honolulu, Hawaii, July 11–15.
  7. T. Bonald, J. W. Roberts, "Internet and the Erlang formula," ACM SIGCOMM Computer Communication Review, Volume 42, Number 1, January 2012.
  8. C. Diaz Mendoza and R. Orive, "The Szegő curve and Laguerre polynomials with large negative parameters," Journal of Mathematical Analysis and Applications, vol. 379, no. 1, pp. 305–315, July 2011.

Saturday, October 8, 2016

Crib Sheet for Emulating Web Traffic

Our paper, entitled "How to Emulate Web Traffic Using Standard Load Testing Tools" (PDF), is now available online and will be presented at the upcoming CMG conference in November.

Presenter: James Brady (co-author: Neil Gunther)
Session Number: 436
Subject Area: APM
Session Date: Wed, November 9, 2016
Session Time: 1:00 PM - 2:00 PM
Session Room: PortofinoB

Wednesday, August 3, 2016

PDQ as a Performance Periscope

This is a guest post by performance modeling enthusiast, Mohit Chawla, who brought the following very interesting example to my attention. In contrast to many of the examples in my Perl::PDQ book, these performance data come from a production web server, not a test rig. —NJG

Performance analysts usually build performance models based on their understanding of the software application's behavior. However, this post describes how a PDQ model acted like a periscope and also turned out to be pedagogical by uncovering otherwise hidden details about the application's inner workings and performance characteristics.

Some Background

Thursday, July 28, 2016

Erlang Redux Resolved! (This time for real)

As I show in my Perl::PDQ book, the residence time at an M/M/1 queue is trivial to derive and (unlike most queueing theory texts) does not require any probability theory arguments. Great for Guerrillas! However, by simply adding another server (i.e., M/M/2), that same Guerrilla approach falls apart. This situation has always bothered me profoundly and on several occasions I thought I saw how to get to the exact formula—the Erlang C formula—Guerrilla style. But, on later review, I always found something wrong.

Although I've certainly had correct pieces of the puzzle, at various times, I could never get everything to fit in a completely consistent way. No matter how creative I got, I always found a fly in the ointment. The best I had been able to come up with is what I call the "morphing model" approximation where you start out with $m$ parallel queues at low loads and it morphs into a single $m$-times faster M/M/1 queue at high loads.

That model is also exact for $m = 2$ servers—which is some kind of progress, but not much. Consequently, despite a few misplaced enthusiastic announcements in the past, I've never been able to publish the fully corrected morphing model.
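
One way to see the morphing behavior, taking the approximation to be $R(m,\rho) \approx S/(1-\rho^m)$ (my shorthand for the formula derived in the Perl::PDQ book), is to look at its two limits:
\begin{equation}
\lim_{\rho \to 0} \frac{S}{1-\rho^m} = S
\qquad \text{and} \qquad
\frac{S}{1-\rho^m} \to \frac{S}{m(1-\rho)} \quad \text{as } \rho \to 1,
\end{equation}
since $1-\rho^m = (1-\rho)(1+\rho+\cdots+\rho^{m-1}) \approx m(1-\rho)$ near $\rho = 1$. The first limit is the no-waiting response of $m$ parallel queues at light load, while the second is the residence time of a single M/M/1 server running $m$ times faster.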

Wednesday, July 29, 2015

Hockey Elbow and Other Response Time Injuries

You've heard of tennis elbow. Well, there's a non-sports, performance injury that I like to call hockey elbow. An example of such an "injury" is shown in Figure 1, which appeared in a recent computer performance analysis presentation. It's a reminder of how easy it is to become complacent when doing performance analysis and possibly end up reaching the wrong conclusion.


Figure 1. Injured response time performance

Figure 1 is seriously flawed for two reasons:

  1. It incorrectly shows the response time curve with a vertical asymptote.
  2. It compounds the first error by employing a logarithmic x-axis.

Thursday, March 13, 2014

Performance of N+1 Redundancy

How can you determine the performance impact on SLAs after an N+1 redundant hosting configuration fails over? This question came up in the Guerrilla Capacity Planning class this week. It can be addressed by referring to a multi-server queueing model.

N+1 = 4 Redundancy

We begin by considering a small-N configuration of four hosts where the load is distributed equally to each of the hosts. For simplicity, the load distribution is assumed to be performed by some kind of load balancer with a buffer. The idea of N+1 redundancy is that the load balancer ensures all four hosts are equally utilized prior to any failover.

The intent is that no host should use more than 75% of its available capacity: the blue areas on the left side of Fig. 1. The total consumed capacity is then $4 \times 3/4 = 3$ host-equivalents, i.e., 300% in single-host units, rather than the full 4 hosts or 400%. When any single host fails, its lost capacity is assumed to be compensated by redistributing that same load across the remaining three available hosts, each then running 100% busy after failover. As we shall show in the next section, this is a misconception.

The circles in Fig. 1 represent hosts and rectangles represent incoming requests buffered at the load-balancer. The blue area in the circles signifies the available capacity of a host, whereas white signifies unavailable capacity. When one of the hosts fails, its load must be redistributed across the remaining three hosts. What Fig. 1 doesn't show is the performance impact of this capacity redistribution.
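
Before getting to the multi-server model, a rough back-of-envelope sketch in R (my own simplification, treating each host as an independent M/M/1 queue rather than using the full M/M/m model) already shows why 75% per host is precisely the break-even point:

# Rough sketch: treat each host as an M/M/1 queue and redistribute the failed
# host's load over the N-1 survivors.
N    <- 4                              # hosts in the N+1 = 4 configuration
S    <- 1                              # mean service time (normalized)
rho0 <- seq(0.50, 0.75, by = 0.05)     # per-host utilization before failover

rho1 <- rho0 * N / (N - 1)             # per-host utilization after one failure
R0   <- S / (1 - rho0)                 # per-host residence time before failover
R1   <- ifelse(rho1 < 1, S / (1 - rho1), Inf)

data.frame(rho_before = rho0, rho_after = rho1,
           R_before = round(R0, 2), R_after = round(R1, 2))
# At 75% pre-failover utilization the three survivors hit 100% busy and the
# residence time diverges, which is the misconception referred to above.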

Monday, November 12, 2012

PDQ 6.0 is On Its Way

PDQ (Pretty Damn Quick) version 6.0.β is in the QA pipeline. Although this is a major release, cosmetically, things won't look any different when it comes to writing PDQ models. All the big changes have taken place under the hood in order to make PDQ more consistent with the R statistical environment.

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

> library(pdq)
> source("/Users/njg/PDQ/Test Suites/R-Test/mm1.r")
                ***************************************
                ****** Pretty Damn Quick REPORT *******
                ***************************************
                ***  of : Thu Nov  8 17:42:48 2012  ***
                ***  for: M/M/1 Test                ***
                ***  Ver: PDQ Analyzer 6.0b 041112  ***
                ***************************************
                ***************************************
...

The main trick is that the Perl and Python versions of PDQ will look entirely unchanged to the modeler, while invisibly incorporating the significant internal changes needed to accommodate R.
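
The mm1.r script being sourced above isn't reproduced here, but a minimal PDQ model of an M/M/1 queue in R looks something like the following sketch. The function names (Init, CreateOpen, CreateNode, SetDemand, Solve, Report) follow the PDQ API described in the Perl::PDQ book; check the pdq package documentation for the exact R signatures, and note the parameter values are purely illustrative.

# A sketch of a minimal mm1.r; parameter values are illustrative only.
library(pdq)

arrivRate <- 0.75   # requests per second
servTime  <- 1.0    # seconds per request, so the server is 75% busy

Init("M/M/1 Test")
CreateOpen("work", arrivRate)           # open (Poisson) workload
CreateNode("server", CEN, FCFS)         # single FCFS queueing center
SetDemand("server", "work", servTime)   # service demand per request
Solve(CANON)                            # canonical solution
Report()                                # produces a report like the one above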

Wednesday, May 23, 2012

Queues in the News

You might not have noticed, but queues have been in the news a lot lately. Not from the standpoint of computer performance, but people performance or, more accurately, crowd control. Most recently, queues popped up in the context of the long delays expected at Heathrow airport due to the big crowds arriving for the London Olympics.

In a queue, at least everyone is pointing in the same logical direction. Moreover, if you snake the queue, as they do at Disneyland, people always feel close to the destination and can see it getting even closer as customers ahead of them are processed. That helps to minimize their level of frustration: the control part. For some reason, the post office hasn't figured this out yet.

And, last but not least, queues remain a perennial topic in computer performance, most recently in connection with the Internet.

Monday, May 14, 2012

Load Testing with Uniform vs. Exponential Arrivals

In a couple of recent blog posts about generating exponential loads and why that is important for load testing and performance testing, it was not completely clear to some readers what was motivating my remarks. In this post, I will try to provide a more visual elaboration of that aspect.

My fundamental point is this. When it comes to load testing*, presumably the idea is to exercise the system under test (SUT). Otherwise, why are you doing it? Part of exercising the SUT is to produce significant fluctuations in the number of requests residing in application buffers. Those fluctuations can be induced by the pattern of arriving requests issued by the client-side driver (DVR): usually implemented as a pile of PCs or blades.
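
Here is a small R sketch of my own (not code from those earlier posts) that makes the contrast visible: generate arrivals at the same average rate with uniform and with exponential interarrival times, then compare how much the per-second request counts swing around their mean.

# Same mean arrival rate, different fluctuations: uniform vs. exponential
# interarrival times.
set.seed(42)
rate <- 100                          # mean arrival rate (requests per second)
n    <- 100000                       # number of requests to generate

unif_gaps <- runif(n, 0, 2 / rate)   # uniform on (0, 2/rate), mean = 1/rate
expo_gaps <- rexp(n, rate)           # exponential, mean = 1/rate

counts_per_sec <- function(gaps) tabulate(ceiling(cumsum(gaps)))

sd(counts_per_sec(unif_gaps))   # modest spread around 100 requests/sec
sd(counts_per_sec(expo_gaps))   # roughly sqrt(100) = 10, noticeably larger

The exponential (Poisson) arrivals produce the larger swings in the number of requests sitting in the SUT's buffers, which is exactly the kind of stress a load test ought to exercise.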

Sunday, January 1, 2012

My Year in Review 2011

Some days I wonder if I ever actually accomplish anything anymore. Maybe it's time to just pack it in and become a greeter at Walmart. I know a bit about how queues work, so that should put me a few notches ahead of the competition. And I would expect the competition to be fierce because it's a pretty cushy job; but not every day, apparently.

Before taking the big leap, I decided it might be a good idea to note down some of the technical projects I've worked on this year (over and above the daily grind):

Sunday, July 10, 2011

The Multiserver Numbers Game

In a previous post, I explained why \begin{equation} R_m \neq \dfrac{R_1}{m} \label{eqn:badest} \end{equation} that is, why the right-hand side doesn't work as an estimator of the mean residence time in an M/M/m multi-server queue. Although we expect the extra server capacity with $m$ servers to produce a shorter residence time ($R_m$), it is not $m$ times smaller than the residence time ($R_1$) for a single-server (i.e., $m = 1$) queue.

M/M/m multiserver queue

The problem is that the estimate $R_1/m$ in eqn. \eqref{eqn:badest} grossly underestimates $R_m$, which is precisely the wrong direction for capacity planning scenarios. For that purpose, it's generally better to overestimate performance metrics. That's too bad because it would be a handy Guerrilla-style formula if it did work. You would be able to do the calculation in your head and impress everyone on your team (not to mention performing it as a party trick).
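
To see the size of the underestimate in the simplest nontrivial case, take $m = 2$, where the exact residence time is $R_2 = S/(1-\rho^2)$, and take $\rho$ to be the per-server utilization in both the single-server and two-server cases (my choice of comparison):
\begin{equation}
\frac{R_2}{R_1/2} = \frac{S/\big[(1-\rho)(1+\rho)\big]}{S/\big[2(1-\rho)\big]} = \frac{2}{1+\rho} > 1 \quad \text{for } 0 \le \rho < 1,
\end{equation}
so the naive estimate $R_1/m$ understates the true residence time by up to a factor of two here, and by up to a factor of $m$ in the general multiserver case.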

Given that eqn. \eqref{eqn:badest} is a poor estimator, you might wonder if there's a better one, and if you'd been working for Thomas Edison he would have told you: "There's a better way. Find it!" Easy for him to say. But if you did decide to take up Edison's challenge, how would you even begin to search for such a thing?

Saturday, July 2, 2011

Little's Lore

Guerrilla alumnus Paul P. has a penchant for sending me interesting things, and recently he sent me a piece on Little's law. Remarkably, it wasn't just another proof of $L = \lambda W$, but a brief retrospective written by none other than John Little himself. I quote it here because it not only provides some unusual insight into how these things get done, but it is written in a charming and self-effacing style.

Monday, June 20, 2011

Bye Bye Mr. Bar Code

Last week, there was a New York Times obituary for Alan Haberman, he being the person who ushered in the barcode. Notice that's usher and not invent. That distinction goes to Norman Woodland and Bernard Silver, two graduate students at the Drexel Institute of Technology (now Drexel University), and was based—perhaps not surprisingly—on Morse code, which is now defunct.




Queueing at a grocery checkout [Source: Perl PDQ 2nd edn]

The barcode grew out of an effort, dating back to the 1940s, to modernize the grocery industry. Woodland and Silver received a patent in 1952 but, because scanning technology was rather poor at that time, their invention went largely unused. And that's where Alan Haberman comes in, because he championed its adoption in actual retail stores. The first purchase of a product using a barcode, chewing gum no less, took place in 1974 at Marsh Supermarket in Troy, Ohio.

All of which brings me to the point of this post. Not only do I tend to use the grocery store as a familiar example of queueing effects, both in my Guerrilla CaP classes and my Perl::PDQ book, but Jim Holtman, one of our GDAT instructors, is currently doing data analysis and simulations for Kroger Supermarket in Cincinnati, Ohio. What is it with Ohio and grocery stores?

What started with barcodes, continues today with the application of RFID, motion capture, shelf optimization and so forth. And all these performance improvements rely on analyzing big data sets. No doubt, Jim will recount some of this in the upcoming GDAT class. You could do worse than be there for that. You can even bring your own data to be scanned and we'll check it out for you. :)

Tuesday, June 14, 2011

Two Heads Are Better Than One ... And m

In the GCaP class, one of the homework exercises refers to a grocery store checkout where the customer arrival rate is 1 customer every 2 minutes and the mean service time for the cashier to ring up groceries is 1.5 minutes. The first question is: What is the mean residence time for each customer?


From Little's law, the utilization of the cashier is $\rho = \lambda S = 0.5 \times 1.5 = 0.75$ or 75%. The residence time is given by $R_1 = S/(1 - \rho) = 6.0$ minutes.
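
In R, the same arithmetic, plus the natural follow-up of opening a second checkout lane (my extension of the homework, using the exact M/M/2 result $R_2 = S/(1-\rho^2)$), looks like this:

# Grocery checkout homework: one cashier, then (my extension) two cashiers.
lambda <- 1 / 2    # arrival rate: 1 customer every 2 minutes
S      <- 1.5      # mean service time (minutes)

rho1 <- lambda * S            # utilization with one cashier = 0.75
R1   <- S / (1 - rho1)        # M/M/1 residence time = 6.0 minutes

rho2 <- lambda * S / 2        # per-cashier utilization with two cashiers
R2   <- S / (1 - rho2^2)      # exact M/M/2 residence time, about 1.75 minutes

c(R1 = R1, R2 = R2)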

Monday, November 8, 2010

Efficient Elevators: Algorithms, Cars and Queues

The latest PBS NOVA episode, entitled "Trapped in an Elevator," is based on an actual event that occurred in 1999. Watching it reminded me that elevators (or lifts, in British English) can be regarded as a queueing system, viz., priority queues, which are also the basis for scheduling algorithms in operating systems and storage devices. A lot of this background can be found in Don Knuth's erudite volumes:
  • Vol 1, p.280: elevator simulator program based on doubly-linked lists
  • Vol 3, p.150: elevator scheduling as priority queues
  • Vol 3, p.357: tape sorting reformulated as single elevator problem
  • Vol 3, p.374: disk seeks treated as single elevator problem
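
As a toy illustration of the same idea applied to disk seeks, here is a small R sketch of my own (a LOOK-style variant of the elevator algorithm, not Knuth's code): serve every pending request in the current direction of travel, then reverse.

# Elevator (SCAN/LOOK) ordering of pending requests, e.g., disk cylinder
# numbers or floor numbers.
scan_order <- function(requests, head, direction = c("up", "down")) {
  direction <- match.arg(direction)
  if (direction == "up") {
    c(sort(requests[requests >= head]),                     # sweep up first
      sort(requests[requests <  head], decreasing = TRUE))  # then back down
  } else {
    c(sort(requests[requests <= head], decreasing = TRUE),  # sweep down first
      sort(requests[requests >  head]))                     # then back up
  }
}

scan_order(c(95, 180, 34, 119, 11, 123, 62, 64), head = 50, direction = "up")
# 62  64  95 119 123 180  34  11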


[Best wishes for Randall's fiancée]

Sunday, June 6, 2010

Calculating the Cost of Elastic Capacity

Neal Richter sent me the following tweet:

[Image: Neal Richter's tweet]
Unfortunately, the paper ("Optimal staffing policy for queuing systems with cyclic demands," Int. J. Services and Operations Management, 2010) cited in the PhysOrg news item that Neal tweeted is not accessible to either him or me. Nonetheless, I found an earlier paper (2007) by the same author (Pen-Yuan Liao) which has a lot of the same words, so I'm assuming they both describe more or less the same thing. Either way, I'm quite certain the math is the same.

Sunday, May 30, 2010

Simulating a Queue in R

In the GCaP class earlier this month, we talked about the meaning of the load average (in Unix and Linux) and simulating a grocery store checkout lane, but I didn't actually do it. So, I decided to take a shot at constructing a discrete-event simulation (as opposed to Monte Carlo simulation) of a simple M/M/1 queue in R.
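
The simulation itself is in the post; as a taste of how little code it takes, here is a minimal sketch of my own (with illustrative parameter values) that simulates M/M/1 waiting times customer by customer via Lindley's recursion, rather than with a full event calendar:

# Minimal M/M/1 waiting-time simulation via Lindley's recursion:
# W[i] = max(0, W[i-1] + S[i-1] - A[i]), where A is the interarrival time.
set.seed(1)
lambda <- 0.75          # arrival rate
mu     <- 1.0           # service rate, so rho = 0.75
n      <- 200000        # number of customers

A <- rexp(n, lambda)    # interarrival times
S <- rexp(n, mu)        # service times

W <- numeric(n)         # waiting time in the queue (excluding service)
for (i in 2:n) {
  W[i] <- max(0, W[i - 1] + S[i - 1] - A[i])
}

mean(W + S)                      # simulated residence time
(1 / mu) / (1 - lambda / mu)     # theoretical M/M/1 value: S/(1 - rho) = 4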

Saturday, May 22, 2010

Jackson's Theorem for the Cloud

Queueing theory, as a distinct discipline, just turned 100 last year. Compared with mathematics and physics, it's a relative youngster. Some seminal results include: Erlang's original solution for the M/D/1 queue (1909); his solutions for a multiserver queue without a waiting line, M/M/m/m, and with a waiting line, M/M/m/∞, AKA "call waiting" (1917); the Pollaczek–Khinchine formula for the M/G/1 queue (1930); and Little's proof of $L = \lambda W$ (1961). These results were established in the context of individual queueing facilities.