The Pith of Performance: performance models

Showing posts with label performance models. Show all posts

Thursday, May 8, 2025

How to Quantify Scalability

How to Quantify Scalability: Synopsis of The Universal Scalability Law (USL), has just been posted in PDF format for improved readability. I did this post haste so, reporting any typos, broken links, etc., is always appreciated.

Tuesday, May 6, 2025

Performance Ponderings Updated

Just updated my "Performance Ponderings" page. It covers my performance analysis papers published over the past 35 yrs. 😳 You may be thinking: Who cares about performance analysis cases that is that dated?

Amazingly, performance analysis (done right) is often not technology-dependent, per se. The right abstraction can remain invariant into perpetuity. For example, my 2025 analysis of the GPT Efficient Computer Frontier (ref. 8 in paper 1) is based partly on my 1989 analysis of virtual memory thrashing in paper 53. Same paradigm; vastly different context.

The connection between the two papers is by no means obvious but, quite striking (not to mention satisfying) when you realize it.

Tuesday, April 20, 2021

PDQ Online Workshop, May 17-21, 2021

PDQ (Pretty Damn Quick) is a free, open source, performance analyzer available from the Performance Dynamics web site.

All modern computer systems, no matter how complex, can be thought of as a directed graph of individual buffers that hold requests until to be serviced at a shared computational resource, e.g., a CPU or disk. Since a buffer is just a queue, any computer infrastructure, from your laptop up to Facebook.com, can be represented as a directed graph of queues.

The directed arcs or arrows in such a graph correspond to workflows between different queues. In the parlance of queueing theory, a directed graph of queues is called a queueing network model. PDQ is a tool for predicting performance metrics such as, waiting time, throughput, optimal user-load.

Two major benefits of using PDQ are:

confirmation that monitored performance metrics have their expected values
predict performance for circumstances that lie beyond current measurements

Find out more about the workshop and register today.

Thursday, November 26, 2020

PDQ 7.0 is Not a Turkey

Giving Thanks for the release of PDQ 7.0, after a 5-year drought, and just in time for the PDQW workshop next week.

New Featues

The introduction of the STREAMING solution method for OPEN queueing networks. (cf. CANON, which can still be used).
The CreateMultiNode() function is now defined for CLOSED queueing networks and distinguished via the MSC device type (cf. MSO for OPEN networks).
The format of Report() has been modified to make the various types of queueing network parameters clearer.
See the R Help pages in RStudio for details.
Run the demo(package="pdq") command in the R console to review a variety of PDQ 7 models.

Maintenance Changes

The migration of Python from 2 to 3 has introduced maintenance complications for PDQ. Python 3 may eventually be accommodated in future PDQ releases. Perl maintenance has ended with PDQ release 6.2, which to remain compatible with the Perl::PDQ book (2011).

Wednesday, January 2, 2019

DSConf 2019 Featured Talk

"Applying The Universal Scalability Law to Distributed Systems"

DSConf'19 - Distributed Systems Conference (scroll down)
Pune, India
11am IST
February 16

I'm very much looking forward to this event and I thank @ShrivedAgashe for the invitation.

Monday, June 25, 2018

Guerrilla 2018 Classes Now Open

All Guerrilla training classes are now open for registration.

GCAP: Guerrilla Capacity and Performance — From Counters to Containers and Clouds
GDAT: Guerrilla Data Analytics — Everything from Linear Regression to Machine Learning
PDQW: Pretty Damn Quick Workshop — Personal tuition for performance and capacity mgmt

The following highlights indicate the kind of thing you'll learn. Most especially, how to make better use of all that monitoring and load-testing data you keep collecting.

How to save millions of dollars with a one-line performance model (video)
How to minimize chargeback after you lift and shift to the cloud (video)
How to correctly emulate web traffic on a load-testing rig (PDF)

See what Guerrilla grads are saying about these classes. And how many instructors do you know that are available for you from 9am to 9pm (or later) each day of your class?

Who should attend?

IT architects
Application developers
Performance engineers
Sysadmins (Linux, Unix, Windows)
System engineers
Test engineers
Mainframe sysops (IBM. Hitachi, Fujitsu, Unisys)
Database admins
Devops practitioners
SRE engineers
Anyone interested in getting beyond performance monitoring

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate. The room-booking link is on the registration page.

Tell a colleague and see you in September!

Wednesday, June 20, 2018

Chargeback in the Cloud - The Movie

If you were unable to attend the live presentation on cost-effective defenses against chargeback in the cloud, or simply can't get enough performance and capacity analysis for the AWS cloud (which is completely understandable), here's a direct link to the video recording on CMG's YouTube channel.

The details concerning how you can do this kind of cost-benefit analysis for your cloud applications will be discussed in the upcoming GCAP class and the PDQW workshop. Check the corresponding class registration pages for dates and pricing.

Sunday, May 20, 2018

USL Scalability Modeling with Three Parameters

NOTE: Annoyingly, the remote mathjax server often takes it's sweet time rendering LaTex equations (like, maybe a minute!!!). I don't know if this is deliberate on the part of Google or a bug. It used to be faster. If anyone knows, I'd be interested to hear; especially if there is a way to speed it up. And no, I'm not planning to move to WordPress.

Update of Oct 2018: Wow! MathJax performance is back. Clearly, whinging is the most powerful performance optimizer. :)

The 2-parameter USL model

The original USL model, presented in my GCAP book and updated in the blog post How to Quantify Scalability, is defined in terms of fitting two parameters $\alpha$ (contention) and $\beta$ (coherency). \begin{equation} X(N) = \frac{N \, X(1)}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn: usl2} \end{equation}

Fitting this nonlinear USL equational model to data requires several steps:

normalizing the throughput data, $X$, to determine relative capacity, $C(N)$.
equation (\ref{eqn: usl2}) is equivalent to $X(N) = C(N) \, X(1)$.
if the $X(1)$ measurement is missing or simply not available—as is often the case with data collected from production systems—the GCAP book describes an elaborate technique for interpolating the value.

The motivation for a 2-parameter model arose out of a desire to meet the twin goals of:

providing each term of the USL with a proper physical meaning, i.e., not treat the USL like a conventional multivariate statistical model (statistics is not math)
satisfying the von Neumann criterion: minimal number of modeling parameters

Last year, I realized the 2-paramater constraint is actually overly severe. Introducing a third parameter would make the statistical fitting process even more universal, as well as simplify the overall procedure. For the USL particularly, the von Neumann criterion should not be taken too literally. It's really more of a guideline: fewer is generally better.

The Geometry of Latency

... AKA hyperbolae.

Here's a mnemonic tabulation based on dishes and bowls:

Hopefully this makes amends for the more complicated explanation I wrote for CMG back in 2009 entitled: "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ", which I'm pretty sure almost nobody understood.

Saturday, April 21, 2018

Virtual cloudXchange 2018 Conference

Our abstract has been accepted for presentation at the FREE cloudXchange online event to be held by CMG on June 19th at 10am Pacific (5pm UTC). [Extended slides]

Exposing the Cost of Performance
Hidden in the Cloud

Neil Gunther
Performance Dynamics, Castro Valley, California

Mohit Chawla
Independent Systems Engineer, Hamburg, Germany
10am Pacific Time on June 19, 2018

Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also reintroduces an old mainframe concept—chargeback—which rejuvenates the need for performance analysis and capacity planning. Combining production JMX data with an appropriate performance model, we show how to assess fee-based EC2 configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of various EC2 Auto Scaling policies.

Wednesday, March 14, 2018

WTF is Modeling, Anyway?

A conversation with performance and capacity management veteran Boris Zibitsker, on his BEZnext channel, about how to save multiple millions of dollars with a one-line performance model (at 21:50 minutes into the video) that has less than 5% error. I wish my PDQ models were that good. :/

The strength of the model turns out to be its explanatory power, rather than prediction, per se. However, with the correct explanation of the performance problem in hand (which also proved that all other guesses were wrong), this model correctly predicted a 300% reduction in application response time for essentially no cost. Modeling doesn't get much better than this.

Footnotes

According to Computer World in 1999, a 32-node IBM SP2 cost $2 million to lease over 3 years. This SP2 cluster was about 6 times bigger.
Because of my vain attempt to suppress details (in the interests of video length), Boris gets confused about the kind of files that are causing the performance problem (near 26:30 minutes). They're not regular data files and they're not executable files. The executable is already running but sometimes waits—for a long time. The question is, waits for what? They are, in fact, special font files that are requested by the X-windows application (the client, in X parlance). These remote files may also get cached, so it's complicated. In my GCAP class, I have more time to go into this level of detail. Despite all these potential complications, my 'log model' accurately predicts the mean application launch time.
Log_2 assumes a binary tree organization of font files whereas, Log_10 assumes a denary tree.
Question for the astute viewer. Since these geophysics applications were all developed in-house, how come the developers never saw the performance problems before they ever got into production? Here's a hint.
Some ppl have asked why there's no video of me. This was the first time Boris had recorded video of a Skype session and he pushed the wrong button (or something). It's prolly better this way. :P

Wednesday, March 15, 2017

Morphing M/M/m: A New View of an Old Queue

The following abstract has been accepted for presentation at the 21st Conference of the International Federation of Operational Research Societies — IFORS 2017, Quebec City, Canada.

Update July 31, 2017: Here are my IFORS slides
Update June 08, 2018: In response to an audience question at my IFORS 2017 session, I have now demonstrated that there is an upper bound for the error in the morphing approximation. See footnotes below.
Update August 18, 2020: This new paper has the complete exact solution method that I had been seeking all along.

This year is the centenary of A. K. Erlang's paper [1] on the determination of waiting times in an M/D/m queue with $m$ telephone lines.^* Today, M/M/m queues are used to model such systems as, call centers [3], multicore computers [4,5] and the Internet [6,7]. Unfortunately, those who should be using M/M/m models often do not have sufficient background in applied probability theory. Our remedy defines a morphing approximation^† to the exact M/M/m queue [3] that is accurate to within 10% for typical applications^‡. The morphing formula for the residence-time, $R(m,\rho)$, is both simpler and more intuitive than the exact solution involving the Erlang-C function. We have also developed an animation of this morphing process. An outstanding challenge, however, has been to elucidate the nature of the corrections that transform the approximate morphing solutions into the exact Erlang solutions. In this presentation, we show:

The morphing solutions correspond to the $m$-roots of unity in the complex $z$-plane.
The exact solutions can be expressed as a rational function, $R(m,z)$.
The poles of $R(m,z)$ lie inside the unit disk, $|z| < 1$, and converge around the Szego; curve [8] as $m$ is increased.
The correction factor for the morphing model is defined by the deflated polynomial belonging to $R(m,z)$.
The pattern of poles in the $z$-plane provides a convenient visualization of how the morphing solutions differ from the exact solutions.

* Originally, Erlang assumed the call holding time, or mean service time $S$, was deterministic with unit period, $S=1$ [1,2]. The generalization to exponentially distributed service periods came later. Ironically, the exponential case is easier to solve than the apparently simpler deterministic case. That's why the M/D/1 queue is never the first example discussed in queueing theory textbooks.
† The derivation of the morphing model is presented in Section 2.6.6 of the 2005 edition of [4], although the word "morphing" is not used there. The reason is, I didn't know how to produce the exact result from it, and emphasizing it would likely have drawn unwarranted attention from Springer-Verlag editors. By the time I was writing the 2011 edition of [4], I was certain the approximate formula did reflect the morphing concept in its own right, even though I still didn't know how to connect it to the exact result. Hence, the verb "morphs" timidly makes its first and only appearance in the boxed text following equation 4.61.
‡ The relative error peaks at 10% for $m \sim 20$ and $\rho \sim 90\%$, then peaks again at 20% for $m \sim 2000$ and the servers running 99.9% busy. However, the rate of increase in peak error attenuates such that the maximum error is less than 25%, even as $m \rightarrow \infty$ and $\rho \rightarrow 100\%$. A plot of the corresponding curves gives a clearer picture. This behavior is not at all obvious. Prior to this result, it could have been that the relative error climbed to 100% with increasing $m$ and $\rho$.

References

A. K. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Electroteknikeren, v. 13, p. 5, 1917.
A. K. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, vol 20, 1909.
E. Chromy, T. Misuth, M. Kavacky, "Erlang C Formula and Its Use In The Call Centers," Advances in Electrical and Electronic Engineering, Vol. 9, No. 1, March 2011.
N. J. Gunther, Analyzing Computer System Performance with Perl::PDQ, Springer-Verlag, 2005 and 2011.
N. J. Gunther, S. Subramanyam, and S. Parvu, "A Methodology for Optimizing Multithreaded System Performance on Multicore Platforms," in Programming Multicore and Many-core Computing Systems, eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing, February 2017.
N. J. Gunther, "Numerical Investigations of Physical Power-law Models of Internet Traffic Using the Renormalization Group," IFORS 2005, Honolulu, Hawaii, July 11—15.
T. Bonald, J. W. Roberts, "Internet and the Erlang formula," ACM SIGCOMM Computer Communication Review, Volume 42, Number 1, January 2012.
C. Diaz Mendoza and R. Orive, "The Szegő curve and Laguerre polynomials with large negative parameters," Journal of Mathematical Analysis and Applications, Volume 379, Issue 1, Pages 305—315, 1 July 2011.

Tuesday, January 17, 2017

GitHub Growth Appears Scale Free

Update of Thursday, August 17, 2017: It's looks like we can chalk up another one for the scale-free model (described below) as Github apparently surpasses 20 million users. Outgoing CEO Wanstrath mentioned this number in an emailed statement to Business Insider.

"As GitHub approaches 700 employees, with more than $200M in ARR, accelerating growth, and more than 20 million registered users, I'm confident that this is the moment to find a new CEO to lead us into the next stage of growth. ....."

The Original Analysis

In 2013, a Redmonk blogger claimed that the growth of GitHub (GH) users follows a certain type of diffusion model called Bass diffusion. Here, growth refers to the number of unique user IDs as a function of time, not the number project repositories, which can have a high degree of multiplicity.

In a response, I tweeted a plot that suggested GH growth might be following a power law, aka scale free growth. The tell-tale sign is the asymptotic linearity of the growth data on double-log axes, which the original blog post did not discuss. The periods on the x-axis correspond to years, with the first period representing calendar year 2008 and the fifth period being the year 2012.

A Clue for Remembering Little's Laws

During the Guerrilla Data Analysis class last week, alumnus Jeff P. came up with this novel mnemonic device for remembering all three forms of Little's law that relate various queueing metrics to the mean arrival rate $\lambda$.

The right-hand side of each equation representing a version of Little's law is written vertically in the order $R, W, S$, which matches the expression $R=W+S$ for the mean residence time, viz., the sum of the mean waiting time ($W$) and the mean service time ($S$).

The letters on the left-hand side: $Q, L, U$ (reading vertically) respectively correspond to the queueing metrics: queue-length, waiting-line length, and utilization, which can be read as the word clue.

Incidentally, the middle formula is the version that appears in the title of John Little's original paper:

J. D. C. Little, ``A Proof for the Queuing Formula: L = λ W,''
Operations Research, 9 (3): 383—387 (1961)

Wednesday, August 3, 2016

PDQ as a Performance Periscope

This is a guest post by performance modeling enthusiast, Mohit Chawla, who brought the following very interesting example to my attention. In contrast to many of the examples in my Perl::PDQ book, these performance data come from a production web server, not a test rig. —NJG

Performance analysts usually build performance models based on their understanding of the software application's behavior. However, this post describes how a PDQ model acted like a periscope and also turned out to be pedagogical by uncovering otherwise hidden details about the application's inner workings and performance characteristics.

Some Background

Erlang Redux Resolved! (This time for real)

As I show in my Perl::PDQ book, the residence time at an M/M/1 queue is trivial to derive and (unlike most queueing theory texts) does not require any probability theory arguments. Great for Guerrillas! However, by simply adding another server (i.e., M/M/2), that same Guerrilla approach falls apart. This situation has always bothered me profoundly and on several occasions I thought I saw how to get to the exact formula—the Erlang C formula—Guerrilla style. But, on later review, I always found something wrong.

Although I've certainly had correct pieces of the puzzle, at various times, I could never get everything to fit in a completely consistent way. No matter how creative I got, I always found a fly in the ointment. The best I had been able to come up with is what I call the "morphing model" approximation where you start out with $m$ parallel queues at low loads and it morphs into a single $m$-times faster M/M/1 queue at high loads.

That model is also exact for $m = 2$ servers—which is some kind of progress, but not much. Consequently, despite a few misplaced enthusiastic announcements in the past, I've never been able to publish the fully corrected morphing model.

How to Emulate Web Traffic Using Standard Load Testing Tools

The following abstract has been submitted to CMG 2016:

How to Emulate Web Traffic Using Standard Load Testing Tools
James Brady (State of Nevada) and Neil Gunther (Performance Dynamics)
Conventional load-testing tools are based on a fifty year old time-share computer paradigm where a finite number of users submit requests and respond in a synchronized fashion. Conversely, modern web traffic is essentially asynchronous and driven by an unknown number of users. This difference presents a conundrum for testing the performance of modern web applications. Even when the difference is recognized, performance engineers often introduce virtual-user script modifications based on hearsay; much of which leads to wrong results. We present a coherent methodology for emulating web traffic that can be applied to existing test tools.

Keywords: load testing, workload simulation, web applications, software performance engineering, performance modeling

Monday, August 24, 2015

PDQ Version 6.2.0 Released

PDQ (Pretty Damn Quick) is a FOSS performance analysis tool based on the paradigm of queueing models that can be programmed natively in

R
Python
Perl
C and several other languages.

This minor release is now available for download.

Continuous Integration Gets Performance Testing Radar

As companies embrace continuous integration (CI) and fast release cycles, a serious problem has emerged in the development pipeline: Conventional performance testing is the new bottleneck. Every load testing environment is actually a highly complex simulation assumed to be a model of the intended production environment. System performance testing is so complex that the cost of modifying test scripts and hardware has become a liability for meeting CI schedules. One reaction to this situation is to treat performance testing as a checkbox item, but that exposes the new application to unknown performance idiosyncrasies in production.

In this webinar, Neil Gunther (Performance Dynamics Company) and Sai Subramanian (Cognizant Technology Solutions) will present a new type of model that is not a simulation, but instead acts like continuous radar that warns developers of potential performance and scalability issues during the CI process. This radar model corresponds to a virtual testing framework that precludes the need for developing performance test scripts or setting up a separate load testing environment. Far from being a mere idea, radar methodology is based on a strong analytic foundation that will be demonstrated by examining a successful case study.

Broadcast Date and Time: Tuesday, July 22, 2014, at 11 am Pacific

Thursday, July 17, 2014

Restaurant Performance Sunk by Selfies

An interesting story appeared last weekend about a popular NYC restaurant realizing that, although the number of customers they served on a daily basis is about the same today as it was ten years ago, the overall service had slowed significantly. Naturally, this situation has led to poor online reviews to the point where the restaurant actually hired a firm to investigate the problem. The analysis of surveillance tapes led to a surprising conclusion. The unexpected culprit behind the slowdown was neither the kitchen staff nor the waiters, but customers taking photos and otherwise playing around with their smartphones.

Using the data supplied in the story, I wanted to see how the restaurant performance would look when expressed as a PDQ model. First, I created a summary data frame in R, based on the observed times:


> df
           obs.2004 obs.2014
wifi.data         0        5
menu.data         8        8
menu.pix          0       13
order.data        6        6
eat.mins         46       43
eat.pix           0       20
paymt.data        5        5
paymt.pix         0       15
total.mins       65      115

The 2004 situation can be represented schematically by the following queueing network

Figure 1. 2004 queueing stages

Referring to Figure 1: