<h1>The Pith of Performance</h1>
Possibly pithy insights into computer performance analysis and capacity planning, based on the Guerrilla
<a href="http://www.perfdynamics.com/books.html">series of books</a> and <a href="http://www.perfdynamics.com/Classes/schedule.html">training classes</a> provided by <a href="http://www.perfdynamics.com/">Performance Dynamics Company</a>. By Neil Gunther.
<p>
<h2>PDQ Online Workshop, May 17-21, 2021</h2>
<i>Posted 2021-04-20</i>
<p>
PDQ (Pretty Damn Quick) is a free, open-source performance analyzer available from the <a href="http://www.perfdynamics.com/Tools/PDQcode.html" target="_blank">Performance Dynamics web site</a>.
<p>
All modern computer systems, no matter how complex, can be thought of as a directed graph of individual buffers that hold requests until they can be serviced at a shared computational resource, e.g., a CPU or disk. Since a buffer is just a queue, any computer infrastructure, from your laptop up to Facebook.com, can be represented as a directed graph of queues.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxiuElSSV_VFo6KK9cHiyQqMVfpiseHuRIJIdprdQkseCPzMl27R-fE7EFoZPRekQJTXL6RNNBDkXjzp0bE1_k91M-ucqIlho4vtEPI4ecf2wAFOqeY8m4zaagAnQm7Kn8mce2OmnvL1M/s916/JackNet.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="200" data-original-width="916" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxiuElSSV_VFo6KK9cHiyQqMVfpiseHuRIJIdprdQkseCPzMl27R-fE7EFoZPRekQJTXL6RNNBDkXjzp0bE1_k91M-ucqIlho4vtEPI4ecf2wAFOqeY8m4zaagAnQm7Kn8mce2OmnvL1M/s400/JackNet.png"/></a></div>
The directed arcs or arrows in such a graph correspond to workflows between different queues. In the parlance of queueing theory, a directed graph of queues is called a <i>queueing network model</i>. PDQ is a tool for predicting performance metrics such as waiting time, throughput, and optimal user load.
<p>
Two major benefits of using PDQ are:
<ol>
<li> confirmation that monitored performance metrics have their expected values
<li> prediction of performance for circumstances that lie beyond current measurements
</ol>
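To make the prediction idea concrete, here is a minimal sketch of the kind of calculation a queueing model performs. It uses the textbook M/M/1 formulas for a single queue, not the PDQ API itself, and the arrival rate and service time are invented for illustration.

```python
# Textbook M/M/1 formulas for a single queue (illustrative; not the PDQ API).
# The arrival rate and service time below are invented numbers.

def mm1_metrics(arrival_rate, service_time):
    """Return utilization, residence time, and mean queue length for M/M/1."""
    rho = arrival_rate * service_time        # utilization; must be < 1
    if rho >= 1.0:
        raise ValueError("queue is saturated")
    residence = service_time / (1.0 - rho)   # time in system (wait + service)
    qlength = rho / (1.0 - rho)              # mean number of requests in system
    return rho, residence, qlength

# Example: 0.75 requests/s arriving at a resource with 1.0 s service time
rho, R, Q = mm1_metrics(0.75, 1.0)
print(f"utilization={rho:.2f}, residence time={R:.2f}s, queue length={Q:.2f}")
# → utilization=0.75, residence time=4.00s, queue length=3.00
```

Note how the residence time is four times the bare service time at 75% utilization: this nonlinear blow-up of waiting time is exactly what queueing models like PDQ quantify across a whole network of queues.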
<a href="http://www.perfdynamics.com/Classes/Outlines/pdqw.html" target="_blank">Find out more</a> about the workshop and
<a href="http://www.perfdynamics.com/Classes/schedule.html" target="_blank">register today</a>.
<p>
<h2>PDQ 7.0 is Not a Turkey</h2>
<i>Posted 2020-11-26</i>
<p>
Giving thanks for the <a href="http://perfdynamics.com/Tools/PDQcode.html" target="_blank">release of PDQ 7.0</a>, after a five-year drought, and just in time for the <a href="http://perfdynamics.com/Classes/schedule.html" target="_blank">PDQW workshop</a> next week.
<p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjF6_2dFKoYRrhDUEcwCqdhkZfoEz5qDUaUJLfNpytg31xdg2e7S7JPbdxHyqbiWK_CPwJOUZX-wP0t6WDW99Xuq5_KmOa1Ad7Adtz63jZq8ENv1nmnJ5JNfYusKtnlhGuUhwLb9ZKHbd0/s818/pdq7.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="459" data-original-width="818" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjF6_2dFKoYRrhDUEcwCqdhkZfoEz5qDUaUJLfNpytg31xdg2e7S7JPbdxHyqbiWK_CPwJOUZX-wP0t6WDW99Xuq5_KmOa1Ad7Adtz63jZq8ENv1nmnJ5JNfYusKtnlhGuUhwLb9ZKHbd0/s320/pdq7.png"/></a></div>
<p>
<h3>New Features</h3>
<ol>
<li> The introduction of the STREAMING solution method for OPEN queueing networks. (cf. CANON, which can still be used).
<li> The <b>CreateMultiNode()</b> function is now defined for CLOSED queueing networks and distinguished via the MSC device type (cf. MSO for OPEN networks).
<li> The format of <b>Report()</b> has been modified to make the various types of queueing network parameters clearer.
<li> See the <b>R Help pages</b> in RStudio for details.
<li> Run the <b>demo(package="pdq")</b> command in the R console to review a variety of PDQ 7 models.
</ol>
<p>
<h3>Maintenance Changes</h3>
The migration of Python from 2 to 3 has introduced maintenance complications for PDQ. Python 3 may eventually be accommodated in a future PDQ release. Perl maintenance ended with PDQ release 6.2, which remains compatible with the Perl::PDQ book (2011).
<p>
<h2>PDQ Online Workshop, Nov 30-Dec 5, 2020</h2>
<i>Posted 2020-11-08</i>
<p>
PDQ (Pretty Damn Quick) is a queueing graph performance analyzer that comes as:
<ol>
<li> free open source <a href="http://www.perfdynamics.com/Tools/PDQcode.html" target="_blank">software package</a>
<li> with an online <a href="http://www.perfdynamics.com/Tools/PDQman.html" target="_blank">user manual</a>
</ol>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBzpAP0tl2UsQPAwcpFTbvBF5dSmvrlwoFoxE4iJqCwe5LoeTmX3oaziO6L3eYcCdZed9BIvN8sOm8driGmD6OOA7H-ryF6lIq_ahrP5N7pFK2g7s3d_GIbKz7l-HVH7r8CKRbKpbWN9M/s551/PDQlBig.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="423" data-original-width="551" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBzpAP0tl2UsQPAwcpFTbvBF5dSmvrlwoFoxE4iJqCwe5LoeTmX3oaziO6L3eYcCdZed9BIvN8sOm8driGmD6OOA7H-ryF6lIq_ahrP5N7pFK2g7s3d_GIbKz7l-HVH7r8CKRbKpbWN9M/s400/PDQlBig.png"/></a></div>
<p>
As shown in the above diagram, any modern computer system can be thought of as a directed graph of individual buffers where requests wait to be serviced at some kind of shared computational resource, e.g., a CPU or disk. Since a buffer is just a queue, any computer infrastructure, <b>from a laptop to Facebook</b>, can be represented as a directed graph of queues. The directed arcs or arrows correspond to workflows between the different queues. In the parlance of queueing theory, a directed graph of queues is called a <b>queueing network model</b>. PDQ is a tool for calculating performance metrics, e.g., waiting time, throughput, and optimal load, of such network models.
<p>
Some example PDQ applications include models of:
<ul>
<li> Cloud applications
<li> Packet networks
<li> HTTP + VM + DB apps
<li> PAXOS-type distributed apps
</ul>
</p>
Two major benefits of using PDQ are:
<ol>
<li> confirmation that monitored performance metrics have their expected values
<li> prediction of performance for circumstances that lie beyond current load-testing
</ol>
This particular PDQ workshop has been requested for the <b>Central European Time</b> zone.
<ul>
<li> All sessions are INTERACTIVE using Skype (not canned videos)
<li> Online sessions are usually 4 hours in duration
<li> A typical timespan is 2pm to 6pm CET each business day
<li> A nominal 5-10 minute bio break at 4pm CET
<li> Attendees are encouraged to bring their own PDQ projects or data to start one
</ul>
<ol>
<li> <a href="http://www.perfdynamics.com/Classes/Outlines/pdqw.html" target="_blank">Find out</a> more about the workshop.
<li> Here is the <a href="https://www.eventbrite.com/e/pdq-workshop-eu-online-edition-tickets-114810814236" target="_blank">REGISTRATION page</a>.
</ol>
Hope to see you at PDQW!
<p>
<h2>Converting Between Human Time and Absolute Time in the Shell</h2>
<i>Posted 2020-02-11</i>
<p>
This is really more of a note-to-self, but it may also be useful for other readers.
<p>
Converting between various time zones, including UTC, is both a pain and error-prone.
A better solution is to use absolute time. Thankfully, Unix provides such a time: the so-called <a href="https://en.wikipedia.org/wiki/Unix_time">Epoch time</a>, which is the integer <b>count</b> of seconds elapsed since January 1, 1970 (UTC).
<p>
Timestamps are both very important and often overlooked: all the more so in the context of performance analysis,
where <a href="https://calendar.perfplanet.com/2018/time-the-zeroth-performance-metric/">time is the zeroth metric</a>.
In fact, my preferred title would have been, <i>Death to Time Zones</i>
but that choice would probably have made it harder to find the main point here, later on, viz., how to use the Unix epoch time.
<p>
Although there are <a href="http://www.unixtimestampconverter.com">web pages</a> that facilitate such time conversions, there are also shell commands that are often more convenient to use, but not so well known. With that in mind, here are some examples (for future reference) of how to convert between human-time and Unix time.
<h2>Examples</h2>
The following are the bash shell commands for both MacOS and Linux.
<p>
<h3>MacOS and BSD</h3>
Optionally see the human-readable date-time:
<pre class="source-code"><code>
[njg]~% date
Tue Feb 11 10:04:32 PST 2020
</code></pre>
Get the Unix integer time for the current date:
<pre class="source-code"><code>
[njg]~% date +%s
1581444272
</code></pre>
Resolve a Unix epoch integer to date-time:
<pre class="source-code"><code>
[njg]~% date -r 1581444272
Tue Feb 11 10:04:32 PST 2020
</code></pre>
<p>
<h3>Linux</h3>
Optionally see the human-readable date-time:
<pre class="source-code"><code>
[njg]~% date
Tue Feb 11 10:04:32 PST 2020
</code></pre>
Get the Unix integer time for the current date:
<pre class="source-code"><code>
[njg]~% date +%s
1581444272
</code></pre>
Resolve a Unix epoch integer to date-time:
<pre class="source-code"><code>
[njg]~% date -d @1581444272
Tue Feb 11 18:04:32 UTC 2020
</code></pre>
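For scripting, the same conversions can be done portably with Python's standard library, which sidesteps the MacOS/Linux flag difference (<b>-r</b> versus <b>-d @</b>). The timestamp is the one used in the examples above.

```python
# Portable equivalents of the date(1) examples above (Python stdlib only).
from datetime import datetime, timezone

# Human time -> Unix epoch seconds (the UTC rendering of the example above)
ts = int(datetime(2020, 2, 11, 18, 4, 32, tzinfo=timezone.utc).timestamp())
print(ts)
# → 1581444272

# Unix epoch seconds -> human time, rendered in UTC to stay zone-neutral
print(datetime.fromtimestamp(1581444272, tz=timezone.utc)
      .strftime("%a %b %d %H:%M:%S UTC %Y"))
# → Tue Feb 11 18:04:32 UTC 2020
```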
<p>
<h2>Book Review: How to Build a Performance Testing Stack From Scratch</h2>
<i>Posted 2019-06-03</i>
<p>
Writing a technical book is a difficult undertaking. As a technical author myself, I know that writing well is both arduous and tedious. There are no shortcuts. Over the last 40 years or so, computer-based tools have been developed to help authors write <b>printed</b> textbooks, monographs and technical articles. <a href="https://en.wikipedia.org/wiki/LaTeX">LaTeX</a> (pronounced <i>lar-tek</i>) reigns supreme in that realm because it's not just <i>word processing</i> software but a full-blown digital <a href="https://en.wikipedia.org/wiki/Typesetting">typesetting</a> application that enables authors to produce a single <i>camera-ready</i> PDF file (bits) that is fit for direct printing. Not only does LaTeX correctly typeset characters and mathematical symbols, but it can also generate the Table of Contents and the corresponding Index of terms, together with correctly cross-referenced callouts for numbered chapters, sections, figures, tables and equations.
<p>
In the meantime, over the past 20 years, the nature of the book itself has become progressively more digital. The ability to render the book-block on <b>digital devices</b> as an "e-book" has made printed hardcopies optional. Although purely digital e-books make reading more ubiquitous, e-book file formats and display quality vary across devices, e.g., laptops, phones, tablets, and various e-readers. Any reduction in display quality is offset by e-books being able to include user <i>interaction</i> and even <i>animation</i>: features entirely beyond the printed book. In that sense, there really has been a revolution in the publishing industry: books are no longer about books; they're now about <i>media</i>.
<p>
Recently, I became aware of an undergraduate calculus textbook that makes powerful use of animation and audio. The author is an academic mathematician and he published it on <b>YouTube</b>! Is that a book or a movie? Somehow, it's a hybrid of both and, indeed, I wish I'd been able to learn from technical "books" like that. Good visuals and animations can make difficult technical concepts much easier to comprehend. Progressive as all that is, when it comes to <b>technical</b> e-books, I'm not aware of any single authoring tool that can match the quality of LaTeX, let alone incorporate user interaction and animation. If such a thing did exist, I would be all over it. And Markdown doesn't cut it. But digital authoring tools are continually evolving.
<p>
Matt Fleming (<a href="https://twitter.com/fleming_matt">@fleming_matt</a> on Twitter), the author of <b><i>How to Build a Performance Testing Stack From Scratch</i></b>, opted to use a static e-book format—not because it produces the most readable result, but because it is the best way to reach a wider audience at lower cost than a more expensive print publisher.
The e-publisher in this case is (the relatively unknown)
<a href="https://www.ministryoftesting.com/about-us">Ministry of Testing Ltd</a> in Brighton, UK, and the book is available on Amazon for the <a href="https://www.amazon.com/Build-Performance-Testing-Stack-Scratch-ebook/dp/B07GB8SW4P">Kindle reader</a>. I was also able to read it using iBooks on Mac OS X.
<p>
The range of topics covered is very extensive. I've included the Table of Contents here because it is not viewable on Amazon:
<blockquote>
Part 1
<ul style="list-style-type:none;">
<li> Step 1: Identify Stakeholders
<li> Step 2: Identify What to Measure
<li> Step 3: Test Design
<li> Step 4: Measuring Test Success and Failure
<li> Step 5: Sharing Results
</ul>
Part 2
<ul style="list-style-type:none;">
<li> Understanding Statistics
<li> Latency
<li> Throughput
<li> Statistical Significance
</ul>
Part 3
<ul style="list-style-type:none;">
<li> The Benchmark Hierarchy
<li> Picking Tests
<li> Validating Tests
</ul>
Part 4
<ul style="list-style-type:none;">
<li> Use a Performance Framework
<li> Ensure the Test Duration is Consistent
<li> Order Tests by Duration
<li> Keep Reproduction Cases Small
<li> Setup The Environment Before Each Test
<li> Make Updating Tests Easy
<li> Errors Should Be Fatal
</ul>
Part 5
<ul style="list-style-type:none;">
<li> Format
<li> Use Individual Sample Data
<li> Detecting Results in the Noise
<li> Outliers
<li> Result Precision
<li> If All Else Fails Use Test Duration
<li> Delivering Results
</ul>
</blockquote>
The overall e-book presentation of "Performance Testing Stack" seems underdeveloped. Most topics could have been greatly expanded. But, as Matt informed me, this is probably because the content amounts to a concatenation of previously written blog posts. As I said earlier, there are no shortcuts to writing well. The paucity of detail, however, is offset by the sheer enthusiasm that Matt brings to his writing: an aspect that deserves separate acknowledgement, because performance testing is a very complex subject which can otherwise appear dry and mind-boggling to the uninitiated. And it's the uninitiated that Matt wants to reach. He has written this book to encourage the uninitiated reader to seriously consider entering the field.
<p>
Some points that could have been developed further include:
<ul>
<li> Plots and tables can be expanded for legibility by double clicking on them
<li> The term "benchmark" needs a better explanation
<li> Section 2.1 discusses the Harmonic mean but there's no discussion of the Geometric mean.
<li> Section 3.3 on Distributions does not clearly distinguish between analytic (parametric) distributions and sample distributions (which usually have no analytic form).
<li> Section 5 (p.85) on Result Precision needs to discuss the difference between <b>accuracy</b>, <b>precision</b>, and <b>error</b>.
</ul>
<p>
Conversely, Matt's enthusiasm may have gone a bit overboard in his choice of title. The book promises:
<blockquote>
<em>This book will walk you through designing and building a performance testing stack from scratch; step by step from planning and running performance tests through to understanding and analysing the results. If you’re new to performance testing or looking to expand your understanding of this topic then this book is for you!</em>
</blockquote>
Unfortunately, this book doesn't provide enough details to actually build a test stack—which would've been very cool. Rather, it presents a comprehensive <b>overview</b> of all the major concepts one needs to absorb in order to develop a running performance testing stack. But even with this more limited scope, the book is still important because, offhand, I don't know of any other source where one can be introduced to performance testing without drowning in a sea of terminology, procedures and architectures.
<p>
Ultimately, this e-book is a great starting point for <b>newbies</b>, as well as being a good reminder for <b>seasoned testers</b> about what <i>should be done</i> in good performance tests.
<p>
<h2>DSConf 2019 Featured Talk</h2>
<i>Posted 2019-01-02</i>
<p>
"<a href="http://www.perfdynamics.com/Test/dsconf-abst.html">Applying The Universal Scalability Law to Distributed Systems</a>"
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKXliyk_26-mMAzRVrX8SJc9UFY564-iUb7btGzPtrTc-mCFoKKHIYL6C87Qu9oYy6gPiWF73Raf6msZAalbmbVzrmC32SM3Tbmf8_h8CKR9VJmDy3kU8lp9MxpOShXR2XQBB2eWIAJ_o/s1600/Dv4br6lUwAEcHcl.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKXliyk_26-mMAzRVrX8SJc9UFY564-iUb7btGzPtrTc-mCFoKKHIYL6C87Qu9oYy6gPiWF73Raf6msZAalbmbVzrmC32SM3Tbmf8_h8CKR9VJmDy3kU8lp9MxpOShXR2XQBB2eWIAJ_o/s400/Dv4br6lUwAEcHcl.jpg" width="400" height="209" data-original-width="1200" data-original-height="628" /></a></div>
<p>
<b>DSConf'19</b> - <a href="https://dsconf.in">Distributed Systems Conference</a> (scroll down)<br>
Pune, India<br>
11am IST<br>
February 16
<p>
I'm very much looking forward to this event and I thank <a href="https://twitter.com/ShripadAgashe">@ShripadAgashe</a> for the invitation.
<p>
<h2>Guerrilla 2018 Classes Now Open</h2>
<i>Posted 2018-06-25</i>
<p>
All Guerrilla training classes are now <a href="http://www.perfdynamics.com/Classes/schedule.html" target="_blank"><b>open for registration</b></a>.
<ol>
<li> <a href="http://www.perfdynamics.com/Classes/Outlines/guerilla.html" target="_blank"><b>GCAP: Guerrilla Capacity and Performance</b></a> — From Counters to Containers and Clouds
<li> <a href="http://www.perfdynamics.com/Classes/Outlines/gdata.html" target="_blank"><b>GDAT: Guerrilla Data Analytics</b></a> — Everything from Linear Regression to Machine Learning
<li> <a href="http://www.perfdynamics.com/Classes/Outlines/pdqw.html" target="_blank"><b>PDQW: Pretty Damn Quick Workshop</b></a> — Personal tuition for performance and capacity mgmt
</ol>
<p>
The following highlights indicate the kind of thing you'll learn: most especially, how to make better use of all that <b>monitoring</b> and <b>load-testing</b> data you keep collecting.
<ul>
<li> <a href="https://www.youtube.com/watch?v=HBYGR7ou_6o&t=1s" target="_blank" >How to save millions of dollars with a one-line performance model</a> (<b>video</b>)
<li> <a href="https://goo.gl/xxk68p" target="_blank" >How to minimize chargeback after you lift and shift to the cloud</a> (<b>video</b>)
<li> <a href="http://perfdynamics.blogspot.com/2016/10/crib-sheet-for-emulating-web-traffic.html" target="_blank">How to correctly emulate web traffic on a load-testing rig</a> (<b>PDF</b>)
</ul>
<p>
See what <a href="http://www.perfdynamics.com/Classes/comments.html" target="_blank"> Guerrilla grads are saying</a> about these classes. And how many instructors do you know that are available to you from 9am to 9pm (or later) each day of your class?
<p>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.perfdynamics.com/Classes/schedule.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8AMMZ6vu8Flw1voJdF-fisw1VaoPADl2HyAjSd9-AsFtfg3X0D_vYOPccBKHhO_slP9tUDB1KBq_ambrTs8WOXxzR0mKwnbjVGikgkxd2UXjrWhZUYsvEICzSyuNDFD3n71ixEd-f4_Y/s400/ghat-cloud.jpg" width="400" height="300" data-original-width="400" data-original-height="300" /></a></div>
<p>
Who should attend?
<ul>
<li> IT architects
<li> Application developers
<li> Performance engineers
<li> Sysadmins (Linux, Unix, Windows)
<li> System engineers
<li> Test engineers
<li> Mainframe sysops (IBM, Hitachi, Fujitsu, Unisys)
<li> Database admins
<li> Devops practitioners
<li> SRE engineers
<li> Anyone interested in getting beyond performance monitoring
</ul>
<p>
As usual, <a href="http://www.fourpointspleasanton.com">Sheraton Four Points</a> has bedrooms available at the Performance Dynamics discounted rate. The room-booking link is on the registration page.
<p>
Tell a colleague and see you in <b>September</b>!
<p>
<h2>Chargeback in the Cloud - The Movie</h2>
<i>Posted 2018-06-20</i>
<p>
If you were unable to attend the live presentation on <a href="http://perfdynamics.blogspot.com/2018/04/virtual-cloudxchange-2018-conference.html" target="_blank">cost-effective defenses against chargeback in the cloud</a>, or simply can't get enough performance and capacity analysis for the AWS cloud (which is completely understandable), here's a <a href="https://www.cmg.org/2018/07/exposing-the-cost-of-performance-hidden-in-the-cloud/" target="_blank">direct link</a> to the video recording on CMG's YouTube channel.
<p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.cmg.org/2018/07/exposing-the-cost-of-performance-hidden-in-the-cloud/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaeGxVTD1ZzavSSLnNWtZSnpu4763YcnF5R38KLK7VItJ5AaHdFgVgDv8K2MhlNS9vtBXodgExhX6CA_3p-6CdK9ZNb76WqIW624BkoIJnBqr26Nf2D0dv8sv7wQwYa73KJFCZVMFJ98U/s400/youtube-tweet.png" width="400" height="237" data-original-width="1440" data-original-height="852" /></a></div>
<p>
The details concerning how you can do this kind of cost-benefit analysis for your cloud applications will be discussed in the upcoming <a href="http://www.perfdynamics.com/Classes/Outlines/guerilla.html" target=" _blank">GCAP class</a> and the <a href="http://www.perfdynamics.com/Classes/Outlines/pdqw.html" target=" _blank">PDQW workshop</a>.
Check the corresponding <a href="http://www.perfdynamics.com/Classes/schedule.html" target="_blank">class registration pages</a> for dates and pricing.
<p>
<h2>USL Scalability Modeling with Three Parameters</h2>
<i>Posted 2018-05-20</i>
<p>
<b>NOTE: </b>Annoyingly, the remote <i>mathjax</i> server often takes its sweet time rendering LaTeX equations (like, maybe a minute!!!). I don't know if this is deliberate on the part of Google or a bug. It used to be faster. If anyone knows, I'd be interested to hear; especially if there is a way to speed it up. And no, I'm not planning to move to WordPress.
<p>
<b>Update of Oct 2018:</b> Wow! MathJax performance is back. Clearly, whinging is the most powerful performance optimizer. :)
<p>
<h3>
The 2-parameter USL model</h3>
The original USL model, presented in my <a href="http://www.perfdynamics.com/books.html">GCAP book</a> and updated in the blog post
<a href="http://www.perfdynamics.com/Manifesto/USLscalability.html">How to Quantify Scalability</a>, is defined in terms of fitting <b>two</b> parameters $\alpha$ (contention) and $\beta$ (coherency).
\begin{equation}
X(N) = \frac{N \, X(1)}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn: usl2}
\end{equation}
<p>
Fitting this nonlinear USL model to data requires several steps:
<p>
<ol>
<li> <i>normalize</i> the <b>throughput</b> data, $X$, to determine the <i>relative capacity</i>, $C(N)$
</li>
<li> note that equation (\ref{eqn: usl2}) is equivalent to $X(N) = C(N) \, X(1)$
</li>
<li> if the $X(1)$ measurement is missing or simply not available—as is often the case with data collected from production systems—apply the elaborate <i>interpolation</i> technique described in the GCAP book
</li>
</ol>
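Step 1 amounts to a one-line division. As a small sketch (the throughput numbers here are invented for illustration):

```python
# Sketch of step 1: normalizing measured throughput X(N) to relative
# capacity C(N) = X(N)/X(1). All numbers are invented for illustration.
X1 = 955.16                                        # "single-user" throughput X(1)
X = {1: 955.16, 4: 3325.0, 8: 5560.0, 16: 7380.0}  # N -> measured X(N)

C = {N: XN / X1 for N, XN in X.items()}            # relative capacity C(N)
for N in sorted(C):
    print(f"N={N:2d}  C(N)={C[N]:.2f}")
```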
The motivation for a 2-parameter model arose out of a desire to meet the twin goals of:
<ol>
<li> providing each term of the USL with a proper physical meaning, i.e., not treat the USL like a conventional multivariate statistical model (statistics is not math)
</li>
<li> satisfying the <a href="http://perfdynamics.blogspot.com/2011/06/winking-pink-elephant.html">von Neumann criterion</a>: minimal number of modeling parameters
</li>
</ol>
Last year, I realized the 2-parameter constraint is actually overly severe. Introducing a third parameter would make the statistical fitting process even <i>more universal</i>, as well as simplify the overall procedure. For the USL in particular, the von Neumann criterion should not be taken too literally. It's really more of a guideline: fewer is generally better. <a name='more'></a> Additionally, Baron Schwartz told me that he'd had better luck fitting production <a href="https://www.youtube.com/watch?v=AryK1fqWlWU">RDBMS data in Excel</a> by substituting a third parameter into the numerator of the USL. As ever, the question remained: how could this actually work?
<p>
<h3>
The 3-parameter USL model</h3>
Going back to equation (\ref{eqn: usl2}), let's just consider the simplest case where scaling is <i>linear-rising</i>, as would be the case for ideal parallelism. In the linear region, where $\alpha = \beta = 0$, equation (\ref{eqn: usl2}) simplifies to
\begin{equation}
X(N) = N \, X(1) \label{eqn: usl1}
\end{equation}
<p>
In other words, the overall throughput $X(N)$ increases in simple proportion to $N$. The "single-user" throughput, $X(1)$, doesn't change and therefore acts like a <i>constant of proportionality</i>.
<p>
But what happens when we don't know the value of $X(1)$? That means the $X(1)$ factor in equations (\ref{eqn: usl2}) and (\ref{eqn: usl1}) is <b>undefined</b>. We might denote this situation by writing
<p>
\begin{equation}
X(N) = N \, ? \label{eqn: uslx}
\end{equation}
<p>
Of course, that makes no sense, mathematically speaking. As already mentioned, the conventional way out of this situation is to estimate the value of $X(1)$ using mathematical interpolation. But here's the <b>epiphany</b>.
<p>
Rather than using the more complicated interpolation procedure, we can simply appeal to <i>statistical regression</i>! Yes, that's right, we treat the USL equation as a conventional multivariate statistical model. After all, we're already using <i>nonlinear</i> statistical regression to determine the $\alpha$ and $\beta$ parameters. More importantly, since statistics is not math, we can replace equation ($\ref{eqn: uslx}$) with a statement about <i>correlation</i>, rather than strict equality. In statistical models, that's accomplished by introducing another parameter (I'll call it $\gamma$, since that's the third letter of the Greek alphabet) to replace the question mark in equation ($\ref{eqn: uslx}$), namely
<p>
\begin{equation}
X(N) = N \, \gamma \label{eqn: uslg}
\end{equation}
<p>
The new parameter $\gamma$ is just a constant of proportionality that represents the <b>slope</b> of the line associated with ideal parallel scaling. See the plots below.
<p>
And here's a little piece of magic. If we choose $N = 1$ in equation ($\ref{eqn: uslg}$), it becomes $X(1) = \gamma$. So, when the $\gamma$ parameter is determined by statistical regression, it also tells us the estimated value of $X(1)$, whether it was measured or missing. In other words, we don't need to do any explicit interpolation because the nonlinear regression procedure does it automatically by fitting the third parameter.
<p>
Equation (\ref{eqn: usl2}) is now replaced by a 3-parameter version of the USL model:
\begin{equation}
X(N) = \frac{N \, \gamma}{1 + \alpha \, (N - 1) + \beta N \, (N - 1)} \label{eqn: usl3}
\end{equation}
<p>
Unlike the 2-parameter USL, equation (\ref{eqn: usl3}) can be fitted directly to your throughput measurements without the need to do any data normalization or interpolation. The following examples show the results of fitting the 3-parameter USL model.
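As a sketch of how such a direct fit might look (assuming scipy is available; the data points are synthetic, generated from known parameters so the fit can be checked, and this is not the exact script used for the plots that follow):

```python
# Fitting the 3-parameter USL, eq. (5), by nonlinear regression.
# Assumes scipy is installed; the (N, X) data are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def usl3(N, gamma, alpha, beta):
    """X(N) = N*gamma / (1 + alpha*(N-1) + beta*N*(N-1))"""
    return N * gamma / (1.0 + alpha * (N - 1) + beta * N * (N - 1))

# Synthetic throughput measurements: note there is no N=1 point,
# mimicking production data where X(1) was never measured.
N = np.array([4.0, 8.0, 16.0, 32.0, 64.0])
X = usl3(N, 100.0, 0.02, 0.0005)   # generated from known parameters

(gamma, alpha, beta), _ = curve_fit(usl3, N, X, p0=[50.0, 0.01, 0.001])
print(f"gamma={gamma:.1f}  alpha={alpha:.4f}  beta={beta:.5f}")
# gamma is simultaneously the estimate of the unmeasured X(1)
```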
<p>
<h3>
Load-test data</h3>
These are load-test data and the "single-user" throughput was measured as $X(1) = 955.16$ per unit time. The 3-parameter USL fit is summarized in the following plot.
<p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiStewKaxM96H8Muc85LX4yB8tq6cuAAgzCEWUQ-lSHzS7pMuOSSOw1r-pu3Up6r0CGAvmPCwRX-CB4XhETmSkDmlP-9sRB5yh05b420ZFV7iJ90iW-srt5A6Zr4Ib41b75yKY03yTFoDo/s1600/Rplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="448" data-original-width="629" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiStewKaxM96H8Muc85LX4yB8tq6cuAAgzCEWUQ-lSHzS7pMuOSSOw1r-pu3Up6r0CGAvmPCwRX-CB4XhETmSkDmlP-9sRB5yh05b420ZFV7iJ90iW-srt5A6Zr4Ib41b75yKY03yTFoDo/s400/Rplot.png" width="400" /></a></div>
<p>
The fitted value of $\gamma = 995.65$, which is the estimated value of $X(1)$. It can also be regarded as the slope of the linear-rising throughput indicated by the sloping red line on the left of the plot.
<p>
<h3>
Production data</h3>
These data are from a continuously running production system and thus no $X(1)$ was ever measured.
<p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs6o9ZoRf5uArwbG8Es1M8VHtShRisWrerVdjSX7wLOoLtOQSCOL69Lhv31-ruHLIqZqM8koccCYuAuuW87M6qIxvu_ZsMf3Z3PbhQMlaErqxsgXkkVP9j-MaRwZIdyjcqj0TpEXtpL4k/s1600/Rplot-tomcat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="448" data-original-width="629" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs6o9ZoRf5uArwbG8Es1M8VHtShRisWrerVdjSX7wLOoLtOQSCOL69Lhv31-ruHLIqZqM8koccCYuAuuW87M6qIxvu_ZsMf3Z3PbhQMlaErqxsgXkkVP9j-MaRwZIdyjcqj0TpEXtpL4k/s400/Rplot-tomcat.png" width="400" /></a></div>
<p>
The fitted value of $\gamma = 3.22$ is also equivalent to the estimated value of $X(1)$. Similarly, it can be regarded as the slope of the linear-rising throughput on the left of the plot. Interestingly, in these data, $\alpha = 0$, while $\beta$ is non-zero. That suggests there is no significant contention in the workload, but there is some data-exchange coherency at play.
<p>
One word of caution: fitting the 3-parameter USL can be more sensitive to the actual data, especially with a large number of production data scatter points. I'll go into all this, and more, in the upcoming <a href="http://www.perfdynamics.com/Classes/schedule.html">Guerrilla training classes</a>.
<p>
<h2>The Geometry of Latency</h2>
<i>Posted 2018-04-22</i>
<p>
... AKA <a href="https://en.wikipedia.org/wiki/Hyperbola">hyperbolae</a>.
<p>
Here's a mnemonic tabulation based on <i>dishes</i> and <i>bowls</i>:
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfJAmJQbiGHasvw6viX8Z8DkyUiaT2eL7AmoTO8Y7zlhPsMz0A1IZ2Iwt3HX7OC8FUMGKv5gGU367WApATX1tt_QK0WcYR_feZsJbjnBIxgv0OEZLX12FGjlpLmA7JdnU44KmNHwBDB6A/s1600/Screen+Shot+2018-04-22+at+Sun%252C+Apr+22%252C+2018+-+52.25+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfJAmJQbiGHasvw6viX8Z8DkyUiaT2eL7AmoTO8Y7zlhPsMz0A1IZ2Iwt3HX7OC8FUMGKv5gGU367WApATX1tt_QK0WcYR_feZsJbjnBIxgv0OEZLX12FGjlpLmA7JdnU44KmNHwBDB6A/s400/Screen+Shot+2018-04-22+at+Sun%252C+Apr+22%252C+2018+-+52.25+PM.png" width="400" height="201" data-original-width="778" data-original-height="390" /></a></div>
<p>
Hopefully this makes amends for the more complicated explanation I wrote for CMG back in 2009 entitled: <a href="https://www.cmg.org/publications/measureit/2009-2/mit62/measureit-issue-7-08-mind-your-knees-and-queues/">"Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ"</a>, which I'm pretty sure almost nobody understood.
Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com1tag:blogger.com,1999:blog-6977755959349847093.post-14633564217992791602018-04-21T08:12:00.000-07:002018-07-25T18:20:42.344-07:00Virtual cloudXchange 2018 ConferenceOur abstract has been accepted for presentation at the FREE <a href="https://www.cmg.org/2018/07/exposing-the-cost-of-performance-hidden-in-the-cloud/" target=" _blank">cloudXchange online event</a> to be held by CMG on <b>June 19th at 10am Pacific</b> (5pm UTC).
[<a href="https://speakerdeck.com/drqz/exposing-the-cost-of-performance-hidden-in-the-cloud">Extended slides</a>]
<p>
<blockquote>
<center>
<h2>Exposing the Cost of Performance<br>Hidden in the Cloud</h2>
<br>
Neil Gunther<br>
<i>Performance Dynamics, Castro Valley, California</i>
<br><br>
Mohit Chawla<br>
<i>Independent Systems Engineer, Hamburg, Germany</i>
<p>
10am Pacific Time on June 19, 2018
</center>
<p>
Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also reintroduces an old mainframe concept—chargeback—which rejuvenates the need for performance analysis and capacity planning. Combining production JMX data with an appropriate performance model, we show how to assess fee-based EC2 configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of various EC2 Auto Scaling policies.
</blockquote>
<p>Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-18361813689782852322018-03-14T09:11:00.000-07:002018-06-26T07:55:49.208-07:00WTF is Modeling, Anyway? A conversation with performance and capacity management veteran Boris Zibitsker, on his <a href="https://www.youtube.com/watch?v=HBYGR7ou_6o&t=121s" target="_blank"><i>BEZnext</i> channel</a>, about how to save multiple millions of dollars with a <b>one-line</b> performance model (at 21:50 minutes into the video) that has less than 5% error. I wish my <a href="http://perfdynamics.blogspot.com/2013/04/adding-percentiles-to-pdq.html" target="_blank" >PDQ models</a> were that good. :/
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://www.youtube.com/watch?v=HBYGR7ou_6o&t=1s" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm1j0q9Zg8g3BZOihLUhy20Bk3rWNQ4P0vtf0EwUTM2O2ZYNgCUyv_MW2sopCgeWxOFM1aa8ffvThuOcpQOE-hbgytXVGWttu6H63KyLtopPHjs4dtn9mPJj9qxCzBCZkHCHkTBs-lbZ4/s400/BEZnext.png" width="400" height="248" data-original-width="1438" data-original-height="891" /></a></div>
<p>
The strength of the model turns out to be its <b>explanatory</b> power, rather than prediction, per se. However, with the correct explanation of the performance problem in hand (which also proved that all other <b>guesses</b> were wrong), this model correctly predicted a threefold reduction in application response time for essentially no cost. Modeling doesn't get much better than this.
<p>
<h3>Footnotes</h3>
<ol>
<li> According to <a href="https://books.google.com/books?id=FxMQJqS3etcC&pg=PA1&lpg=PA1&dq=pricing+IBM+SP2&source=bl&ots=UYdWt0-vxt&sig=zZU1a-H02nGx_IA7LpAfqyXk9uM&hl=en&sa=X&ved=0ahUKEwim38ipvd3ZAhXri1QKHdeyDsw4ChDoAQgwMAM#v=onepage&q=pricing%20IBM%20SP2&f=false" target="_blank">Computer World in 1999</a>, a 32-node IBM SP2 cost $2 million to lease over 3 years. This SP2 cluster was about 6 times bigger.
<li> Because of my vain attempt to suppress details (in the interests of video length), Boris gets confused about the kind of files that are causing the performance problem (near 26:30 minutes). They're not regular data files and they're not executable files. The executable is already running but sometimes waits—for a long time. The question is, waits for what? They are, in fact, special font files that are requested by the X-windows application (the <i>client</i>, in X parlance). These <b>remote</b> files may also get cached, so it's complicated. In my <a href="http://www.perfdynamics.com/Classes/schedule.html">GCAP class</a>, I have more time to go into this level of detail. Despite all these potential complications, my 'log model' accurately predicts the mean application launch time.
<li> $\log_2$ assumes a binary-tree organization of the font files, whereas $\log_{10}$ assumes a denary (base-10) tree.
<li> Question for the astute viewer. Since these geophysics applications were all developed in-house, how come the developers never saw the performance problems before they ever got into production? Here's <a href="https://twitter.com/DrQz/status/979117391807311872">a hint</a>.
<li> Some people have asked why there's no video of me. This was the first time Boris had recorded video of a Skype session and he pushed the wrong button (or something). It's probably better this way. :P
</ol>Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com1tag:blogger.com,1999:blog-6977755959349847093.post-9429152204746973442018-02-21T00:08:00.000-08:002018-06-26T07:56:10.623-07:00CPU Idle Is Not Like White SpaceThis post seems like it ought to be too trite to write but, I see the following performance gotcha cropping up over and over again.
<p>
Under pressure to consolidate resources, usually driven by management and especially regarding processor capacity, there is often an urge to "use up" any idle processor cycles. Idle processor capacity tends to be viewed like it's <a href="https://www.thefreedictionary.com/Whitespace">whitespace</a> on a written page—just begging to be filled up.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbP6gTC47K2jU_3xj2YAqZcmvIqtf7hyxmyHB5UauUBwGv9L4-x5G9QeTf_qmuci_LaI_0MwTIIvjpPnAUzDfXbnNgieXKTIXQJn8uwtyMSB1KswtGL2IUITNpe1dcLvqZGqnQBXrDU7s/s1600/whitespace1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbP6gTC47K2jU_3xj2YAqZcmvIqtf7hyxmyHB5UauUBwGv9L4-x5G9QeTf_qmuci_LaI_0MwTIIvjpPnAUzDfXbnNgieXKTIXQJn8uwtyMSB1KswtGL2IUITNpe1dcLvqZGqnQBXrDU7s/s400/whitespace1.png" width="309" height="400" data-original-width="1237" data-original-height="1600" /></a></div>
<p>
The logical equivalent of filling up the "whitespace" is absorbing idle processor capacity by migrating applications that are currently running on other servers and turning those excess servers off or using them for something else.
<a name='more'></a>
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhG-vLjWPeKsLivZeIjsZgPV2r5QIFtjRJU7MtWR_7KN7wfMw2n0TvyWuk-qpRUeojASmTuajlWh454do7rxeMumCOv6jcBp02FVWZ_9k113sEeJuOGn6ru_D7EgLdgxxA8nMhjlg_1s3I/s1600/whitespace2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhG-vLjWPeKsLivZeIjsZgPV2r5QIFtjRJU7MtWR_7KN7wfMw2n0TvyWuk-qpRUeojASmTuajlWh454do7rxeMumCOv6jcBp02FVWZ_9k113sEeJuOGn6ru_D7EgLdgxxA8nMhjlg_1s3I/s400/whitespace2.png" width="309" height="400" data-original-width="1237" data-original-height="1600" /></a></div>
<p>
The blind spot, however, is that idle processor capacity is not like whitespace, and the rush to absorb it is likely to have unintended consequences. The reason is that performance metrics are neither isolated nor independent of one another. Most metrics are:
<ol>
<li> related to one another due to <b>interdependencies</b> between various computing subsystems
<li> related in a <b>nonlinear</b> way: a small change in one metric can cause a large change in another
</ol>
The first couple of days of my <a href="http://www.perfdynamics.com/Classes/schedule.html">Guerrilla Capacity Planning</a> course lays out a consistent framework that demonstrates how all the familiar performance metrics are related.
<p>
For example, application <a href="https://www.cmg.org/publications/measureit/2009-2/mit62/measureit-issue-7-08-mind-your-knees-and-queues/">response time depends nonlinearly on processor utilization</a>. In the above case, it may have been forgotten that the processor utilization must be kept <i>low</i> in order for the application to meet its response time SLA. A lot of idle processor cycles can <i>appear</i> to be unused processor capacity only because there is no obvious <i>warning sign</i> that low CPU is a necessary condition for correct application performance.
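The nonlinearity is easy to see even in the simplest case. The following sketch assumes an M/M/1 queue (a toy model, not the analysis in the linked article), where the mean residence time is $R = S/(1-\rho)$:

```python
def mm1_residence_time(S, rho):
    # M/M/1 mean residence time R = S / (1 - rho): R grows
    # nonlinearly as the utilization rho approaches 100% busy.
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    return S / (1 - rho)

S = 0.5  # assumed mean service time, seconds
R50 = mm1_residence_time(S, 0.50)   # ~1 s
R90 = mm1_residence_time(S, 0.90)   # ~5 s
R95 = mm1_residence_time(S, 0.95)   # ~10 s
```

Note the asymmetry: moving from 50% to 90% busy quintuples the residence time, and the next five percentage points double it again. That is why "using up" idle cycles can quietly blow an SLA.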
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX0H7T0I2koU3TlVKK0oOJe8uBUtGvPvOcUr3KmSwax9k9iDQ2tP2eIKWqVnyb2xWb65eKLy41P_jx2XHRLldL02iFdRupWDu1qS4oYRimcj8ttdG155NOcbhFOh96COO5T1AmD35i1pc/s1600/whitespace3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX0H7T0I2koU3TlVKK0oOJe8uBUtGvPvOcUr3KmSwax9k9iDQ2tP2eIKWqVnyb2xWb65eKLy41P_jx2XHRLldL02iFdRupWDu1qS4oYRimcj8ttdG155NOcbhFOh96COO5T1AmD35i1pc/s400/whitespace3.png" width="309" height="400" data-original-width="1237" data-original-height="1600" /></a></div>
<p>
Even on a printed page, whitespace is not usually an invitation to start scribbling on it. Similarly, a notification equivalent to "<a href="http://www.this-page-intentionally-left-blank.org">This page intentionally left blank</a>" would be a useful reminder in the context of potential application migration.
<p>
Of course, any page that says "This page left blank" isn't blank, but that's a topic for a different discussion. :)Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-56253409874596394082017-08-04T11:15:00.000-07:002017-09-04T08:25:00.500-07:00Guerrilla Training September 2017This year offers a new 3-day class on <a href="http://www.perfdynamics.com/Tools/PDQ.html">applying PDQ</a> in your workplace. The classic Guerrilla classes, <a href="http://www.perfdynamics.com/Classes/Outlines/guerilla.html">GCAP</a> and <a href="http://www.perfdynamics.com/Classes/Outlines/gdata.html">GDAT</a>, are also being presented.
<p>
Who should attend?
<ul>
<li> architects
<li> application developers
<li> performance engineers
<li> sys admins
<li> test engineers
<li> mainframe sysops
<li> database admins
<li> webops
<li> anyone interested in getting beyond ad hoc performance analysis and capacity planning skills
</ul>
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie3t7VCk2lDozykOCQVuL1fftRg2ySZb-UV2jtU6HX7dlHmduxU3ddL85j25TJdtn-qF9ndchGvwN8oQR0-uwn9rt607yrMSpGhJklOK0AsY24sae5rGUaiRCu985ITXz5w1mcdEatVwo/s1600/ghat-cloud.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie3t7VCk2lDozykOCQVuL1fftRg2ySZb-UV2jtU6HX7dlHmduxU3ddL85j25TJdtn-qF9ndchGvwN8oQR0-uwn9rt607yrMSpGhJklOK0AsY24sae5rGUaiRCu985ITXz5w1mcdEatVwo/s400/ghat-cloud.jpg" /></a></div>
<p>
Some course highlights:
<ul>
<li> There are only 3 performance <b>metrics</b> you need to know
<li> How to <b>quantify scalability</b> with the <a href="http://www.perfdynamics.com/Manifesto/USLscalability.html">Universal Scalability Law</a>
<li> <a href="http://queue.acm.org/detail.cfm?id=2789974">Hadoop performance</a> and capacity management
<li> <b>Virtualization</b> Spectrum from hyper-threads to cloud services
<li> How to detect <b>bad data</b>
<li> Statistical <b>forecasting</b> techniques
<li> <b>Machine learning</b> algorithms applied to performance data
</ul>
<p>
<a href="http://www.perfdynamics.com/Classes/schedule.html">Register online</a>. Early-bird discounts run through the end of <b>August</b>.
<p>
As usual, <a href="http://www.fourpointspleasanton.com">Sheraton Four Points</a> has bedrooms available at the Performance Dynamics discounted rate. Link is on the registration page.
<p>
Also see what <a href="http://www.perfdynamics.com/Classes/comments.html">graduates are saying</a> about these classes.
<p>
Tell a colleague and see you in <b>September</b>!Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com1tag:blogger.com,1999:blog-6977755959349847093.post-31175084192355008392017-03-15T10:53:00.002-07:002020-08-19T10:30:51.583-07:00Morphing M/M/m: A New View of an Old QueueThe following abstract has been accepted for presentation at the 21st Conference of the International Federation of Operational Research Societies — <a href="http://ifors2017.ca" target="_blank">IFORS 2017, Quebec City, Canada</a>.
<p>
<ul>
<li> <b>Update</b> July 31, 2017: Here are my
<a href="https://speakerdeck.com/drqz/m-a-new-view-of-an-old-queue" target="_blank">IFORS slides</a>
<li> <b>Update</b> June 08, 2018: In response to an audience question at my IFORS 2017 session, I have now demonstrated that there is an upper bound for the error in the morphing approximation. See footnotes below.
<li> <b>Update</b> August 18, 2020: This <a href="https://arxiv.org/abs/2008.06823" target="_blank">new paper</a> has the complete exact solution method that I had been seeking all along.
</ul>
<p>
<blockquote>
This year is the centenary of A. K. Erlang's paper [1] on the determination of waiting times in an M/D/m queue with $m$ telephone lines.<sup>*</sup> Today, M/M/m queues are used to model systems such as call centers [3], multicore computers [4,5] and the Internet [6,7]. Unfortunately, those who should be using M/M/m models often do not have sufficient background in applied probability theory. Our remedy defines a <i>morphing approximation</i><sup>†</sup> to the exact M/M/m queue [3] that is accurate to within 10% for typical applications<sup>‡</sup>. The morphing formula for the residence time, $R(m,\rho)$, is both simpler and more intuitive than the exact solution involving the Erlang-C function. We have also developed an <a href="http://perfdynamics.blogspot.com/2016/07/erlang-redux-resolved-this-time-for-real.html" target="_blank" >animation of this morphing process</a>. An outstanding challenge, however, has been to elucidate the nature of the corrections that transform the approximate morphing solutions into the exact Erlang solutions. In this presentation, we show:
<ul>
<li> The morphing solutions correspond to the $m$-roots of unity in the complex $z$-plane.
<li> The exact solutions can be expressed as a rational function, $R(m,z)$.
<li> The poles of $R(m,z)$ lie inside the unit disk, $|z| < 1$, and converge around the Szegő curve [8] as $m$ is increased.
<li> The correction factor for the morphing model is defined by the deflated polynomial belonging to $R(m,z)$.
<li> The pattern of poles in the $z$-plane provides a convenient visualization of how the morphing solutions differ from the exact solutions.
</ul>
</blockquote>
<p style="font-size:85%;">
* Originally, Erlang assumed the call <i>holding time</i>, or mean service time $S$, was deterministic with unit period, $S=1$ [1,2]. The generalization to exponentially distributed service periods came later. Ironically, the exponential case is easier to solve than the apparently simpler deterministic case. That's why the M/D/1 queue is never the first example discussed in queueing theory textbooks.<br>
† The derivation of the <i>morphing model</i> is presented in Section 2.6.6 of the 2005 edition of [4], although the word "morphing" is not used there. The reason is, I didn't know how to produce the exact result from it, and emphasizing it would likely have drawn unwarranted attention from Springer-Verlag editors.
By the time I was writing the 2011 edition of [4], I was certain the approximate formula did reflect the morphing concept in its own right, even though I still didn't know how to connect it to the exact result. Hence, the verb "morphs" timidly makes its first and only appearance in the boxed text following equation 4.61.<br>
‡ The relative error peaks at 10% for $m \sim 20$ and $\rho \sim 90\%$, then peaks again at 20% for $m \sim 2000$ and the servers running 99.9% busy. However, the rate of increase in peak error attenuates such that the maximum error is less than 25%, even as $m \rightarrow \infty$ and $\rho \rightarrow 100\%$. A <a href="https://twitter.com/DrQz/status/1005112190733508608" target="_blank">plot of the corresponding curves</a> gives a clearer picture. This behavior is not at all obvious. Prior to this result, it could have been that the relative error climbed to 100% with increasing $m$ and $\rho$.
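To make the error bound concrete, here is a small Python sketch (not part of the original presentation) comparing the exact Erlang-C residence time against the morphing approximation $R(m,\rho) \approx S/(1-\rho^m)$:

```python
def erlang_c(m, a):
    # Probability that an arrival must wait in M/M/m, computed via
    # the numerically stable Erlang-B recurrence; a = m * rho is the
    # offered load in Erlangs.
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    return m * b / (m - a * (1 - b))

def r_exact(m, rho, S=1.0):
    # Exact M/M/m mean residence time using the Erlang-C function.
    return S + erlang_c(m, m * rho) * S / (m * (1 - rho))

def r_morph(m, rho, S=1.0):
    # Morphing approximation: an M/M/1-like form with rho**m.
    return S / (1 - rho ** m)

def rel_err(m, rho):
    return (r_exact(m, rho) - r_morph(m, rho)) / r_exact(m, rho)
```

Evaluating `rel_err` shows the approximation is exact for $m \le 2$ and stays within roughly 10% at $\rho = 0.9$ even for $m \sim 20$, consistent with footnote ‡ above.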
<p>
<h3>References</h3>
<ol>
<li> A. K. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Elektroteknikeren, v. 13, p. 5, 1917. <br>
<li> A. K. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, vol 20, 1909. <br>
<li> E. Chromy, T. Misuth, M. Kavacky, "Erlang C Formula and Its Use In The Call Centers," Advances in Electrical and Electronic Engineering, Vol. 9, No. 1, March 2011.
<li> N. J. Gunther, <a href="http://www.perfdynamics.com/iBook/ppa_new.html" target="_blank" ><i>Analyzing Computer System Performance with Perl::PDQ</i></a>, Springer-Verlag, 2005 and 2011. <br>
<li> N. J. Gunther, S. Subramanyam, and S. Parvu, "A Methodology for Optimizing Multithreaded System Performance on Multicore Platforms," in <a href="https://www.amazon.com/dp/0470936908/" target="_blank" ><i>Programming Multicore and Many-core Computing Systems</i></a>, eds. S. Pllana and F. Xhafa,
Wiley Series on Parallel and Distributed Computing, February 2017.
<li> N. J. Gunther, "Numerical Investigations of Physical Power-law Models of Internet Traffic Using the Renormalization Group," IFORS 2005, Honolulu, Hawaii, July 11—15. <br>
<li> T. Bonald, J. W. Roberts, "Internet and the Erlang formula," ACM SIGCOMM Computer Communication Review, Volume 42, Number 1, January 2012.
<li> C. Diaz Mendoza and R. Orive, "The Szegő curve and Laguerre polynomials with large negative parameters," Journal of Mathematical Analysis and Applications, Volume 379, Issue 1, Pages 305—315, 1 July 2011.
</ol>Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com9tag:blogger.com,1999:blog-6977755959349847093.post-41644548831594282102017-01-17T19:26:00.000-08:002017-11-11T08:01:34.267-08:00GitHub Growth Appears Scale Free<b>Update of Thursday, August 17, 2017:</b>
It looks like we can chalk up another one for the scale-free model (described below) as GitHub apparently surpasses 20 million users. Outgoing CEO Wanstrath mentioned this number in an emailed statement to
<a href="http://www.businessinsider.com/github-ceo-chris-wanstrath-to-step-down-become-executive-chairman-2017-8">Business Insider</a>.
<blockquote>
<i>"As GitHub approaches 700 employees, with more than $200M in ARR, accelerating growth, and more than 20 million registered users, I'm confident that this is the moment to find a new CEO to lead us into the next stage of growth. ....."</i>
</blockquote>
<h3>The Original Analysis</h3>
In 2013, a <a href="http://redmonk.com/dberkholz/2013/01/21/github-will-hit-5-million-users-within-a-year/" target="_blank"> Redmonk blogger claimed</a> that the growth of GitHub (GH) users follows a certain type of diffusion model
called <a href="https://en.wikipedia.org/wiki/Bass_diffusion_model">Bass diffusion</a>. Here, growth refers to the number of unique user IDs as a function of time, not the number project repositories, which can have a high degree of multiplicity.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmaliUhCt38gxfUpcqNxUfcYc_i4wycsI2qVAi2Tuj5sPLskTgOhY9lU60PSdCUx5fBKRmnObyZmrIc0REi2zfp1QiEj5m-nQimYy4-VtOisyPrd8it6-QyHftj8QhsJ0b8EYDIpzefTE/s1600/tweet%252Bpic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmaliUhCt38gxfUpcqNxUfcYc_i4wycsI2qVAi2Tuj5sPLskTgOhY9lU60PSdCUx5fBKRmnObyZmrIc0REi2zfp1QiEj5m-nQimYy4-VtOisyPrd8it6-QyHftj8QhsJ0b8EYDIpzefTE/s320/tweet%252Bpic.png" width="284" height="320" /></a></div>
<p>
In a response, I tweeted a plot that suggested GH growth might be following a power law, aka <i>scale free</i> growth. The tell-tale sign is the asymptotic linearity of the growth data on <b>double-log</b> axes, which the original blog post did not discuss. The periods on the x-axis correspond to years, with the first period representing calendar year 2008 and the fifth period being the year 2012. <a name='more'></a>
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFtvZ4oYA2DB6Q4h4aLN3g36QKQkWI2I8WFkUv44fLtb2buiTj8m-q06WoIjxTh1w6uc_2-UfEU9Az67ZgfxSSi_608zhLg7XzdMx79jlLeZo9ECS9LfMla083Yt3SzS7jQ2WBwbE1aUI/s1600/406378aa.2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFtvZ4oYA2DB6Q4h4aLN3g36QKQkWI2I8WFkUv44fLtb2buiTj8m-q06WoIjxTh1w6uc_2-UfEU9Az67ZgfxSSi_608zhLg7XzdMx79jlLeZo9ECS9LfMla083Yt3SzS7jQ2WBwbE1aUI/s320/406378aa.2.jpg" width="320" height="175" /></a></div>
<p>
<a href="https://en.wikipedia.org/wiki/Scale-free_network">Scale free networks</a> can arise from <i>preferential attachment</i> to super-nodes that have a higher vertex degree and therefore more connections to other nodes, i.e., a kind of rich-get-richer effect. Similarly for GH growth viewed as a particular kind of social network.
The interaction between software developers using GH can be thought of as involving super-nodes that correspond to influential users attracting prospective GH users to open a new account and contribute to their project.
<p>
On this basis, I predicted GH would reach 4 million users during October 2013 and 5 million users during April 2014 (yellow points in the <i>Linear axes</i> plot below). In fact, GH reached those values slightly earlier than predicted by the power law model, and slightly later than the dates predicted by the diffusion model (modulo unreported errors in the data).
<p>
Since 2013, new data have been reported, so I extended my previous analysis. Details of the respective models are contained in the R script at the end of this post.
In the <i>Linear axes</i> plot below, the diffusion model and the power-law model essentially form an envelope around the newer data: diffusive on the upper side (red curve) and power law on the lower side (blue curve). In this sense, it could be argued that the jury is still out on which model offers the more reliable predictions.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsUzZyyGqAze3HdwqHkzbugYGmZNeNeq_TfmerwfHUgfBaWhvWRIQ8agC6V-64KY9042CjjzUGHv-6cn1nnSSa35zrG_0j4GGbQpgKIT9uT1CuapGT635bZb27O73vz4Q9KdvF4y0vvTM/s1600/rplot-linear.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsUzZyyGqAze3HdwqHkzbugYGmZNeNeq_TfmerwfHUgfBaWhvWRIQ8agC6V-64KY9042CjjzUGHv-6cn1nnSSa35zrG_0j4GGbQpgKIT9uT1CuapGT635bZb27O73vz4Q9KdvF4y0vvTM/s400/rplot-linear.png" width="400" height="341" /></a></div>
<p>
However, there is an aspect of the diffusion model that was overlooked in 2013. It predicts that GH growth will eventually <b>plateau</b> at 20 million users in 2020 (the 12th period, not shown) because it is a type of
<a href="https://en.wikipedia.org/wiki/Logistic_function">logistic function</a> that has a characteristic <a href="https://en.wikipedia.org/wiki/Sigmoid_function" target="_blank">sigmoidal</a> or 'S' shape. The beginnings of this leveling off (the top of the 'S') is apparent in the 10th period (i.e., 2017).
By contrast, the power law model predicts that GH will reach 23.65 million users by the end of 2017 (yellow point). Whereas the two curves envelope the more recent data in periods 6–9, they start to diverge significantly in the 10th period.
<blockquote>
<em>"GitHub is not the only player in the market. Other companies like GitLab are doing a good job but GitHub has a huge head start and the advantage of the network effect around public repositories. Although GitHub’s network effect is weaker compared to the likes of Facebook/Twitter or Lyft/Uber, they are the default choice right now."</em> —<a href="https://medium.com/@moritzplassnig/github-is-doing-much-better-than-bloomberg-thinks-here-is-why-a4580b249044#.5wyb1l2qh" target="_blank">GitHub is Doing Much Better Than Bloomberg Thinks</a>
</blockquote>
Although there will inevitably be an equilibrium bound on the number of active GH users, it seems unlikely to be as small as 20 million, given the combination of GH's first-mover advantage and its current popularity. Presumably the private investors in GH also hope it will be a large number. This year will tell.
<p>
<pre class="source-code"><code>
# Data source ... https://classic.scraperwiki.com/scrapers/github_users_each_year/
# NB: the data frame df.gh3 and the fitted models gh.exp (exponential)
# and gh.fit (power law) are created earlier in the full script.

# LINEAR axes plot
plot(df.gh3$index, df.gh3$users, xlab="Period (years)",
ylab="Users (million)", col="gray",
ylim=c(0, 3e7), xaxt="n", yaxt="n")
axis(side=1, tck=1, at=c(0, seq(12,120,12)), labels=0:10,
col.ticks="lightgray", lty="dotted")
axis(side=2, tck=1, at=c(0, 10e6, 20e6, 30e6), labels=c(0,10,20,30),
col.ticks="lightgray", lty="dotted")
# Simple exp model
curve(coef(gh.exp)[2] * exp(coef(gh.exp)[1] * (x/13)),
from=1, to=108, add=TRUE, col="red2", lty="dotdash")
# Super-exp model
curve(49100 * (x/13) * exp(0.54 * (x/13)),
from=1, to=120, add=TRUE, col="red", lty="dashed")
# Bass diffusion model
curve(21e6 * ( 1 - exp(-(0.003 + 0.83) * (x/13)) ) / ( 1 + (0.83 / 0.003) * exp(-(0.003 + 0.83) * (x/13)) ),
from=1, to=120, add=TRUE, col="red")
# Power law model
curve(10^coef(gh.fit)[2] * (x/13)^coef(gh.fit)[1], from=1, to=120, add=TRUE,
col="blue")
title(main="Linear axes: GitHub Growth 2008-2017")
legend("topleft",
legend=c("Original data", "New data", "Predictions", "Exponential", "Super exp", "Bass diffusion", "Scale free"),
lty=c(NA,NA,NA,4,2,1,1), pch=c(1,19,21,NA,NA,NA,NA),
col=c("gray", "black", "yellow", "red", "red", "red", "blue"),
pt.bg = c(NA,NA,"yellow",NA,NA,NA,NA),
cex=0.75, inset=0.05)
</code></pre>Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com1tag:blogger.com,1999:blog-6977755959349847093.post-7623978816404452482016-10-08T14:55:00.010-07:002021-09-22T14:26:18.816-07:00Crib Sheet for Emulating Web TrafficOur paper entitled, <a href="https://arxiv.org/abs/1607.05356" target=" _blank"><em>How to Emulate Web Traffic Using Standard Load Testing Tools</em></a> (PDF) is now available online and will be presented at the upcoming <a href="https://www.cmg.org/event/performance-capacity-2016-conference/" target=" _blank">CMG conference in November</a>.
<p>
<blockquote>
<b>Presenter</b>: James Brady (co-author: Neil Gunther)<br>
<b>Session Number</b>: 436<br>
<b>Subject Area</b>: APM<br>
<b>Session Date</b>: Wed, November 9, 2016<br>
<b>Session Time</b>: 1:00 PM - 2:00 PM<br>
<b>Session Room</b>: PortofinoB
</blockquote>
<a name='more'></a>
<p>
The motivation for this work harks back to a <a href="https://groups.google.com/forum/#!msg/guerrilla-capacity-planning/muhF8eqoFVY/odqgXOFZvZUJ" target=" _blank">Guerrilla forum in 2014</a> that essentially centered on the same topic as the title of our paper. It was clear from that discussion that commenters were talking at cross purposes because of misunderstandings on many levels. I had already written a <a href="http://perfdynamics.blogspot.com/2010/05/emulating-internet-traffic-in-load.html">much earlier blog post</a> on the key queue-theoretic concept, viz., holding the $N/Z$ ratio constant as the load $N$ is increased, but I was incapable of describing how that concept should be implemented in a real load-testing environment.
<p>
On the other hand, I knew that Jim Brady had presented a similar implementation in his 2012 CMG paper, based on a statistical analysis of the load-generation traffic. There were a few details that I couldn't quite reconcile in Jim's paper but, at the CMG 2015 conference in San Antonio, I suggested that we should combine our separate approaches and aim at a definitive work on the subject. After nine months' gestation (ugh!), this 30-page paper is the result.
<p>
Although our paper doesn't contain any new invention, per se, the novelty lies in how we needed to bring together so many disparate and subtle concepts in precisely the correct way to reach a complete and consistent methodology. The complexity of this task was far greater than either of us had imagined at the outset. The hyperlinked Glossary should help with the terminology, but because there are so many interrelated parts, I've put together the following crib notes in an effort to help performance engineers get through it (since they're the ones that most stand to benefit). The <a href="https://arxiv.org/abs/1607.05356" target=" _blank">key results of our paper</a> are indicated by <font color="red">◀</font>
<ol>
<li> Standard load-test tools only allow a <b>finite</b> number of <i>virtual</i> users
<li> Web traffic is characterized by an <b>unlimited</b> number of <i>real</i> users
<li> Attention usually focused on <b>SUT</b> (system under test) performance
<li> We focus on the <b>GEN</b> (load generator) characteristics
<li> Measure <b>distribution</b> of web requests and their mean arrival rate
<li> Web traffic should be a <b>Poisson process</b> (cf. <a href="https://web.archive.org/web/20110719123704/http://oldwww.com.dtu.dk/teletraffic/erlangbook/pps131-137.pdf" target=" _blank">A.K. Erlang 1909</a>) <font color="red">◀</font>
<li> Requires statistically <b>independent</b> arrivals, i.e., no correlations
<li> Independent requests should arrive <b>asynchronously</b> into the SUT
<li> But virtual-user requests are <b>synchronized</b> (correlated) in SUT queues <font color="red">◀</font>
<li> De-correlate arrivals by shrinking SUT <b>queues</b> (<b>Principle A</b>): <font color="red">◀</font>
<ul>
<li> Shrink them by increasing the think delay $Z$ in fixed ratio to the user load $N$
<li> Traffic rate $\lambda$ approaches a <b>constant</b> $\lambda = N/Z$ as SUT queues shrink
</ul>
<li> <b>Check</b> requests are Poisson by measuring the <i>coefficient of variation</i> ($CoV$)
<li> Require $CoV \approx 1$ for a Poisson process (<b>Principle B</b>) <font color="red">◀</font>
</ol>
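As a sanity check of Principles A and B, one can simulate the merged arrival stream from many think-delayed users and verify both the rate and the $CoV$. This is a toy sketch (idealized, with the SUT queues already shrunk away, so service time is taken as negligible), not the paper's load-test rig:

```python
import numpy as np

rng = np.random.default_rng(7)

N, Z = 50, 10.0   # virtual users and mean think time (seconds)
cycles = 4000     # requests issued per user

# Each user submits a request, then "thinks" for an exponentially
# distributed time with mean Z before the next request (Principle A:
# no queueing delay left in the SUT to synchronize the users).
arrivals = np.sort(
    np.cumsum(rng.exponential(Z, size=(N, cycles)), axis=1).ravel())

T = Z * cycles * 0.9                    # horizon inside every user's run
iat = np.diff(arrivals[arrivals < T])   # merged inter-arrival times

rate = 1.0 / iat.mean()        # approaches the constant lambda = N/Z
cov = iat.std() / iat.mean()   # ~1 for a Poisson process (Principle B)
```

With these assumed values, the merged stream settles at $\lambda = N/Z = 5$ requests per second with $CoV \approx 1$, i.e., statistically indistinguishable from Poisson web traffic.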
Originally, I assumed the paper would be no more than a third of its current length. Wrong! My only defense is: it's all there, you just need to read it. (tl;dr doesn't apply.) Apologies in advance but, hopefully, these crib notes will help you.
Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com4tag:blogger.com,1999:blog-6977755959349847093.post-48012112491704923562016-10-01T17:11:00.000-07:002018-12-22T11:36:15.742-08:00A Clue for Remembering Little's LawsDuring the <a href="http://www.perfdynamics.com/Classes/Outlines/gdata.html">Guerrilla Data Analysis class</a> last week, alumnus Jeff P. came up with this novel mnemonic device for remembering all <a href="http://perfdynamics.blogspot.com/2014/07/a-little-triplet.html">three forms of Little's law</a> that relate various queueing metrics to the mean arrival rate $\lambda$.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQtv80HJqokaAPA86X1AEoFFDBuUeiCfgqcqH0CNeCYYB3mmp6P5sX36mUn0PKxqoJAX17ur4RFu6_t8_e63pmKDErGl6zxaIpBpAQ8VHQxYruj-LMcsKq-5Bm-eNkZh6_mL_7uvUmpZ0/s1600/little-qlu-annotated.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQtv80HJqokaAPA86X1AEoFFDBuUeiCfgqcqH0CNeCYYB3mmp6P5sX36mUn0PKxqoJAX17ur4RFu6_t8_e63pmKDErGl6zxaIpBpAQ8VHQxYruj-LMcsKq-5Bm-eNkZh6_mL_7uvUmpZ0/s320/little-qlu-annotated.png" width="320" height="166" /></a></div>
<p>
The right-hand side of each equation representing a version of Little's law is written vertically in the order $R, W, S$, which matches the expression $R=W+S$ for the mean residence time, viz., the sum of the mean waiting time ($W$) and the mean service time ($S$).
<p>
The letters on the left-hand side: $Q, L, U$ (reading vertically) respectively correspond to the queueing metrics: queue-length, waiting-line length, and utilization, which can be read as the word <i>clue</i>.
<p>
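As a quick numerical illustration of the QLU mnemonic (the rate and time values below are made up for the example):

```python
# Illustrative (made-up) values for the arrival rate and time components.
lam = 2.0          # mean arrival rate (requests/second)
S   = 0.25         # mean service time (seconds)
W   = 0.75         # mean waiting time (seconds)
R   = W + S        # mean residence time: R = W + S

# The three forms of Little's law, read vertically as Q, L, U:
Q = lam * R        # mean queue length (waiting + in service)
L = lam * W        # mean waiting-line length
U = lam * S        # mean utilization

print(Q, L, U)     # 2.0 1.5 0.5
assert Q == L + U  # consistency: Q = L + U follows from R = W + S
```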
Incidentally, the middle formula is the version that appears in the title of John Little's <a href="http://www.cs.bilkent.edu.tr/~tugrul/CS518/Papers/little.pdf">original paper</a>:
<blockquote>
J. D. C. Little, ``A Proof for the Queuing Formula: L = λ W,'' <br><em>Operations Research</em>, 9 (3): 383&ndash;387 (1961)
</blockquote>Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-71973120375296504922016-08-03T17:54:00.001-07:002018-06-26T07:57:13.064-07:00PDQ as a Performance Periscope<!-- PDQ: Performance with Pedagogy -->
<blockquote>
<i>This is a guest post by performance modeling enthusiast, Mohit Chawla, who brought the following very interesting example to my attention. In contrast to many of the examples in my <a href="http://www.perfdynamics.com/iBook/ppa_new.html" TARGET="_blank">Perl::PDQ book</a>, these performance data come from a production web server, not a test rig. —NJG
</i></blockquote>
<p>
Performance analysts usually build performance models based on their understanding of the software application's behavior. However, this post describes how a PDQ model acted like a periscope and also turned out to be pedagogical by uncovering otherwise hidden details about the application's inner workings and performance characteristics.
<p>
<h3>Some Background</h3> <a name='more'></a>
A systems engineer needed some help analyzing the current performance of a web application server, as well as its capacity consumption. Time series data and plots provided a <i>qualitative</i> impression for this purpose, mostly to sanity-check the data. While observing and interpreting these time-series data was helpful for forming an empirical understanding of the application's traffic patterns and resource utilization, it wasn't sufficient to make an accurate judgement about the expected performance of the web server upon changing the application configuration or system resources, i.e., real capacity planning.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBbaqfSOllDJJL9K07WKHX7NR8z0EHk8Ae2jp9zxn768qsI3K0hxY03hfUcQdsr_MItufN56GVSbfaNYMeQTKmBRKlHLd0ppfa4Z1_eS0e5NqVs6apm6Pb39gavmP6vBgRMDo7nXbx0ko/s1600/tomcat-timeseries-R.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBbaqfSOllDJJL9K07WKHX7NR8z0EHk8Ae2jp9zxn768qsI3K0hxY03hfUcQdsr_MItufN56GVSbfaNYMeQTKmBRKlHLd0ppfa4Z1_eS0e5NqVs6apm6Pb39gavmP6vBgRMDo7nXbx0ko/s400/tomcat-timeseries-R.png" width="400" height="303" /></a></div>
<p>
In fact, what was needed was some kind of "periscope" to see above the "surface waves" of the time-series performance data. In addition, a more <i>quantitative</i> framework would be useful to go beyond the initial qualitative review of the time-series. All of this was needed pretty damn quick! ... <a href="http://www.perfdynamics.com/Tools/PDQcode.html" TARGET="_blank">Enter PDQ</a>.
<p>
<h3>Transforming the Data</h3>
Let's start by examining the general structure of the steady-state data before applying any queueing models to it.
In other words, the original time series data for measured response times, $R$ (see the above Figure) was transformed to show the relationship between $R$ and the corresponding load, $N$, on the web server during the same time buckets.
This transformation is achieved by applying <a href="http://perfdynamics.blogspot.com/2014/07/a-little-triplet.html" TARGET="_blank">Little's law</a>
\begin{equation}
N = X R,
\end{equation}
where $X$ is the measured throughput of the application.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOSuUbJtBwFsv611m7O19qfnEULJbuENVTMAprm7lCCq5lDC-CCpb6DoBx60UXWKYhhq4IEikbHLzvUp8oAz70q4VNWzXL1dfW9RFmhm89wS0W92anihnMvMewzRT8VLwlgrP9YQ95q-s/s1600/hockey-stick.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOSuUbJtBwFsv611m7O19qfnEULJbuENVTMAprm7lCCq5lDC-CCpb6DoBx60UXWKYhhq4IEikbHLzvUp8oAz70q4VNWzXL1dfW9RFmhm89wS0W92anihnMvMewzRT8VLwlgrP9YQ95q-s/s400/hockey-stick.png" width="400" height="303" /></a></div>
<p>
It's now immediately apparent that these transformed data have the general shape of a statistical hockey stick as a function of the number of threads actively processing requests. The hockey stick shows that resource saturation sets in around $N = 250$ concurrent threads, after which the response times start climbing steadily up the hockey-stick handle.
<p>
<h3>The PDQ Model</h3>
What was known about the application is that each request is handled by a thread in a blocking fashion, so the number of threads answering requests is rate-limited by that blocking behavior. This makes it appropriate to construct a closed PDQ model for this application.
<p>
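For readers who haven't seen how a closed model is evaluated, the exact mean-value analysis (MVA) recursion underlying closed queueing solvers like PDQ can be sketched in a few lines. The demand and think-time values here are illustrative, not the measured data:

```python
def mva(demands, Z, N):
    """Exact MVA for a closed, single-class, product-form network.

    demands: service demands D_k (seconds) at each queueing center
    Z: think time (seconds); N: number of active threads/users
    Returns (throughput X, residence time R) at population N.
    """
    Q = [0.0] * len(demands)          # queue lengths at population n - 1
    X, R = 0.0, 0.0
    for n in range(1, N + 1):
        Rk = [D * (1 + q) for D, q in zip(demands, Q)]  # arrival theorem
        R = sum(Rk)
        X = n / (R + Z)               # response-time law
        Q = [X * r for r in Rk]       # Little's law at each center
    return X, R

# Illustrative: one CPU-like center, 10 ms demand, no think time.
X, R = mva(demands=[0.010], Z=0.0, N=50)
print(round(X, 1), round(R, 3))  # saturated: X -> 1/D = 100/s, R ~ N*D
```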
In the first attempt at constructing the PDQ model for these data, the predicted $R$ values were not close to the measured response times.
Subsequently, we added 250 "dummy queueing nodes" to account for missing response times. This technique is described in detail in Chapter 12 <i>"Web Application Analysis with PDQ"</i> of the <a href="http://www.perfdynamics.com/iBook/ppa_new.html" TARGET="_blank">Perl::PDQ book</a>.
The physical interpretation of these additional PDQ nodes was not understood initially.
Nevertheless, the PDQ model now accurately matched the response time data, as shown in this plot:
<!-- response-time graph image -->
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWyYdhks2UTTJ2PDvtLms-lYU_Ts-7vPyddNA369ofkMAu3fmbxesmVnIt0gmTKO8hVHguSjyx3iPtj5dn_MATTw2uZQklK-y7v7pUT21x9j2q3dyndl24z8cAtNGNJstY0wLfBbwuTZ0/s1600/tomcat-R.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWyYdhks2UTTJ2PDvtLms-lYU_Ts-7vPyddNA369ofkMAu3fmbxesmVnIt0gmTKO8hVHguSjyx3iPtj5dn_MATTw2uZQklK-y7v7pUT21x9j2q3dyndl24z8cAtNGNJstY0wLfBbwuTZ0/s400/tomcat-R.png" width="400" height="303" /></a></div>
<p>
Similarly, the mean throughput values predicted by PDQ also matched the measured throughput data:
<p>
<!-- throughput graph image -->
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPnDwAUevI2Y0jQMy9U-phuhkLh7iU9dUwUVRdhXhyx1W1e81QVMm-qJN59ppDSXMdEFTi1mCmdcWSUOTZpUFBEPPsWrvinZcFwNpToUPrP6wEf1r0cldeKd4EYveUqtvrYDHn35tej5U/s1600/tomcat-X.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPnDwAUevI2Y0jQMy9U-phuhkLh7iU9dUwUVRdhXhyx1W1e81QVMm-qJN59ppDSXMdEFTi1mCmdcWSUOTZpUFBEPPsWrvinZcFwNpToUPrP6wEf1r0cldeKd4EYveUqtvrYDHn35tej5U/s400/tomcat-X.png" width="400" height="303" /></a></div>
<p>
This is where the PDQ model also became pedagogic and exposed some previously hidden performance aspects of this web application. But before getting into that, a word about service times.
<p>
The performance data contained estimated service times for the various workloads based, once again, on Little's law:
\begin{equation}
\rho = X \, S,
\end{equation}
or
\begin{equation}
S = \frac{\rho}{X}, \label{eqn:stimes}
\end{equation}
where $\rho$ is the CPU utilization, $X$ is throughput and $S$ the service time.
The slope of the hockey stick handle corresponds to $S_{max}$, the bottleneck service time that limits the maximum possible throughput of the system according to $X_{max} = 1 / S_{max}$. See Section 7.3 of the <a href="http://www.perfdynamics.com/iBook/ppa_new.html" TARGET="_blank">Perl::PDQ book</a>.
<p>
Here are a few selected data values, where $U_{dat}$ refers to the CPU utilization of the server, $S_{est}$ is the estimated service time, $X_{dat}$ is the throughput and $N_{est}$ is the estimated concurrency:
<!-- table with three data points -->
<table align="center" border="0" cellpadding="5" cellspacing="5">
<tr>
<th>Nest</th> <th>Xdat</th> <th>Sest</th> <th>Udat</th>
</tr>
<tr>
<td>133.833750</td> <td>416.460510</td> <td>0.000880</td> <td>0.366420</td>
</tr>
<tr>
<td>137.725453</td> <td>425.790466</td> <td>0.000864</td> <td>0.367760</td>
</tr>
<tr>
<td>138.430266</td> <td>413.350861</td> <td>0.000842</td> <td>0.348060</td>
</tr>
</table>
As you can see, the mean service times derived using equation (\ref{eqn:stimes}) are less than one millisecond each,
whereas the response times are clearly much higher, viz., in the hundreds of milliseconds range. This takes us back to the "hidden latencies" exposed by the PDQ model.
<p>
We added around 250 "dummy queues" each with a service time of about one millisecond per thread. In this way, the sum of the residence times across all 250 dummy queues matched the total response time seen by a user, yet none of the dummy queue service times exceed $S_{max} = 1.25$ milliseconds seen in the slope of the hockey-stick handle. The conjectured explanation was that the dummy queues represented some sort of fast lookup in cache memory, or possibly some kind of polling performed by each thread.
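These numbers are easy to sanity-check. The sketch below reproduces the service-time estimate from the first table row, the throughput ceiling implied by $S_{max}$, and the total residence time contributed by the dummy queues:

```python
# First data row from the table above.
X_dat = 416.460510        # throughput (requests/second)
U_dat = 0.366420          # CPU utilization (fraction)

# Little's law (utilization form): S = rho / X
S_est = U_dat / X_dat
print(round(S_est * 1000, 3))     # ~0.88 ms, matching the Sest column

# Bottleneck ceiling from the hockey-stick slope.
S_max_ms = 1.25                   # milliseconds
X_max = 1000.0 / S_max_ms
print(X_max)                      # 800.0 requests/second

# 250 dummy queues at ~1 ms each account for the "missing" latency.
n_dummy, S_dummy_ms = 250, 1.0
print(n_dummy * S_dummy_ms)       # 250.0 ms of hidden residence time
```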
<p>
<h3>Confirmation</h3>
After discussing this conjecture with application developers, it turned out that each thread blocks until responses from various external services are received, and this processing happens inside a loop where each thread waits in the loop until it receives a response. That's exactly the kind of polling PDQ predicted. And so, we now knew what limited the application's performance and where it could be improved.
<p>
That PDQ periscope turned out to be pretty damn pedagogical!
<p>Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-55280139107957162012016-07-28T16:47:00.003-07:002017-03-16T15:10:45.055-07:00Erlang Redux Resolved! (This time for real)As I show in my <a href="http://www.perfdynamics.com/books.html" target="_blank">Perl::PDQ book</a>,
the residence time at an M/M/1 queue is trivial to derive and (unlike most queueing theory texts) does not require any probability theory arguments.
<a href="http://www.perfdynamics.com/Classes/schedule.html" target="_blank">Great for Guerrillas!</a>
However, by simply adding
another server (i.e., M/M/2), that same Guerrilla approach falls apart. This situation has always bothered me profoundly and on several occasions I thought I saw how to get to the exact formula—the
Erlang C formula—Guerrilla style.
But, on later review, I always found something wrong.
<p>
Although I've certainly had correct pieces of the puzzle, at various times, I could never get everything to fit in a completely consistent way.
No matter how <a href="http://www.perfdynamics.com/Test/erlang.html" target="_blank">creative I got</a>,
I always found a fly in the ointment.
The best I had been able to come up with is what I call the "morphing model" approximation where you start out with $m$ parallel queues at low loads and it morphs into a single $m$-times faster M/M/1 queue at high loads.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9wiSqVacJarGGKXutfzQ0ipk2zCoXg7kdyPvDdHWXG8jj5uqrSgX2GMDa8n6tOpmmjFz9SyeJw1kKidSKqs0HMzFnoQnZtgYfrFH_JUz2mHUOW4smikEC3xFcoTC6No9ysapCrVB-6Mg/s1600/amorph.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9wiSqVacJarGGKXutfzQ0ipk2zCoXg7kdyPvDdHWXG8jj5uqrSgX2GMDa8n6tOpmmjFz9SyeJw1kKidSKqs0HMzFnoQnZtgYfrFH_JUz2mHUOW4smikEC3xFcoTC6No9ysapCrVB-6Mg/s400/amorph.gif" width="400" height="104" /></a></div>
<p>
That model is also exact for $m = 2$ servers—which is some kind of progress, but not much. Consequently, despite a few misplaced enthusiastic announcements in the past, I've never been able to publish the fully corrected morphing model. <a name='more'></a>
<p>
Previous misfires have included:
<ul>
<li> Falsely claimed in this <a href="http://perfdynamics.blogspot.com/2008/01/erlang-explained.html" target="_blank">2008 blog post</a>.
There, I show a Table of how complicated the Erlang B and C functions are when expressed as rational functions of
$\rho$ (the per-server utilization) for $m = 1, 2, \ldots, 6$ service facilities.
<li> As a side note, it's precisely those impenetrable polynomials with unfathomable coefficients that completely put me off the approach I am about to describe here.
<li> In the Comments section of that same post, Boris Solovyov asked in 2013: "Did you ever write this out in full?"
I sheepishly had to report that I hadn't had time to pursue it (which is mostly true). I hope he reads this post.
<li> A more intuitive explanation of the motivation for the morphing model was given in this <a href="http://perfdynamics.blogspot.com/2011/07/the-multiserver-numbers-game.html" target="_blank">2011 blog post</a>, but no advance on the long sought-after correction terms.
<li> Falsely claimed again in this <a href="http://perfdynamics.blogspot.com/2012/01/my-year-in-review-2011.html" target="_blank">2012 blog post</a>. There's a photo of one of my whiteboards, deliberately kept inscrutably small—a good choice, as it turned out.
</ul>
<p>
I think I know how Kepler must've felt. The difference between an M/M/1 queue and an M/M/m queue is like the difference
between a circle and an ellipse. Just by changing the width slightly, the calculation of the perimeter becomes enormously
complicated—in fact, it doesn't even have an analytic solution!
Now, however, I believe I finally have the correct approach for sure. Really!... Yep... Uh huh... No, seriously. Well, see for yourself.
<p>
<h3>The Starting Point</h3>
Start with the <b>morphing</b> approximation for the M/M/m waiting time in my
<a href="http://www.perfdynamics.com/books.html" target="_blank">Perl::PDQ book</a>—Chap. 2 in the original 2004 edition or Chap. 4 in the 2nd 2011 edition.
\begin{equation}
\Phi_W^{approx} (m, \rho) \, = \, \frac{1}{\rho^{-m} \, \sum_{k=0}^{m-1} \rho^k} \label{eqn:morphfun}
\end{equation}
This definition has a denominator involving a truncated geometric series in $\rho$.
It is used to derive the Guerrilla approximation for the residence time at an M/M/m queue with mean service time $S$
\begin{equation*}
R^{approx} \, = \, \frac{S}{1 - \rho^m}
\end{equation*}
which is exact for $m = 1, 2$.
The exact general formula for the residence time is given by
\begin{equation*}
R^{exact} \, = \, S \; + \; \frac{C(m, \rho)}{m( 1 - \rho)} \, S
\end{equation*}
where $C(m, \rho)$ is the Erlang C function defined in (\ref{sec:realEC}).
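A quick numerical comparison, evaluating $C(m, \rho)$ via the standard Erlang B recurrence (a shortcut not used in the derivation itself), confirms that the Guerrilla approximation is exact for $m = 1, 2$ but starts to underestimate $R$ at $m = 3$:

```python
from math import isclose

def erlang_c(m, a):
    """Erlang C via the numerically stable Erlang B recurrence, B(0) = 1."""
    B = 1.0
    for k in range(1, m + 1):
        B = a * B / (k + a * B)
    rho = a / m
    return B / (1 - rho * (1 - B))

S, rho = 1.0, 0.5
for m in (1, 2, 3):
    a = m * rho                                   # traffic intensity
    R_approx = S / (1 - rho**m)                   # Guerrilla approximation
    R_exact = S + erlang_c(m, a) * S / (m * (1 - rho))
    print(m, round(R_approx, 4), round(R_exact, 4))
# m = 1, 2: identical; m = 3: the approximation slightly underestimates R
```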
<p>
<h3>Five Easy Pieces</h3>
The following 5 steps provide the corrections to the approximation in (\ref{eqn:morphfun}) and result in the exact Erlang C function without resorting to the usual probability arguments found in almost all queueing theory textbooks. Along the way,
and quite unexpectedly (for me), I derive the Erlang B function as an intermediate result. Originally, I thought I would have to introduce that function as a kind of axiomatic probability. I wasn't thinking of it as a major waypoint.
As far as I'm aware, this derivation has never been presented before because you have to be sufficiently perverse to have thought up the morphing approximation in the first place.
<ol>
<li>
Replace $\rho$ in (\ref{eqn:morphfun}) by the traffic intensity $a = m \rho$
\begin{equation}
\Phi_W^{approx} (m, a) \, = \, \frac{1}{a^{-m} \, \sum_{k=0}^{m-1} ~a^k} \label{eqn:morpha}
\end{equation}
<li>
Convert $\Phi_W^{approx}$ to a truncated exponential series by applying $a^n \mapsto a^n / n!$
\begin{equation}
\frac{1}{m! a^{-m} \, \sum_{k=0}^{m-1}~\frac{a^k}{k!}}
\end{equation}
<li>
Extend the summation over all $m$ servers to yield the famous
<a href="http://en.wikipedia.org/wiki/Erlang_%28unit%29#Erlang_B_formula" target="_blank">Erlang B function</a> for an M/M/m/m queue
\begin{equation}
B(m, a) \, = \, \frac{\frac{a^m}{m!}}{\sum_{k=0}^{m}~\frac{a^k}{k!}} \label{eqn:EB}
\end{equation}
Historically, $B(m, a)$ has been associated with the probability that an incoming telephone call is blocked and lost from the system, e.g., gets an engaged signal. I can't remember the last time I heard an engaged signal. I think they've been replaced by voice-mail and Muzak.
<li>
Using the more compact notation: $A_m = a^m / m!$ and $\Sigma_k$ for the reduced sum over $(m - 1)$ terms, rewrite (\ref{eqn:EB}) as
\begin{equation}
B(m, a) \, = \, \frac{A_m}{A_m + \Sigma_k}
\end{equation}
<li>
Scale $A_m$ by $(1 - \rho)^{-1}$ to introduce the infinite possible wait states and arrive at
\begin{equation}
\Phi_W^{exact} (m, a) \, = \, \frac{1}{1 \, + \, (1 - \rho) \, m! \, a^{-m} \, \sum_{k=0}^{m-1}~\frac{a^k}{k!}} \label{eqn:myEC}
\end{equation}
which is the fully corrected version of (\ref{eqn:morphfun}).     <b><i>Q.E.D.</i></b>
</OL>
Furthermore, it can be shown that (\ref{eqn:myEC}) is identical to the famous
<a href="https://en.wikipedia.org/wiki/Erlang_%28unit%29#Erlang_C_formula" target="_blank">Erlang C function</a>
\begin{equation}
C(m, a) \, = \, \frac{ \frac{a^m}{m!} \, \big(\frac{m}{m-a}\big) }{1 \, + \, \frac{a}{1!} \, + \, \frac{a^2}{2!} \, + \, \cdots \, + \, \frac{a^{m-1}}{(m-1)!} \, + \, \frac{a^m}{m!} \, \big(\frac{m}{m-a}\big) } \label{sec:realEC}
\end{equation}
Historically, $C(m, a)$ has been associated with the probability that an incoming telephone call must wait to get a connection (aka "call waiting"), rather than being dropped, as it is in $B(m, a)$. However, since $C(m, a)$ determines the mean waiting time in any M/M/m queue, it applies to any multi-server system, e.g., modern multi-threaded applications.
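The claimed identity between (\ref{eqn:myEC}) and (\ref{sec:realEC}) is also easy to check numerically, by coding each expression directly from its definition:

```python
from math import factorial, isclose

def phi_exact(m, a):
    """Equation (eqn:myEC): the corrected morphing function."""
    rho = a / m
    sigma = sum(a**k / factorial(k) for k in range(m))   # k = 0 .. m-1
    return 1 / (1 + (1 - rho) * factorial(m) * a**(-m) * sigma)

def erlang_C(m, a):
    """Equation (sec:realEC): the classical Erlang C form."""
    top = (a**m / factorial(m)) * (m / (m - a))
    return top / (sum(a**k / factorial(k) for k in range(m)) + top)

# Check the identity at per-server utilization rho = 0.75 for m = 1..8.
for m in range(1, 9):
    a = 0.75 * m
    assert isclose(phi_exact(m, a), erlang_C(m, a))
print("identical for m = 1..8")
```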
<p>
Equation (\ref{sec:realEC}) was first published by <a href="https://en.wikipedia.org/wiki/Agner_Krarup_Erlang" target="_blank">A. K. Erlang</a> in 1917 (he of the Copenhagen Telephone Company), along with (\ref{eqn:EB}). Thus, next year will be its centennial. Nice timing on my part (although that was never the plan—there never being <i>any</i> plan).
<p>
It's worth emphasizing that the morphing approximation (\ref{eqn:morphfun}) accounts for about 90% of what is going on with $R^{exact}$ in the M/M/m queue. The remaining 10% contains the minutiae regarding how the waiting line actually forms. But, as you can see from the above transformations, it's a rather subtle 10%.
<p>
<h3>Next Steps</h3>
This is just a sketch of the proof. I've suppressed a lot of details because there are many, and I have 100 pages of typeset notes to back that up.
I know a lot about what <i>doesn't</i> work. (Sigh!)
Now that I have the correct mathematical logic sketched out, I'm quite confident that it can also be supplemented with a more visual representation of how the
corrections to the morphing function (\ref{eqn:morphfun}) arise.
<p>
<h3>Postscript (Thu Mar 16 15:10:09 PDT 2017)</h3>
The <a href="http://perfdynamics.blogspot.com/2017/03/morphing-mmm-new-view-of-old-queue.html" target="_blank" >culmination of this work</a> will be presented at <a href="http://ifors2017.ca" target="_blank">IFORS 2017</a>.Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-84426255724071246512016-06-08T10:18:00.000-07:002016-06-08T20:55:19.064-07:002016 Guerrilla Training Schedule 2016After a six month hiatus working on a major consulting gig, <a href="http://www.perfdynamics.com/Classes/schedule.html">Guerrilla training classes</a> are back in business with the three classic courses: <a href="http://www.perfdynamics.com/Classes/Outlines/gboot.html">Guerrilla Bootcamp</a> (<b>GBOOT</b>), <a href="http://www.perfdynamics.com/Classes/Outlines/guerilla.html">Guerrilla Capacity Planning</a> (<b>GCAP</b>) and <a href="http://www.perfdynamics.com/Classes/Outlines/gdata.html">Guerrilla Data Analysis Techniques</a> (<b>GDAT</b>).
<p>
See what <a href="http://www.perfdynamics.com/Classes/comments.html">graduates are saying</a> about these courses.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie3t7VCk2lDozykOCQVuL1fftRg2ySZb-UV2jtU6HX7dlHmduxU3ddL85j25TJdtn-qF9ndchGvwN8oQR0-uwn9rt607yrMSpGhJklOK0AsY24sae5rGUaiRCu985ITXz5w1mcdEatVwo/s1600/ghat-cloud.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie3t7VCk2lDozykOCQVuL1fftRg2ySZb-UV2jtU6HX7dlHmduxU3ddL85j25TJdtn-qF9ndchGvwN8oQR0-uwn9rt607yrMSpGhJklOK0AsY24sae5rGUaiRCu985ITXz5w1mcdEatVwo/s400/ghat-cloud.jpg" /></a></div>
<p>
Some course highlights:
<ul>
<li> There are only 3 performance <b>metrics</b> you need to know
<li> How to <b>quantify scalability</b> with the <a href="http://www.perfdynamics.com/Manifesto/USLscalability.html">Universal Scalability Law</a>
<li> <a href="http://queue.acm.org/detail.cfm?id=2789974">Hadoop performance</a> and capacity management
<li> <b>Virtualization</b> Spectrum from hyper-threads to cloud services
<li> How to detect <b>bad data</b>
<li> Statistical <b>forecasting</b> techniques
<li> <b>Machine learning</b> algorithms applied to performance data
</ul>
<p>
<a href="http://www.perfdynamics.com/Classes/schedule.html">Register online</a>. Early-bird discounts run through the end of <b>July</b>.
<p>
As usual, <a href="http://www.fourpointspleasanton.com">Sheraton Four Points</a> has bedrooms available at the Performance Dynamics discounted rate.
<p>
Tell a friend and see you in <b>September</b>!Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-36296083649531382732016-05-14T13:44:00.000-07:002016-05-14T21:50:52.429-07:00PDQ 7.0 Dev is UnderwayThe primary goal for this release is to make <a href="http://www.perfdynamics.com/Tools/PDQ.html">PDQ</a> acceptable for uploading to <a href="https://cran.r-project.org">CRAN</a>. This is a non-trivial exercise because there is some legacy C code in the PDQ library that needs to be reorganized while, at the same time, keeping it consistent for programmatically porting to other languages besides R—chiefly Perl (for the <a href="http://www.perfdynamics.com/iBook/ppa_new.html">book</a>) and Python.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5CPFzSy4LpDvdDhVai_3TBL0dcdUUEyfuG-4_lnC78w27q0c2khNDoifWotYNkDPilKQhmcvSxxjRxRPMSnGx6ut0yfmx3nn3yC7toJxY39FlMbHTKr5fuq9UO3NE7eTrqyHbBc0CnjY/s1600/Screen+Shot+2016-05-07+at+10.19.42+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5CPFzSy4LpDvdDhVai_3TBL0dcdUUEyfuG-4_lnC78w27q0c2khNDoifWotYNkDPilKQhmcvSxxjRxRPMSnGx6ut0yfmx3nn3yC7toJxY39FlMbHTKr5fuq9UO3NE7eTrqyHbBc0CnjY/s400/Screen+Shot+2016-05-07+at+10.19.42+PM.png" /></a></div>
<p>
To get there, the following steps have been identified:
<ol type="A" style="font-weight: bold;">
<li> <h3>High Priority</h3>
<ol>
<li>Migrate from <a href="https://sourceforge.net/projects/pdq-qnm-pkg/">SourceForge</a> to <a href="https://github.com">GitHub</a>.
<li> Change the return type for these functions from int to void:
<ul>
<li> PDQ_CreateOpen()
<li> PDQ_CreateClosed()
<li> PDQ_CreateNode()
<li> PDQ_CreateMultiNode()
</ul>
Using the returned int as a counter was deprecated in version 6.1.1.
<li>Convert <a href="http://www.perfdynamics.com/Tools/PDQ-R.html">PDQ-R</a> to <a href="http://www.rcpp.org">Rcpp</a> interface.
<li>Clean out the Examples directory and other contributed code directories leaving only Examples that actually use the PDQ C library.
<li>Add unit tests for PDQ C library, as well as the Perl, Python, and R languages.
<li>Get interface accepted on <a href="https://cran.r-project.org">CRAN</a>
<li> Add the ability to solve multi-server queueing nodes servicing an arbitrary number of workloads.
</ol>
<p>
<p>
<li> <h3>Low Priority</h3>
<ol>
<li>Get interface accepted on <a href="http://www.cpan.org">CPAN</a> and <a href="https://pypi.python.org/pypi">PyPI</a>.
<li>Convert to build system from makefiles to <a href="https://en.wikipedia.org/wiki/CMake">Cmake</a>.
</ol>
</ol>
<p>
Stay tuned!
<p>
—njg and pjp
Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com2tag:blogger.com,1999:blog-6977755959349847093.post-65198786524396231632016-05-13T22:27:00.000-07:002016-05-13T22:29:25.913-07:00How to Emulate Web Traffic Using Standard Load Testing ToolsThe following abstract has been submitted to <a href="https://www.cmg.org/event/performance-capacity-2016-conference/">CMG 2016</a>:
<p>
<blockquote>
<b>How to Emulate Web Traffic Using Standard Load Testing Tools</b>
<p>
<i>James Brady (State of Nevada) and Neil Gunther (Performance Dynamics)</i>
<p>
Conventional load-testing tools are based on a fifty-year-old time-share computer paradigm in which a finite number of users submit requests and respond in a synchronized fashion. Modern web traffic, by contrast, is essentially asynchronous and driven by an unknown number of users. This difference presents a conundrum for testing the performance of modern web applications. Even when the difference is recognized, performance engineers often introduce virtual-user script modifications based on hearsay; much of which leads to wrong results. We present a coherent methodology for emulating web traffic that can be applied to existing test tools.
</blockquote>
<p>
Keywords: load testing, workload simulation, web applications,
software performance engineering, performance modeling
<p>
Related blog posts:
<OL>
<li> <a href="http://perfdynamics.blogspot.com/2010/05/emulating-internet-traffic-in-load.html">Emulating Web Traffic in Load Tests</a>
<li> <a href="http://perfdynamics.blogspot.com/2010/04/mapping-virtual-users-to-real-users.html">Mapping Virtual Users to Real Users</a>
<li> <a href="http://perfdynamics.blogspot.com/2007/05/how-to-extend-load-tests-with-pdq.html">How to Extend Load Tests with PDQ</a>
</OL>
Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0tag:blogger.com,1999:blog-6977755959349847093.post-1031425471691540712015-09-29T11:02:00.000-07:002015-09-29T11:11:28.671-07:00Remember the Alamo at CMG 2015The Alamo is a reference to an episode in Texan history about <a href="http://history.howstuffworks.com/history-vs-myth/remember-the-alamo.htm/printable">defeat and revenge</a>. But, there's nothing defeatist or mythical about the sessions I'll be giving at <a href="http://www.cmg.org/conferences/performance-capacity-2015/">CMG in San Antonio</a> this year.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0kcJiYf8c1E1FcBYW9Oky1SjzYZLn8p13Uro-NxoZSI3HYeHDnWLDAH4DpnFZpoX6UjjgwbsIgp_Tgdptuq6lgc7KVsfMEWdqLwrh0jb4P8p5H8L907Ft9PsXfGuR5Ougy1-K1ekIrXc/s1600/sh6HnnAk9hTa3osejnRov9D8.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0kcJiYf8c1E1FcBYW9Oky1SjzYZLn8p13Uro-NxoZSI3HYeHDnWLDAH4DpnFZpoX6UjjgwbsIgp_Tgdptuq6lgc7KVsfMEWdqLwrh0jb4P8p5H8L907Ft9PsXfGuR5Ougy1-K1ekIrXc/s400/sh6HnnAk9hTa3osejnRov9D8.jpeg" /></a></div>
<p>
<h3>Workshop: <i>How to Do Performance Analytics with R</i>, Mon Nov 2, 8-12am</h3>
You've collected cubic light-years of performance monitoring data, now
whaddya gonna do? Raw performance data is not the same thing as
information, and the typical time-series representation is almost the
worst way to glean information. Neither your brain nor that of your
audience is built for that (blame it on Darwin). To extract pertinent
information, you need to transform your data and that's what the R
statistical computing environment can help you do, including
automatically.
<p>
Topics covered will include:
<ul>
<li> Introduction to R using RStudio
<li> Descriptive statistics
<li> Performance visualization
<li> Data reduction techniques
<li> Multivariate analysis
<li> Machine learning techniques
<li> Forecasting with R
<li> Scalability analysis
</ul>
<p>
<h3>Invited talk: <i>Hadoop Super Scaling</i>, Wed Nov 4, 5-6pm</h3>
The Hadoop framework is designed to facilitate parallel-processing massive
amounts of unstructured data. Originally intended to be the basis of Yahoo's
search-engine, it is now open sourced at Apache. Since Hadoop has a broad range
of corporate users, a number of companies offer commercial implementations or
support for Hadoop.
<p>
However, certain aspects of Hadoop performance, especially scalability, are
not well understood. One such anomaly is the claimed flat scalability benefit
for developing Hadoop applications. Another is that it's possible to achieve
faster-than-parallel processing. In this talk I will explain the source of these
anomalies by presenting a consistent method for analyzing Hadoop application
scalability.
<p>
<h3>CMG-T: <i>Capacity and Performance for Newbs and Nerds</i>, Thur Nov 5, 9-11am</h3>
In this tutorial I will bust some entrenched myths and develop basic
capacity and performance concepts from the ground up. In fact, any
performance metric can be boiled down to one of just three metrics. Even
if you already know metrics like throughput and utilization, that's not
the most important thing: it's the relationship <i>between</i> those metrics
that's vital! For example, there are at least three different definitions
of utilization. Can you state them? This level of understanding can make a
big difference when it comes to solving performance problems or presenting
capacity planning results.
<p>
Other myths that will get busted along the way include:
<ul>
<li> There is no response-time knee.
<li> Throughput is not the same as execution rate.
<li> Throughput and latency are not independent metrics.
<li> There is no parallel computing.
<li> All performance measurements are wrong by definition.
</ul>
<p>
No particular knowledge about capacity and performance management is assumed.
<p>
See you in <a href="http://www.cmg.org/conferences/performance-capacity-2015/registration/">San Antonio</a>! Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com2tag:blogger.com,1999:blog-6977755959349847093.post-16246608683341499012015-08-24T15:31:00.000-07:002015-09-29T14:34:32.287-07:00PDQ Version 6.2.0 ReleasedPDQ (Pretty Damn Quick) is a FOSS <a href="http://www.perfdynamics.com/Tools/PDQ.html">performance analysis tool</a> based on the paradigm of queueing models that can be programmed natively in
<ul>
<li> <a href="http://www.perfdynamics.com/Tools/PDQ-R.html">R</a>
<li> <a href="http://www.perfdynamics.com/Tools/PDQpython.html">Python</a>
<li> <a href="http://www.perfdynamics.com/Tools/PDQperl.html">Perl</a>
<li> C and several other languages.
</ul>
<p>
This minor release is now available for <a href="http://www.perfdynamics.com/Tools/PDQcode.html">download</a>.
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTAofYPeVklmVkK7FNaJNUW6GfVehfWeIdbVRNyk_ZqwODAUAJ5Fontip3LwExKvPOtHc5oK9N6k9LK6YWWogXZy17hLZsIhf65AlJlEmLGL6rWxgkpB_Mtc_-WP-xntBpNMMVijV3mp8/s1600/PDQ-R-Screen.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTAofYPeVklmVkK7FNaJNUW6GfVehfWeIdbVRNyk_ZqwODAUAJ5Fontip3LwExKvPOtHc5oK9N6k9LK6YWWogXZy17hLZsIhf65AlJlEmLGL6rWxgkpB_Mtc_-WP-xntBpNMMVijV3mp8/s400/PDQ-R-Screen.png" /></a></div>
<p> <a name='more'></a>
If you're new to PDQ, here's a simple queueing model written in R that you can paste directly into an <a href="https://www.rstudio.com/products/RStudio/#Desktop">RStudio</a> console or script window:
<pre class="source-code"><code>
# A simple M/M/1 queueing model in R-PDQ.
require(pdq)
# Input parameters
arrivalRate &lt;- 0.75                        # lambda: requests per unit time
serviceRate &lt;- 1.0                         # mu: requests per unit time
# Build and solve the PDQ model
Init("Single queue model")                  # Initialize PDQ
CreateOpen("Work", arrivalRate)             # Create open workload
CreateNode("Server", CEN, FCFS)             # Define single FCFS server
SetDemand("Server", "Work", 1/serviceRate)  # Define service time
Solve(CANON)                                # Solve the model
Report()                                    # Formatted output
</code></pre>
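Because this is an open M/M/1 model, PDQ's report can be cross-checked against the textbook closed-form results: with arrival rate &lambda; and service rate &mu;, the utilization is &rho; = &lambda;/&mu; and the mean residence time is R = S/(1 &minus; &rho;). Here is a quick sanity check of those formulas (my own sketch in plain Python; it does not call the PDQ library):

```python
# Analytic M/M/1 results for the model above (independent cross-check,
# not output produced by PDQ itself).
arrival_rate = 0.75          # lambda, matching arrivalRate in the R model
service_rate = 1.0           # mu, matching serviceRate
S = 1.0 / service_rate       # mean service time per request

rho = arrival_rate / service_rate   # server utilization
R = S / (1.0 - rho)                 # mean residence time (waiting + service)
N = rho / (1.0 - rho)               # mean number in system (Little's law)

print(f"Utilization rho    = {rho:.4f}")  # 0.7500
print(f"Residence time R   = {R:.4f}")    # 4.0000
print(f"Number in system N = {N:.4f}")    # 3.0000
```

If PDQ's formatted report for this model agrees with these numbers, you can be confident the model was set up correctly.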
Also, check out the relevant <a href="http://www.perfdynamics.com/books.html">books</a> and <a href="http://www.perfdynamics.com/Classes/schedule.html">training classes</a>. Neil Guntherhttp://www.blogger.com/profile/11441377418482735926noreply@blogger.com0