Possibly pithy insights into computer performance analysis and capacity planning based on the Guerrilla series of books and training classes provided by Performance Dynamics Company.
Thursday, May 8, 2025
How to Quantify Scalability
Friday, February 14, 2025
Does the Efficiency Compute Frontier Represent New Physics?
Sunday, May 20, 2018
USL Scalability Modeling with Three Parameters
Update of Oct 2018: Wow! MathJax performance is back. Clearly, whinging is the most powerful performance optimizer. :)
The 2-parameter USL model
The original USL model, presented in my GCAP book and updated in the blog post How to Quantify Scalability, is defined in terms of fitting two parameters $\alpha$ (contention) and $\beta$ (coherency). \begin{equation} X(N) = \frac{N \, X(1)}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn: usl2} \end{equation}Fitting this nonlinear USL equational model to data requires several steps:
- normalizing the throughput data, $X$, to determine relative capacity, $C(N)$.
- equation (\ref{eqn: usl2}) is equivalent to $X(N) = C(N) \, X(1)$.
- if the $X(1)$ measurement is missing or simply not available—as is often the case with data collected from production systems—the GCAP book describes an elaborate technique for interpolating the value.
- providing each term of the USL with a proper physical meaning, i.e., not treat the USL like a conventional multivariate statistical model (statistics is not math)
- satisfying the von Neumann criterion: minimal number of modeling parameters
Monday, March 23, 2015
Hadoop Scalability Challenges
The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search-engine, it is now open sourced at Apache. Since Hadoop now has a broad range of corporate users, a number of companies offer commercial implementations of Hadoop.
However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:
- So called flat development scalability
- Super scaling performance
- New TPC big data benchmark
See "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion.
Wednesday, August 13, 2014
Intel TSX Multicore Scalability in the Wild
Intel met or beat their projected availability schedule (depending on how you count) for Penryn by essentially rebooting their foundries. Very impressive.
In my Guerrilla classes I like to pose the question: Can you procure a 10 GHz microprocessor? On thinking about it (usually for the first time), most people begin to realize that they can't, but they don't know why not. Clearly, clock frequency limitations have an impact on both performance and server-side capacity. Then, I like to point out that programming multicores (since that decision has already been made for you) is much harder than it is for uniprocessors. Moreover, there is not too much in the way of help from compilers and other development tools, at the moment, although that situation will continually improve, presumably. Intel TSX (Transactional Synchronization Extensions) for Haswell multicores offers assistance of that type at the hardware level. In particular, TSX instructions are built into Haswell cores to boost the performance and scalability of certain types of multithreaded applications. But more about that in a minute.
I also like to point out that Intel and other microprocessor vendors (of which there are fewer and fewer due the enormous cost of entry), have little interest in how well your database, web site, or commercial application runs on their multicores. Rather, their goal is to produce the cheapest chip for the largest commodity market, e.g., PCs, laptops, and more recently mobile. Since that's where the profits are, the emphasis is on simplest design, not best design.
Server-side performance is usually relegated to low man on the totem pole because of its relatively smaller market share. The implicit notion is that if you want more performance, just add more cores. But that depends on the threadedness of the applications running on those cores. Of course, there can also be side benefits, such as inheriting lower power servers from advances in mobile chip technology.Fast, cheap, reliable: pick two.
Intel officially announced multicore processors based on the Haswell architecture in 2013. Because scalability analysis can reveal a lot about limitations of the architecture, it's generally difficult to come across any quantitative data in the public domain. In their 2012 marketing build up, however, Intel showed some qualitative scalability characteristics of the Haswell multicore with TSX. See figure above. You can take it as read that these plots are based on actual measurements.
Most significantly, note the classic USL scaling profiles of transaction throughput vs. number of threads. For example, going from coarse-grain locking without TSX (red curve exhibiting retrograde throughput) to coarse-grain locking with TSX (green curve) has reduced the amount of contention (i.e., USL α coefficient). It's hard to say what is the impact of TSX on coherency delay (i.e., USL β coefficient) without being in possession of the actual data. As expected, however, the impact of TSX on fine-grain locking seems to be far more moderate. A 2012 AnandTech review summed things up this way:
TSX will be supported by GCC v4.8, Microsoft's latest Visual Studio 2012, and of course Intel's C compiler v13. GLIBC support (rtm-2.17 branch) is also available. So it looks like the software ecosystem is ready for TSX. The coolest thing about TSX (especially HLE) is that it enables good scaling on our current multi-core CPUs without an enormous investment of time in the fine tuning of locks. In other words, it can give developers "fined grained performance at coarse grained effort" as Intel likes to put it.In theory, most application developers will not even have to change their code besides linking to a TSX enabled library. Time will tell if unlocking good multi-core scaling will be that easy in most cases. If everything goes according to Intel's plan, TSX could enable a much wider variety of software to take advantage of the steadily increasing core counts inside our servers, desktops, and portables.
With claimed clock frequencies of 4.6 GHz (i.e., nominal 5000 MIPS), Haswell with TSX offers superior performance at the usual price point. That's two. What about reliability? Ah, there's the rub. TSX has been disabled in the current manufacturing schedule due to a design bug.
Tuesday, November 6, 2012
Hotsos 2013: Superlinear Scalability
- VoltDB
- Postgres
- My USL analysis of the Postgres data
More recently, it was brought to my attention that the USL fails when it comes to modeling superlinear performance (e.g., see this Comments section). Superlinear scalability means you get more throughput than the available capacity would be expected to support. It's even discussed on the Wikipedia (so it must be true, right?). Nice stuff, if you can get it. But it also smacks of an effect like perpetual motion.

Every so often, you see a news report about someone discovering (again) how to beat the law of conservation of energy. They will swear up and down that it works and it will be accompanied by a contraption that proves it works. Seeing is believing, after all. The hard part is not whether to believe their claim, it's debugging their contraption to find the mistake that has led them to the wrong conclusion.
Similarly with superlinearity. Some data are just plain spurious. In other cases, however, certain superlinear measurements do appear to be correct, in that they are repeatable and not easily explained away. In that case, it was assumed that the USL needed to be corrected to accommodate superlinearity by introducing a third modeling parameter. This is bad news for many reasons, but primarily because it would weaken the universality of the universal scalability law.
To my great surprise, however, I eventually discovered that the USL can accommodate superlinear data without any modification to the equation. As an unexpected benefit, the USL also warns you that you're modeling an unphysical effect: like a perpetual-motion detector. A corollary of this new analysis is the existence of a payback penalty for incurring superlinear scalability. You can think of this as a mathematical statement of the old adage: If it looks too good to be true, it probably is.
I'll demonstrate this remarkable result with examples in my Hotsos presentation.
Wednesday, April 11, 2012
PostgreSQL Scalability Analysis Deconstructed
Robert Haas is one of those people and he has applied the USL to Postgres scalability analysis. This is all good news. However, there are plenty of traps for new players and Robert has walked in several of them to the point where, by his own admission, he became confused about what conclusions could be drawn from his USL results. In fact, he analyzed three cases:
- PostgreSQL 9.1
- PostgreSQL 9.2 with fast locking
- PostgreSQL 9.2 current release
Sunday, January 1, 2012
My Year in Review 2011

Monday, October 24, 2011
Webinar: Load Testing Meets Data Analytics
Thursday, May 26, 2011
Quantifying Scalability FTW (The Movie)
Here's the abstract:
You probably already collect performance data, but data ain't information. Successfull scalability requires transforming your data so that you can quantify the cost-benefit of any architectural decisions. In other words:
measurement + models == information
So, measurement alone is only half the story; you need a method to transform your data. In this presentation I will show you a method that I have developed and applied successfully to large-scale web sites and stack applications to quantify the benefits of proposed scaling strategies. To the degree that you don't quantify your scalability, you run the risk of ending up with WTF rather than FTW.
Friday, February 4, 2011
USL Fine Point: Sub-Amdahl Scalability
Saturday, November 13, 2010
Reporting Standard Errors for USL Coefficients
.... BS> Using gnuplot against the dataset I gave, I get BS> sigma 0.0207163 +/- 0.001323 (6.385%) BS> kappa 0.000861226 +/- 5.414e-05 (6.287%)The Gnuplot output includes the errors for each of the universal scalability law (USL) coefficients. A question about the magnitude of these errors also arose in a recent talk I gave. Typically, this question doesn't come up because there's more focus on assessing the residual errors as a measure of fit for the USL against the data set. Also, statistical accuracy can be a bigger issue when there are only a small number of samples. Barron reported 32 data points, so that's not an problem in this case.
Wednesday, August 25, 2010
Excel Errors and Other Numerical Nightmares
Monday, July 5, 2010
Prime Parallels for Load Balancing
Friday, June 25, 2010
Velocity 2010 The Aftermathglow

Thread-limited scalability of memcached
Working with Shanti and Stefan of Oracle (née Sun Microsystems), I was able to accomplish that goal. Our session was rated 92.4%, which is an A+ in anyone's books. Congrats to us and the Velocity organizers and thank you, crowd.
Monday, June 21, 2010
Memcached and Friends at Velocity 2010

Click on the image for details
Shanti and I will be presenting at 1300 on Thursday. The Velocity conference is being held at the Hyatt Regency Santa Clara, near Great America.
Friday, March 19, 2010
Memcached Scalability at Velocity 2010

Here's our abstract:
Hidden Scalability Gotchas in Memcached and Friends
Neil Gunther (Performance Dynamics), Shanti Subramanyam (Oracle Corporation), Stefan Parvu (Sun Microsystems)
Most web deployments have standardized on horizontal scaleout in every tier—web, application, caching and database—using cheap, off-the-shelf, white boxes. In this approach, there are no real expectations for vertical scalability of server apps like memcached or the full LAMP stack. But with the potential for highly concurrent scalability offered by newer multicore processors, it is no longer cost-effective to ignore their underutilization due to poor, thread-level, scalability of the web stack. In this session we show you how to quantify scalability with the Universal Scalability Law (USL) by demonstrating its application to actual performance data collected from a memcached benchmark. As a side effect of our technique, you will see how the USL also identifies the most signficant performance tuning opportunities to improve web app scalability.
Wednesday, September 9, 2009
This is Your Measurements on Models
Tuesday, August 18, 2009
Response Time Knees and Queues
In a previous blog post, I pointed out that such a "knee" is actually an optical illusion. Nonetheless, this same question arose in last month's CMG MeasureIT, as a kind of survey entitled "Does the Knee in a Queuing Curve Exist or is it just a Myth?" Although that author concludes (correctly) that the existence of a "knee" (as it is usually meant) is bogus, the panoply of responses was quite astounding—especially coming from professionals who ought to know better. In this month's MeasureIT, I examine the same question in a rigorous but unconventional way under the title "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ."