The Pith of Performance: scalability

Showing posts with label scalability. Show all posts

Thursday, May 8, 2025

How to Quantify Scalability

How to Quantify Scalability: Synopsis of The Universal Scalability Law (USL), has just been posted in PDF format for improved readability. I did this post haste so, reporting any typos, broken links, etc., is always appreciated.

Friday, February 14, 2025

Does the Efficiency Compute Frontier Represent New Physics?

This talk will be presented at the APS Global Physics Summit on Monday, March 17, 2025 at 11:30 AM.

Sunday, May 20, 2018

USL Scalability Modeling with Three Parameters

NOTE: Annoyingly, the remote mathjax server often takes it's sweet time rendering LaTex equations (like, maybe a minute!!!). I don't know if this is deliberate on the part of Google or a bug. It used to be faster. If anyone knows, I'd be interested to hear; especially if there is a way to speed it up. And no, I'm not planning to move to WordPress.

Update of Oct 2018: Wow! MathJax performance is back. Clearly, whinging is the most powerful performance optimizer. :)

The 2-parameter USL model

The original USL model, presented in my GCAP book and updated in the blog post How to Quantify Scalability, is defined in terms of fitting two parameters $\alpha$ (contention) and $\beta$ (coherency). \begin{equation} X(N) = \frac{N \, X(1)}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn: usl2} \end{equation}

Fitting this nonlinear USL equational model to data requires several steps:

normalizing the throughput data, $X$, to determine relative capacity, $C(N)$.
equation (\ref{eqn: usl2}) is equivalent to $X(N) = C(N) \, X(1)$.
if the $X(1)$ measurement is missing or simply not available—as is often the case with data collected from production systems—the GCAP book describes an elaborate technique for interpolating the value.

The motivation for a 2-parameter model arose out of a desire to meet the twin goals of:

providing each term of the USL with a proper physical meaning, i.e., not treat the USL like a conventional multivariate statistical model (statistics is not math)
satisfying the von Neumann criterion: minimal number of modeling parameters

Last year, I realized the 2-paramater constraint is actually overly severe. Introducing a third parameter would make the statistical fitting process even more universal, as well as simplify the overall procedure. For the USL particularly, the von Neumann criterion should not be taken too literally. It's really more of a guideline: fewer is generally better.

Hadoop Scalability Challenges

Hadoop is hot, not because it necessarily represents cutting edge technology, but because it's being rapidly adopted by more and more companies as a solution for engaging in the big data trend. It may be coming to your company sooner than you think.

The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search-engine, it is now open sourced at Apache. Since Hadoop now has a broad range of corporate users, a number of companies offer commercial implementations of Hadoop.

However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:

So called flat development scalability
Super scaling performance
New TPC big data benchmark

See "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion.

Intel TSX Multicore Scalability in the Wild

Multicore processors were introduced to an unsuspecting marketplace more than a decade ago, but really became mainstream circa 2005. Multicore was presented as the next big thing in microprocessor technology. No mention of falling off the Moore's law (uniprocessor) curve. A 2007 PR event—held jointly between Intel, IBM and AMD—announced a VLSI fabrication technology that broke through the 65 nm barrier. The high-κ Hafnium gate enabled building smaller transistors at 45 nm feature size and thus, more cores per die. I tracked and analyzed the repercussions of that event in these 2007 blog posts:

Intel met or beat their projected availability schedule (depending on how you count) for Penryn by essentially rebooting their foundries. Very impressive.

In my Guerrilla classes I like to pose the question: Can you procure a 10 GHz microprocessor? On thinking about it (usually for the first time), most people begin to realize that they can't, but they don't know why not. Clearly, clock frequency limitations have an impact on both performance and server-side capacity. Then, I like to point out that programming multicores (since that decision has already been made for you) is much harder than it is for uniprocessors. Moreover, there is not too much in the way of help from compilers and other development tools, at the moment, although that situation will continually improve, presumably. Intel TSX (Transactional Synchronization Extensions) for Haswell multicores offers assistance of that type at the hardware level. In particular, TSX instructions are built into Haswell cores to boost the performance and scalability of certain types of multithreaded applications. But more about that in a minute.

I also like to point out that Intel and other microprocessor vendors (of which there are fewer and fewer due the enormous cost of entry), have little interest in how well your database, web site, or commercial application runs on their multicores. Rather, their goal is to produce the cheapest chip for the largest commodity market, e.g., PCs, laptops, and more recently mobile. Since that's where the profits are, the emphasis is on simplest design, not best design.

Fast, cheap, reliable: pick two.

Server-side performance is usually relegated to low man on the totem pole because of its relatively smaller market share. The implicit notion is that if you want more performance, just add more cores. But that depends on the threadedness of the applications running on those cores. Of course, there can also be side benefits, such as inheriting lower power servers from advances in mobile chip technology.

Source: AnandTech, Inc.

Intel officially announced multicore processors based on the Haswell architecture in 2013. Because scalability analysis can reveal a lot about limitations of the architecture, it's generally difficult to come across any quantitative data in the public domain. In their 2012 marketing build up, however, Intel showed some qualitative scalability characteristics of the Haswell multicore with TSX. See figure above. You can take it as read that these plots are based on actual measurements.

Most significantly, note the classic USL scaling profiles of transaction throughput vs. number of threads. For example, going from coarse-grain locking without TSX (red curve exhibiting retrograde throughput) to coarse-grain locking with TSX (green curve) has reduced the amount of contention (i.e., USL α coefficient). It's hard to say what is the impact of TSX on coherency delay (i.e., USL β coefficient) without being in possession of the actual data. As expected, however, the impact of TSX on fine-grain locking seems to be far more moderate. A 2012 AnandTech review summed things up this way:

TSX will be supported by GCC v4.8, Microsoft's latest Visual Studio 2012, and of course Intel's C compiler v13. GLIBC support (rtm-2.17 branch) is also available. So it looks like the software ecosystem is ready for TSX. The coolest thing about TSX (especially HLE) is that it enables good scaling on our current multi-core CPUs without an enormous investment of time in the fine tuning of locks. In other words, it can give developers "fined grained performance at coarse grained effort" as Intel likes to put it.
In theory, most application developers will not even have to change their code besides linking to a TSX enabled library. Time will tell if unlocking good multi-core scaling will be that easy in most cases. If everything goes according to Intel's plan, TSX could enable a much wider variety of software to take advantage of the steadily increasing core counts inside our servers, desktops, and portables.

With claimed clock frequencies of 4.6 GHz (i.e., nominal 5000 MIPS), Haswell with TSX offers superior performance at the usual price point. That's two. What about reliability? Ah, there's the rub. TSX has been disabled in the current manufacturing schedule due to a design bug.

Tuesday, November 6, 2012

Hotsos 2013: Superlinear Scalability

As readers of this blog know, the Universal Scalability Law (USL) is a framework for quantifying performance measurements and extrapolating load-test data. Applied as a statistical regression model, the two USL contention (α) and coherency (β) parameters numerically indicate the degree of sublinear scalability in the data, i.e., how much linear scaling you're losing due to sharing and consistency overheads. Some examples of USL scalability analysis applied to databases, include:

VoltDB
Postgres
My USL analysis of the Postgres data

More recently, it was brought to my attention that the USL fails when it comes to modeling superlinear performance (e.g., see this Comments section). Superlinear scalability means you get more throughput than the available capacity would be expected to support. It's even discussed on the Wikipedia (so it must be true, right?). Nice stuff, if you can get it. But it also smacks of an effect like perpetual motion.

Every so often, you see a news report about someone discovering (again) how to beat the law of conservation of energy. They will swear up and down that it works and it will be accompanied by a contraption that proves it works. Seeing is believing, after all. The hard part is not whether to believe their claim, it's debugging their contraption to find the mistake that has led them to the wrong conclusion.

Similarly with superlinearity. Some data are just plain spurious. In other cases, however, certain superlinear measurements do appear to be correct, in that they are repeatable and not easily explained away. In that case, it was assumed that the USL needed to be corrected to accommodate superlinearity by introducing a third modeling parameter. This is bad news for many reasons, but primarily because it would weaken the universality of the universal scalability law.

To my great surprise, however, I eventually discovered that the USL can accommodate superlinear data without any modification to the equation. As an unexpected benefit, the USL also warns you that you're modeling an unphysical effect: like a perpetual-motion detector. A corollary of this new analysis is the existence of a payback penalty for incurring superlinear scalability. You can think of this as a mathematical statement of the old adage: If it looks too good to be true, it probably is.

I'll demonstrate this remarkable result with examples in my Hotsos presentation.

Wednesday, April 11, 2012

PostgreSQL Scalability Analysis Deconstructed

In 2010, I presented my universal scalability law (USL) at the SURGE conference. I came away with the impression that nobody really understood what I was talking about (quantifying scalability) or, maybe DevOps types thought it was all too hard (math). Since then, however, I've come to find out that people like Baron Schwartz did get it and have since applied the USL to database scalability analysis. Apparently, things have continued to propagate to the point where others have heard about the USL from Baron and are now using it too.

Robert Haas is one of those people and he has applied the USL to Postgres scalability analysis. This is all good news. However, there are plenty of traps for new players and Robert has walked in several of them to the point where, by his own admission, he became confused about what conclusions could be drawn from his USL results. In fact, he analyzed three cases:

PostgreSQL 9.1
PostgreSQL 9.2 with fast locking
PostgreSQL 9.2 current release

I know nothing about Postgres but thankfully, Robert tabulated on his blog the performance data he used and that allows me to deconstruct what he did with the USL. Here, I am only going to review the first of these cases: PostgreSQL 9.1 scalability. I intend to return to the claimed superlinear effects in another blog post.

My Year in Review 2011

Some days I wonder if I ever actually accomplish anything anymore. Maybe it's time to just pack it in and become a greeter at Walmart. I know a bit about how queues work, so that should put me a few notches ahead of the competition. And I would expect the competition to be fierce because it's a pretty cushy job; but not every day, apparently.

Before taking the big leap, I decided it might be a good idea to note down some of the technical projects I've worked on this year (over and above the daily grind):

Webinar: Load Testing Meets Data Analytics

This Thursday, October 27 at 10 am PDT^*, I'll be participating in a webinar sponsored by SOASTA, Inc. They make a new breed of load-testing product called CloudTest® which, despite its name, is not restricted to load testing cloud-based apps, although it can do that too.

Quantifying Scalability FTW (The Movie)

The video of my presentation at the Surge 2010 conference on scalability and performance has finally been posted. Since it doesn't seem to be a streaming server, it may take several minutes to download the video (depending on the speed of your pipe). Also, the audio is suboptimal because it seems to have been recorded from the ambient loudspeakers rather than a direct mic. I was too busy giving the talk to remember the setup they used.

Here's the abstract:

You probably already collect performance data, but data ain't information. Successfull scalability requires transforming your data so that you can quantify the cost-benefit of any architectural decisions. In other words:

measurement + models == information
So, measurement alone is only half the story; you need a method to transform your data. In this presentation I will show you a method that I have developed and applied successfully to large-scale web sites and stack applications to quantify the benefits of proposed scaling strategies. To the degree that you don't quantify your scalability, you run the risk of ending up with WTF rather than FTW.

Friday, February 4, 2011

USL Fine Point: Sub-Amdahl Scalability

As discussed in Chapter 4 of my GCaP book, Amdahl's law is defined by a single parameter called the serial fraction, denoted by the symbol α and signifying the proportion of the total workload (W) that is serialized during execution. From the standpoint of parallel processing (where reference to Amdahl's law is most frequent) serialization means that portion of the workload can only execute on a single processor out of N parallel processors. The parallel speedup or relative capacity C_A(N) performance metric is given by: \begin{equation} C_A(N) = \frac{N}{1 + \alpha \, (N-1)} \end{equation} If there is no serialization in the workload, i.e., α = 0, then C_A(N) = N, which signifies that the workload scales linearly with the number of physical processors. The important observation made by Gene Amdahl (more than 40 years ago) is that even if α is relatively small, viz., a few percent of the execution time, scalability cannot continue to increase linearly. For example, if α = 5%, then C_A(N) will eventually reach a scalability ceiling given by 20 effective processors (1/α), even if there are hundreds of physical processors available in the system.

Reporting Standard Errors for USL Coefficients

In a recent Guerrilla CaP Group discussion, Baron S. wrote:

....
BS> Using gnuplot against the dataset I gave, I get 
BS>    sigma   0.0207163 +/- 0.001323 (6.385%) 
BS>    kappa   0.000861226 +/- 5.414e-05 (6.287%)

The Gnuplot output includes the errors for each of the universal scalability law (USL) coefficients. A question about the magnitude of these errors also arose in a recent talk I gave. Typically, this question doesn't come up because there's more focus on assessing the residual errors as a measure of fit for the USL against the data set. Also, statistical accuracy can be a bigger issue when there are only a small number of samples. Barron reported 32 data points, so that's not an problem in this case.

Excel Errors and Other Numerical Nightmares

Although I use Excel all the time, and I strongly encourage my students to use it for performance analysis and CaP, I was forced to include a warranty disclaimer in my GCaP book because I discovered a serious numerical error while writing Appendix B. There, my intention was just to show that Excel gives essentially the same results as Mathematica when using the USL scalability model. It didn't!

Prime Parallels for Load Balancing

Having finally popped the stack on computing prime numbers with R in Part II and Part III, we are now in a position to discuss their relevance for computational scalability.

Velocity 2010 The Aftermathglow

I was so impressed with Velocity 2009, I really wanted to present something at Velocity 2010.

Thread-limited scalability of memcached
Working with Shanti and Stefan of Oracle (née Sun Microsystems), I was able to accomplish that goal. Our session was rated 92.4%, which is an A+ in anyone's books. Congrats to us and the Velocity organizers and thank you, crowd.

Memcached and Friends at Velocity 2010

This is the week. Starts tomorrow and it's sold out!

Click on the image for details
Shanti and I will be presenting at 1300 on Thursday. The Velocity conference is being held at the Hyatt Regency Santa Clara, near Great America.

Friday, March 19, 2010

Memcached Scalability at Velocity 2010

Totally stoked about being selected for the Web Performance track at Velocity 2010.

Here's our abstract:

Hidden Scalability Gotchas in Memcached and Friends

Neil Gunther (Performance Dynamics), Shanti Subramanyam (Oracle Corporation), Stefan Parvu (Sun Microsystems)

Most web deployments have standardized on horizontal scaleout in every tier—web, application, caching and database—using cheap, off-the-shelf, white boxes. In this approach, there are no real expectations for vertical scalability of server apps like memcached or the full LAMP stack. But with the potential for highly concurrent scalability offered by newer multicore processors, it is no longer cost-effective to ignore their underutilization due to poor, thread-level, scalability of the web stack. In this session we show you how to quantify scalability with the Universal Scalability Law (USL) by demonstrating its application to actual performance data collected from a memcached benchmark. As a side effect of our technique, you will see how the USL also identifies the most signficant performance tuning opportunities to improve web app scalability.

This is Your Measurements on Models

When it comes to analyzing scalability data, I've stressed the importance of bringing measurements and models together. Some recent conversations with people who are just beginning to model their scalability data using the Universal Scalability Law (USL), have led me to realize that I have not made the steps behind the procedure as clear as I might have. So, let me address that here.

Response Time Knees and Queues

How do you determine where the response-time "knee" occurs? This is a question one commonly hears with reference to characterizing the performance of an application. Calculating where the response time suddenly begins to climb dramatically is considered, by many, to be an important determinant for such things as load testing, scalability analysis, and setting application service targets.

In a previous blog post, I pointed out that such a "knee" is actually an optical illusion. Nonetheless, this same question arose in last month's CMG MeasureIT, as a kind of survey entitled "Does the Knee in a Queuing Curve Exist or is it just a Myth?" Although that author concludes (correctly) that the existence of a "knee" (as it is usually meant) is bogus, the panoply of responses was quite astounding—especially coming from professionals who ought to know better. In this month's MeasureIT, I examine the same question in a rigorous but unconventional way under the title "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ."

Tuesday, July 28, 2009

Scalability in a Spreadsheet - Google Style

Speaking of spreadsheets, it's always nice when someone, who uses your ideas, takes the time to write to you about it. Case in point, Scott Roberts sent me the following email, telling me how he'd set up my USL model in Google Docs spreadsheets.

Thursday, May 8, 2025

Friday, February 14, 2025

Sunday, May 20, 2018

The 2-parameter USL model

Monday, March 23, 2015

Wednesday, August 13, 2014

Tuesday, November 6, 2012

Wednesday, April 11, 2012

Sunday, January 1, 2012

Monday, October 24, 2011

Thursday, May 26, 2011

Friday, February 4, 2011

Saturday, November 13, 2010

Wednesday, August 25, 2010

Monday, July 5, 2010

Friday, June 25, 2010

Monday, June 21, 2010

Friday, March 19, 2010

Wednesday, September 9, 2009

Tuesday, August 18, 2009

Tuesday, July 28, 2009