The Pith of Performance: load test


Monday, June 25, 2018

Guerrilla 2018 Classes Now Open

All Guerrilla training classes are now open for registration.

GCAP: Guerrilla Capacity and Performance — From Counters to Containers and Clouds
GDAT: Guerrilla Data Analytics — Everything from Linear Regression to Machine Learning
PDQW: Pretty Damn Quick Workshop — Personal tuition for performance and capacity mgmt

The following highlights indicate the kind of thing you'll learn. Most especially, how to make better use of all that monitoring and load-testing data you keep collecting.

How to save millions of dollars with a one-line performance model (video)
How to minimize chargeback after you lift and shift to the cloud (video)
How to correctly emulate web traffic on a load-testing rig (PDF)

See what Guerrilla grads are saying about these classes. And how many instructors do you know that are available for you from 9am to 9pm (or later) each day of your class?

Who should attend?

IT architects
Application developers
Performance engineers
Sysadmins (Linux, Unix, Windows)
System engineers
Test engineers
Mainframe sysops (IBM, Hitachi, Fujitsu, Unisys)
Database admins
Devops practitioners
SRE engineers
Anyone interested in getting beyond performance monitoring

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate. The room-booking link is on the registration page.

Tell a colleague and see you in September!

Sunday, May 20, 2018

USL Scalability Modeling with Three Parameters

NOTE: Annoyingly, the remote MathJax server often takes its sweet time rendering LaTeX equations (like, maybe a minute!!!). I don't know if this is deliberate on the part of Google or a bug. It used to be faster. If anyone knows, I'd be interested to hear; especially if there is a way to speed it up. And no, I'm not planning to move to WordPress.

Update of Oct 2018: Wow! MathJax performance is back. Clearly, whinging is the most powerful performance optimizer. :)

The 2-parameter USL model

The original USL model, presented in my GCAP book and updated in the blog post How to Quantify Scalability, is defined in terms of fitting two parameters $\alpha$ (contention) and $\beta$ (coherency). \begin{equation} X(N) = \frac{N \, X(1)}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn: usl2} \end{equation}

Fitting this nonlinear USL equational model to data requires several steps:

normalizing the throughput data, $X$, to determine relative capacity, $C(N)$.
equation (\ref{eqn: usl2}) is equivalent to $X(N) = C(N) \, X(1)$.
if the $X(1)$ measurement is missing or simply not available—as is often the case with data collected from production systems—the GCAP book describes an elaborate technique for interpolating the value.
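The fitting steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure from the GCAP book: it assumes a measured $X(1)$ is available, and it uses a linearized least-squares fit based on rearranging the USL as $N/C(N) - 1 = \alpha (N-1) + \beta N (N-1)$. The function names `usl_throughput` and `fit_usl` are made up for this example.

```python
def usl_throughput(n, x1, alpha, beta):
    """2-parameter USL: X(N) = N X(1) / (1 + alpha (N-1) + beta N (N-1))."""
    return n * x1 / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl(loads, throughputs):
    """Estimate (X(1), alpha, beta) by linear least squares (hypothetical helper).

    Step 1: normalize throughput to relative capacity C(N) = X(N) / X(1).
    Step 2: regress y = N/C(N) - 1 on u = N-1 and v = N(N-1), no intercept.
    """
    measured = dict(zip(loads, throughputs))
    if 1 not in measured:
        raise ValueError("this sketch needs a measured X(1)")
    x1 = measured[1]

    suu = suv = svv = suy = svy = 0.0
    for n, x in measured.items():
        c = x / x1                 # relative capacity C(N)
        u = n - 1                  # multiplies alpha (contention)
        v = n * (n - 1)            # multiplies beta (coherency)
        y = n / c - 1.0            # deviation from linear scaling
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y

    # Solve the 2x2 normal equations for alpha and beta.
    det = suu * svv - suv * suv
    if det == 0.0:
        raise ValueError("need at least two distinct loads with N > 1")
    alpha = (suy * svv - svy * suv) / det
    beta = (suu * svy - suv * suy) / det
    return x1, alpha, beta


if __name__ == "__main__":
    # Synthetic data generated from known parameters, for illustration only.
    loads = [1, 2, 4, 8, 16, 32]
    data = [usl_throughput(n, 100.0, 0.05, 0.001) for n in loads]
    print(fit_usl(loads, data))
```

The linearized form weights the measurement errors differently from a direct nonlinear regression on $X(N)$, so with noisy data the two approaches can give somewhat different estimates.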

The motivation for a 2-parameter model arose out of a desire to meet the twin goals of:

providing each term of the USL with a proper physical meaning, i.e., not treat the USL like a conventional multivariate statistical model (statistics is not math)
satisfying the von Neumann criterion: minimal number of modeling parameters

Last year, I realized the 2-parameter constraint is actually overly severe. Introducing a third parameter would make the statistical fitting process even more universal, as well as simplify the overall procedure. For the USL particularly, the von Neumann criterion should not be taken too literally. It's really more of a guideline: fewer is generally better.

The Geometry of Latency

... AKA hyperbolae.

Here's a mnemonic tabulation based on dishes and bowls:

Hopefully this makes amends for the more complicated explanation I wrote for CMG back in 2009 entitled: "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ", which I'm pretty sure almost nobody understood.

Saturday, October 8, 2016

Crib Sheet for Emulating Web Traffic

Our paper entitled, How to Emulate Web Traffic Using Standard Load Testing Tools (PDF) is now available online and will be presented at the upcoming CMG conference in November.

Presenter: James Brady (co-author: Neil Gunther)
Session Number: 436
Subject Area: APM
Session Date: Wed, November 9, 2016
Session Time: 1:00 PM - 2:00 PM
Session Room: PortofinoB

How to Emulate Web Traffic Using Standard Load Testing Tools

The following abstract has been submitted to CMG 2016:

How to Emulate Web Traffic Using Standard Load Testing Tools
James Brady (State of Nevada) and Neil Gunther (Performance Dynamics)
Conventional load-testing tools are based on a fifty-year-old time-share computer paradigm where a finite number of users submit requests and respond in a synchronized fashion. Conversely, modern web traffic is essentially asynchronous and driven by an unknown number of users. This difference presents a conundrum for testing the performance of modern web applications. Even when the difference is recognized, performance engineers often introduce virtual-user script modifications based on hearsay, much of which leads to wrong results. We present a coherent methodology for emulating web traffic that can be applied to existing test tools.

Keywords: load testing, workload simulation, web applications, software performance engineering, performance modeling

Wednesday, July 29, 2015

Hockey Elbow and Other Response Time Injuries

You've heard of tennis elbow. Well, there's a non-sports, performance injury that I like to call hockey elbow. An example of such an "injury" is shown in Figure 1, which appeared in a recent computer performance analysis presentation. It's a reminder of how easy it is to become complacent when doing performance analysis and possibly end up reaching the wrong conclusion.

Figure 1. injured response time performance

Figure 1 is seriously flawed for two reasons:

It incorrectly shows the response time curve with a vertical asymptote.
It compounds the first error by employing a logarithmic x-axis.

Characterizing Performance Bottlenecks

If you do a Google search using keywords like: performance, bottleneck, analysis, you get quite a bewildering list of responses, and none of them seems to clearly define what they mean by the term bottleneck.^†

The word bottleneck refers to a choke point or narrowing, literally like the neck of a bottle, that causes the flow to take longer than it would otherwise. The effect on performance is commonly seen on the freeway in an area undergoing roadwork. Multiple lanes of traffic are forced to converge into a single lane and proceed past the roadwork in single file. Going from parallel traffic flow to serial flow means the same number of cars will take longer to get through that same section of road. As we all know, the delay at a freeway bottleneck can be very significant.

The same is true on a single-lane country road. If you come to a section where roadwork slows down every car, it takes longer to traverse that section of the road. Bottlenecks are synonymous with slow downs and delays, but they really determine a lot more than delay.

Load Testing with Uniform vs. Exponential Arrivals

In a couple of recent blog posts about generating exponential loads and why that is important for load testing and performance testing, it was not completely clear to some readers what was motivating my remarks. In this post, I will try to provide a more visual elaboration of that aspect.

My fundamental point is this. When it comes to load testing^*, presumably the idea is to exercise the system under test (SUT). Otherwise, why are you doing it? Part of exercising the SUT is to produce significant fluctuations in the number of requests residing in application buffers. Those fluctuations can be induced by the pattern of arriving requests issued by the client-side driver (DVR): usually implemented as a pile of PCs or blades.
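A rough way to see the effect is to push two arrival streams with the same mean spacing through the same single-server queue and compare how much waiting builds up. The sketch below is an assumption-laden toy, not a model of any particular test rig: it uses a fixed service time, the Lindley recursion for FIFO waiting times, and waiting time as a stand-in for buffer occupancy. The names `waiting_times`, `MEAN_GAP` and `SERVICE` are invented for the example.

```python
import random


def waiting_times(interarrival, service_time, n_requests, seed=1):
    """Queueing delay seen by each request at a single FIFO server.

    Lindley recursion: the next request waits for whatever work is left
    over when it arrives, i.e. max(0, wait + service - gap).
    """
    rng = random.Random(seed)
    wait = 0.0
    waits = []
    for _ in range(n_requests):
        waits.append(wait)
        wait = max(0.0, wait + service_time - interarrival(rng))
    return waits


MEAN_GAP = 1.0   # mean time between arrivals (assumed units: seconds)
SERVICE = 0.8    # fixed service time, so the server is 80% busy

# Same mean gap in both cases; only the shape of the distribution differs.
uniform_waits = waiting_times(
    lambda r: r.uniform(0.0, 2.0 * MEAN_GAP), SERVICE, 100_000)
expo_waits = waiting_times(
    lambda r: r.expovariate(1.0 / MEAN_GAP), SERVICE, 100_000)

for name, w in (("uniform", uniform_waits), ("exponential", expo_waits)):
    print(f"{name:>12}: mean wait {sum(w) / len(w):.2f}, max wait {max(w):.2f}")
```

Both streams deliver the same average load, yet the exponential gaps are more variable, so they should produce noticeably larger queueing delays at the server.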

How to Generate Exponential Delays

This question arose while addressing Comments on a previous blog post about exponentially distributed delays. One of my ongoing complaints is that many, if not most, popular load-test generation tools do not provide exponential variates as part of a library of time delays or think-time distributions. Not only is this situation bizarre, given that all load tests are actually performance models (and who doesn't love an exponential distribution in their performance models?), but without the exponential distribution you are less likely to observe such things as buffer overflow conditions due to larger than normal (or uniform) queueing fluctuations. Exponential delays are both simple and useful for that purpose, but we are often left to roll our own code and then debug it.
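For anyone rolling their own, here is a minimal sketch of the standard inverse-transform method: draw a uniform variate $U$ and return $-\bar{Z} \ln U$, where $\bar{Z}$ is the desired mean delay. It is written in Python purely for illustration; the function name `exponential_delay` is an assumption, and a load-test tool's scripting language would need the equivalent logarithm and uniform random-number calls.

```python
import math
import random


def exponential_delay(mean_delay, rng=random):
    """Return one exponentially distributed delay with the given mean.

    Inverse transform: if U is uniform on (0, 1], then -mean * ln(U)
    is exponentially distributed with that mean.
    """
    u = 1.0 - rng.random()   # random() is in [0, 1), so u is in (0, 1]
    return -mean_delay * math.log(u)


if __name__ == "__main__":
    rng = random.Random(2024)
    delays = [exponential_delay(5.0, rng) for _ in range(100_000)]
    print(f"sample mean = {sum(delays) / len(delays):.3f}")
```

In Python itself there is no need to hand-code this, since the standard library already provides `random.expovariate(1.0 / mean_delay)`. The hand-rolled version is shown because it ports directly to tools that only supply uniform variates.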

Throughput-Delay Curves

A colleague of mine at Yahoo.com asked me if I'd ever seen curves like this:

Not only is the answer, yes (it's a throughput-delay plot or XR plot in my notation), but that particular plot comes from my GCaP course notes. There, I use it to analyze the comparative performance of a functional multiprocessor (NS6000) and a symmetric multiprocessor (SC2000). Note how the two curves cross at around 1500 OPS. You can ask yourself why and if you can't come up with an explanation, you should be registering for a Guerrilla class. :)

The above XR plot also serves as a useful reminder that the throughput and response-time metrics are not only dependent on one another, but they are generally dependent in a nonlinear way—despite what some experts may claim:

Webinar: Load Testing Meets Data Analytics

This Thursday, October 27 at 10 am PDT^*, I'll be participating in a webinar sponsored by SOASTA, Inc. They make a new breed of load-testing product called CloudTest® which, despite its name, is not restricted to load testing cloud-based apps, although it can do that too.

USL Fine Point: Sub-Amdahl Scalability

As discussed in Chapter 4 of my GCaP book, Amdahl's law is defined by a single parameter called the serial fraction, denoted by the symbol α and signifying the proportion of the total workload (W) that is serialized during execution. From the standpoint of parallel processing (where reference to Amdahl's law is most frequent) serialization means that portion of the workload can only execute on a single processor out of N parallel processors. The parallel speedup or relative capacity C_A(N) performance metric is given by: \begin{equation} C_A(N) = \frac{N}{1 + \alpha \, (N-1)} \end{equation} If there is no serialization in the workload, i.e., α = 0, then C_A(N) = N, which signifies that the workload scales linearly with the number of physical processors. The important observation made by Gene Amdahl (more than 40 years ago) is that even if α is relatively small, viz., a few percent of the execution time, scalability cannot continue to increase linearly. For example, if α = 5%, then C_A(N) will eventually reach a scalability ceiling given by 20 effective processors (1/α), even if there are hundreds of physical processors available in the system.
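The ceiling is easy to check numerically. The short Python sketch below simply evaluates the formula for $C_A(N)$ with α = 5%; the function name `amdahl_capacity` is chosen for this example only.

```python
def amdahl_capacity(n, alpha):
    """Amdahl relative capacity C_A(N) = N / (1 + alpha (N - 1))."""
    return n / (1.0 + alpha * (n - 1))


if __name__ == "__main__":
    alpha = 0.05
    for n in (1, 10, 100, 1000, 10_000):
        print(f"N = {n:>6}: C_A(N) = {amdahl_capacity(n, alpha):.2f}")
    print(f"ceiling 1/alpha = {1.0 / alpha:.0f} effective processors")
```

However large N becomes, the computed capacity stays below 1/α = 20, which is the scalability ceiling described above.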

Load Testing Think Time Distributions

One of my gripes about some commercial load testing tools is that they only provide a think time distribution (Z) that is equivalent to uniform variates in the client-script. If you want some other distribution, you have to code it and debug it yourself. Load test generators are essentially very expensive workload simulators; especially when you take into account the cost of the SUT platform. At those prices, a selection of distributions should be provided as a standard library—like they are in event-based simulators.

To make this point a bit clearer, I used the very convenient variate-generation functions in R to compare some of the distributions that I consider should be included in such a library for the convenience of workload-test designers and performance engineers. The statistical mean (i.e., the average think delay) is the same in all these plots and is shown as the red vertical line, but pay particular attention to the spread around the mean on the x-axis.
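A similar comparison can be sketched without plots, using Python's standard library instead of R. The particular set of distributions here (constant, uniform, gamma, exponential) and the mean of 10 time units are assumptions made for illustration, not the list used in the plots. Each generator is parameterized to give the same mean think time, so only the spread differs.

```python
import random
import statistics

MEAN_Z = 10.0  # assumed mean think time, arbitrary time units


def sample_think_times(n, seed=42):
    """Draw n think times from several distributions with the same mean."""
    rng = random.Random(seed)
    return {
        "constant": [MEAN_Z] * n,
        # uniform on [0, 2*mean] has mean MEAN_Z
        "uniform": [rng.uniform(0.0, 2.0 * MEAN_Z) for _ in range(n)],
        # gamma with shape k and scale mean/k has mean MEAN_Z
        "gamma (shape 4)": [rng.gammavariate(4.0, MEAN_Z / 4.0) for _ in range(n)],
        # exponential with rate 1/mean has mean MEAN_Z
        "exponential": [rng.expovariate(1.0 / MEAN_Z) for _ in range(n)],
    }


samples = sample_think_times(20_000)
for name, zs in samples.items():
    print(f"{name:>16}: mean = {statistics.fmean(zs):6.2f}, "
          f"std dev = {statistics.pstdev(zs):6.2f}")
```

The sample means should all come out close to 10, while the standard deviations differ widely. That difference in spread is what a load test never sees if the tool offers only one distribution.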

Emulating Web Traffic in Load Tests

One of the recurring questions in the GCaP class last week was: How can we make web-application load tests more representative of real Internet traffic? The sticking point is that conventional load-test simulators like LoadRunner, JMeter, and httperf, represent the load in terms of a finite number of virtual user (or vuser) scripts, whereas the Internet has an indeterminately large number of real users creating load.

Using Think Times to Determine Arrival Rates

This question came up at the NorCal CMG meeting last week. Hugh S. asked me: Is there a relationship between the choice of think time (Z) in a load-test client script and the rate at which requests will arrive into the system under test? The answer is, yes, and it's easy to understand how by using the preceding blog post about mapping virtual users to real users.
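As a back-of-the-envelope sketch, the standard closed-system form of Little's law, N = X (R + Z), can be rearranged to give the request rate X produced by N vusers with mean response time R and mean think time Z. The Python below assumes R is a measured steady-state average (in reality it changes with load), and the function name and numbers are purely illustrative.

```python
def arrival_rate(n_vusers, response_time, think_time):
    """Requests per unit time from a closed population of vusers.

    Rearranged from N = X (R + Z): each vuser completes one cycle of
    'wait for response, then think' every R + Z time units.
    """
    return n_vusers / (response_time + think_time)


if __name__ == "__main__":
    # Illustrative values: 100 vusers, R = 0.5 s, and two think times.
    for z in (9.5, 0.0):
        rate = arrival_rate(100, 0.5, z)
        print(f"Z = {z:4.1f} s  ->  {rate:6.1f} requests/s")
```

Shortening the think time raises the arrival rate for the same number of vusers, which is why Z is effectively a throttle on the offered load.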

Mapping Virtual Users to Real Users

In performance engineering scenarios that use commercial load testing tools, e.g., LoadRunner, the question often arises: How many virtual users (vusers) should be exercised in order to simulate some expected number of real users? This is important, more often than not, because the requirement might be to simulate thousands or even tens of thousands of real users, but the stiff licensing fees associated with each vuser (above some small default number) make that cost-prohibitive. As I intend to demonstrate here, we can apply Little's law to map vusers to real users.

A commonly used practical approach to ameliorate this circumstance is to run the load test scenarios with zero think time (i.e., Z = 0) in the client scripts on the driver (DVR) side of the test rig. This choice effectively increases the number of active transactions running on the system under test (SUT), which might include apps servers and database servers. These two subsystems are usually connected by a local area network, as shown in the following diagram.
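One way to sketch the mapping with Little's law is shown below, in Python. With Z = 0 the vusers satisfy N_vusers = X R, while real users who pause to think satisfy N_real = X (R + Z). The sketch assumes the SUT sees the same throughput X and response time R in both cases, which is a simplification, and the function name and numbers are hypothetical.

```python
def equivalent_real_users(n_vusers, response_time, real_think_time):
    """Real users represented by n_vusers running with zero think time.

    Zero-think vusers:  N_vusers = X * R        (Little's law with Z = 0)
    Thinking real users: N_real  = X * (R + Z)  (same throughput X assumed)
    """
    throughput = n_vusers / response_time
    return throughput * (response_time + real_think_time)


if __name__ == "__main__":
    # Illustrative values: 50 vusers, R = 0.2 s, real users think for 10 s.
    n_real = equivalent_real_users(50, 0.2, 10.0)
    print(f"50 zero-think vusers ~ {n_real:.0f} real users")
```

The multiplier is (R + Z)/R, so the longer real users think relative to the response time, the more of them each zero-think vuser stands in for.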

Parallelism in PDQ

All so-called "analytic solvers" for queueing models, including PDQ, assume that the queueing system being modeled is in steady state. Steady state means that in the long run, the number of arrivals into a service facility, e.g., customers arriving at a grocery checkout, will be identical to the number of customers departing. Why is this important?

Response Time Knees and Queues

How do you determine where the response-time "knee" occurs? This is a question one commonly hears with reference to characterizing the performance of an application. Calculating where the response time suddenly begins to climb dramatically is considered, by many, to be an important determinant for such things as load testing, scalability analysis, and setting application service targets.

In a previous blog post, I pointed out that such a "knee" is actually an optical illusion. Nonetheless, this same question arose in last month's CMG MeasureIT, as a kind of survey entitled "Does the Knee in a Queuing Curve Exist or is it just a Myth?" Although that author concludes (correctly) that the existence of a "knee" (as it is usually meant) is bogus, the panoply of responses was quite astounding—especially coming from professionals who ought to know better. In this month's MeasureIT, I examine the same question in a rigorous but unconventional way under the title "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ."

Monday, June 1, 2009

Data + Models == Insight

Al Bundy, of the TV show Married with Children, understood it and performance engineers should too. What am I talking about? The theme music for that show is the tune "Love and Marriage" as sung by Frank Sinatra. Just like the song says about love and marriage, so it is with measurements and models ... You can't have one without the other.

Plotting PDQ Output with R

One of the nice things about PDQ-R (coming in release 5.0) is the ability to plot PDQ output directly in R. Here's a PDQ-R script, together with the corresponding graphical output, that I knocked up to show the effect on the throughput curve of adding more queueing delay stages (K), with everything else held constant.