The Pith of Performance: GCaP

Showing posts with label GCaP. Show all posts

Monday, June 25, 2018

Guerrilla 2018 Classes Now Open

All Guerrilla training classes are now open for registration.

GCAP: Guerrilla Capacity and Performance — From Counters to Containers and Clouds
GDAT: Guerrilla Data Analytics — Everything from Linear Regression to Machine Learning
PDQW: Pretty Damn Quick Workshop — Personal tuition for performance and capacity mgmt

The following highlights indicate the kind of thing you'll learn. Most especially, how to make better use of all that monitoring and load-testing data you keep collecting.

How to save millions of dollars with a one-line performance model (video)
How to minimize chargeback after you lift and shift to the cloud (video)
How to correctly emulate web traffic on a load-testing rig (PDF)

See what Guerrilla grads are saying about these classes. And how many instructors do you know that are available for you from 9am to 9pm (or later) each day of your class?

Who should attend?

IT architects
Application developers
Performance engineers
Sysadmins (Linux, Unix, Windows)
System engineers
Test engineers
Mainframe sysops (IBM. Hitachi, Fujitsu, Unisys)
Database admins
Devops practitioners
SRE engineers
Anyone interested in getting beyond performance monitoring

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate. The room-booking link is on the registration page.

Tell a colleague and see you in September!

Wednesday, June 20, 2018

Chargeback in the Cloud - The Movie

If you were unable to attend the live presentation on cost-effective defenses against chargeback in the cloud, or simply can't get enough performance and capacity analysis for the AWS cloud (which is completely understandable), here's a direct link to the video recording on CMG's YouTube channel.

The details concerning how you can do this kind of cost-benefit analysis for your cloud applications will be discussed in the upcoming GCAP class and the PDQW workshop. Check the corresponding class registration pages for dates and pricing.

Wednesday, February 21, 2018

CPU Idle Is Not Like White Space

This post seems like it ought to be too trite to write but, I see the following performance gotcha cropping up over and over again.

Under pressure to consolidate resources, usually driven by management and especially regarding processor capacity, there is often an urge to "use up" any idle processor cycles. Idle processor capacity tends to be viewed like it's whitespace on a written page—just begging to be filled up.

The logical equivalent of filling up the "whitespace" is absorbing idle processor capacity by migrating applications that are currently running on other servers and turning those excess servers off or using them for something else.

Guerrilla Training September 2017

This year offers a new 3-day class on applying PDQ in your workplace. The classic Guerrilla classes, GCAP and GDAT, are also being presented.

Who should attend?

architects
application developers
performance engineers
sys admins
test engineers
mainframe sysops
database admins
webops
anyone interested in getting beyond ad hoc performance analysis and capacity planning skills

Some course highlights:

There are only 3 performance metrics you need to know
How to quantify scalability with the Universal Scalability Law
Hadoop performance and capacity management
Virtualization Spectrum from hyper-threads to cloud services
How to detect bad data
Statistical forecasting techniques
Machine learning algorithms applied to performance data

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate. Link is on the registration page.

Also see what graduates are saying about these classes.

Tell a colleague and see you in September!

Wednesday, June 8, 2016

2016 Guerrilla Training Schedule 2016

After a six month hiatus working on a major consulting gig, Guerrilla training classes are back in business with the three classic courses: Guerrilla Bootcamp (GBOOT), Guerrilla Capacity Planning (GCAP) and Guerrilla Data Analysis Techniques (GDAT).

See what graduates are saying about these courses.

Some course highlights:

There are only 3 performance metrics you need to know
How to quantify scalability with the Universal Scalability Law
Hadoop performance and capacity management
Virtualization Spectrum from hyper-threads to cloud services
How to detect bad data
Statistical forecasting techniques
Machine learning algorithms applied to performance data

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate.

Tell a friend and see you in September!

Sunday, July 26, 2015

Next GCaP Class: September 21, 2015

The next Guerrilla Capacity Planning class will be held during the week of September 21, 2015 at our new Sheraton Four Points location in Pleastaton, California. Early bird rate ends August 21st.

During the class, I will bust some entrenched CaP management myths (in no particular order):

All performance measurements are wrong by definition.
There is no response-time knee.
Throughput is not the same as execution rate.
Throughput and latency metrics are related — nonlinearly.
There is no parallel computing.

No particular knowledge about capacity and performance management is assumed.

Attendees should bring their laptops as course materials are provided on CD or flash drive. The Sheraton provides free wi-fi to the internet.

We look forward to seeing you there!

Monday, March 23, 2015

Hadoop Scalability Challenges

Hadoop is hot, not because it necessarily represents cutting edge technology, but because it's being rapidly adopted by more and more companies as a solution for engaging in the big data trend. It may be coming to your company sooner than you think.

The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search-engine, it is now open sourced at Apache. Since Hadoop now has a broad range of corporate users, a number of companies offer commercial implementations of Hadoop.

However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:

So called flat development scalability
Super scaling performance
New TPC big data benchmark

See "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion.

Performance Analysis vs. Capacity Planning

This question came up in a (members only) Linkedin discussion group:

Often found a misconception about these terms. I'm sure this must be written in a book, but for informal discussions is always preferable to cite sources from standardization institutes or IT industry referents.
Thanks in advance

Gian Piero

Here's how I answered it.

Guerrilla Training: New Location

Finally! We have a new location for our Guerrilla training classes in Pleasanton, California: Sheraton Four Points.

We had some complaints last year about noise from the car parks of surrounding restaurants during the night at the previous location. Four Points is much more secluded. It also has its, own restaurant, which some of you will recognize if you've attended previous Guerrilla classes (more than likely, we did lunch and/or dinner there).

The current 2015 schedule and registration page is now posted. The classroom is intimate and only holds about 10-12 people, so book early, book often.

Wednesday, August 13, 2014

Intel TSX Multicore Scalability in the Wild

Multicore processors were introduced to an unsuspecting marketplace more than a decade ago, but really became mainstream circa 2005. Multicore was presented as the next big thing in microprocessor technology. No mention of falling off the Moore's law (uniprocessor) curve. A 2007 PR event—held jointly between Intel, IBM and AMD—announced a VLSI fabrication technology that broke through the 65 nm barrier. The high-κ Hafnium gate enabled building smaller transistors at 45 nm feature size and thus, more cores per die. I tracked and analyzed the repercussions of that event in these 2007 blog posts:

Intel met or beat their projected availability schedule (depending on how you count) for Penryn by essentially rebooting their foundries. Very impressive.

In my Guerrilla classes I like to pose the question: Can you procure a 10 GHz microprocessor? On thinking about it (usually for the first time), most people begin to realize that they can't, but they don't know why not. Clearly, clock frequency limitations have an impact on both performance and server-side capacity. Then, I like to point out that programming multicores (since that decision has already been made for you) is much harder than it is for uniprocessors. Moreover, there is not too much in the way of help from compilers and other development tools, at the moment, although that situation will continually improve, presumably. Intel TSX (Transactional Synchronization Extensions) for Haswell multicores offers assistance of that type at the hardware level. In particular, TSX instructions are built into Haswell cores to boost the performance and scalability of certain types of multithreaded applications. But more about that in a minute.

I also like to point out that Intel and other microprocessor vendors (of which there are fewer and fewer due the enormous cost of entry), have little interest in how well your database, web site, or commercial application runs on their multicores. Rather, their goal is to produce the cheapest chip for the largest commodity market, e.g., PCs, laptops, and more recently mobile. Since that's where the profits are, the emphasis is on simplest design, not best design.

Fast, cheap, reliable: pick two.

Server-side performance is usually relegated to low man on the totem pole because of its relatively smaller market share. The implicit notion is that if you want more performance, just add more cores. But that depends on the threadedness of the applications running on those cores. Of course, there can also be side benefits, such as inheriting lower power servers from advances in mobile chip technology.

Source: AnandTech, Inc.

Intel officially announced multicore processors based on the Haswell architecture in 2013. Because scalability analysis can reveal a lot about limitations of the architecture, it's generally difficult to come across any quantitative data in the public domain. In their 2012 marketing build up, however, Intel showed some qualitative scalability characteristics of the Haswell multicore with TSX. See figure above. You can take it as read that these plots are based on actual measurements.

Most significantly, note the classic USL scaling profiles of transaction throughput vs. number of threads. For example, going from coarse-grain locking without TSX (red curve exhibiting retrograde throughput) to coarse-grain locking with TSX (green curve) has reduced the amount of contention (i.e., USL α coefficient). It's hard to say what is the impact of TSX on coherency delay (i.e., USL β coefficient) without being in possession of the actual data. As expected, however, the impact of TSX on fine-grain locking seems to be far more moderate. A 2012 AnandTech review summed things up this way:

TSX will be supported by GCC v4.8, Microsoft's latest Visual Studio 2012, and of course Intel's C compiler v13. GLIBC support (rtm-2.17 branch) is also available. So it looks like the software ecosystem is ready for TSX. The coolest thing about TSX (especially HLE) is that it enables good scaling on our current multi-core CPUs without an enormous investment of time in the fine tuning of locks. In other words, it can give developers "fined grained performance at coarse grained effort" as Intel likes to put it.
In theory, most application developers will not even have to change their code besides linking to a TSX enabled library. Time will tell if unlocking good multi-core scaling will be that easy in most cases. If everything goes according to Intel's plan, TSX could enable a much wider variety of software to take advantage of the steadily increasing core counts inside our servers, desktops, and portables.

With claimed clock frequencies of 4.6 GHz (i.e., nominal 5000 MIPS), Haswell with TSX offers superior performance at the usual price point. That's two. What about reliability? Ah, there's the rub. TSX has been disabled in the current manufacturing schedule due to a design bug.

Wednesday, July 30, 2014

A Little Triplet

Little's law appears in various guises in performance analysis. It was known to Agner Erlang (the father of queueing theory) in 1909 to be intuitively correct but was not proven mathematically until 1961 by John Little. Even though you experience it all the time, queueing is not such a trivial phenomenon as it may seem. In the subsequent discussion, I'll show you that there is actually a triplet of such laws, where each version refers to a slightly different aspect of queueing. Although they have a common general form, the less than obvious interpretation of each version is handy to know for solving almost any problem in performance analysis.

To see the Little's law triplet, consider the line of customers at the grocery store checkout lane shown in Figure 1. Following the usual queueing theory convention, the queue includes not only the customers waiting but also the customer currently in service.

Figure 1. Checkout lane decomposed into its space and time components

As an aside, it is useful to keep in mind that there are only three types of performance metric:

Time $T$ (the fundamental performance metric), e.g., minutes
Count or a number $N$ (no formal dimensions), e.g., transactions
Rate $N/T$ (inverse time dimension), e.g., transactions per minute

Restaurant Performance Sunk by Selfies

An interesting story appeared last weekend about a popular NYC restaurant realizing that, although the number of customers they served on a daily basis is about the same today as it was ten years ago, the overall service had slowed significantly. Naturally, this situation has led to poor online reviews to the point where the restaurant actually hired a firm to investigate the problem. The analysis of surveillance tapes led to a surprising conclusion. The unexpected culprit behind the slowdown was neither the kitchen staff nor the waiters, but customers taking photos and otherwise playing around with their smartphones.

Using the data supplied in the story, I wanted to see how the restaurant performance would look when expressed as a PDQ model. First, I created a summary data frame in R, based on the observed times:


> df
           obs.2004 obs.2014
wifi.data         0        5
menu.data         8        8
menu.pix          0       13
order.data        6        6
eat.mins         46       43
eat.pix           0       20
paymt.data        5        5
paymt.pix         0       15
total.mins       65      115

The 2004 situation can be represented schematically by the following queueing network

Figure 1. 2004 queueing stages

Referring to Figure 1:

How to Remember the Poisson Distribution

The Poisson cumulative distribution function (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a probability function, it cannot have a value greater than 1. In R, the CDF is given by the function ppois(). For example, with α = 4 the first 16 values are


> ppois(0:15,4)
 [1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
 [9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511

As the number of events increases from 0 to 15 the CDF approaches 1. See Figure.

Load Average in FreeBSD

In my Perl PDQ book there is a chapter, entitled Linux Load Average, where I dissect how the load average metric (or metrics, since there are three reported numbers) is computed in shell commands like uptime, viz.,

[njg]~/Desktop% uptime
16:18  up 9 days, 15 mins, 4 users, load averages: 2.11 1.99 1.99

For the book, I used Linux 2.6 source because it was accessible on the web with convenient hyperlinks to navigate the code. Somewhere in the kernel scheduler, the following C code appeared:


#define FSHIFT    11          /* nr of bits of precision */ 
#define FIXED_1   (1<<FSHIFT) /* 1.0 as fixed-point */ 
#define LOAD_FREQ (5*HZ)      /* 5 sec intervals */ 
#define EXP_1     1884        /* 1/exp(5sec/1min) fixed-pt */ 
#define EXP_5     2014        /* 1/exp(5sec/5min) */ 
#define EXP_15    2037        /* 1/exp(5sec/15min) */ 

#define CALC_LOAD(load,exp,n) \
 load *= exp; \
 load += n*(FIXED_1-exp); \
 load >>= FSHIFT;

where the C macro CALC_LOAD computes the following equation

The Visual Connection Between Capacity And Performance

Whether or not computer system performance and capacity are related is a question that comes up from time to time, especially from those with little experience in either discipline. Most recently, it appeared on a Linked-in discussion group:

"...the topic was raised about the notion that we are Capacity Management not Performance Management. It made me think about whether performance is indeed a facet of Capacity, or if it belongs completely separate."

As a matter of course, I address this question in my Guerrilla training classes. There, I like to appeal to a simple example—a multiserver queue—to exhibit how the performance characteristics are intimately related to system capacity. Not only are they related but, as the multiserver queue illustrates, the relationship is nonlinear. In terms of daily operations, however, you may choose to focus on one aspect more than the other, but they are still related nonetheless.

Performance of N+1 Redundancy

How can you determine the performance impact on SLAs after an N+1 redundant hosting configuration fails over? This question came up in the Guerrilla Capacity Planning class, this week. It can be addressed by referring to a multi-server queueing model.

N+1 = 4 Redundancy

We begin by considering a small-N configuration of four hosts where the load is distributed equally to each of the hosts. For simplicity, the load distribution is assumed to be performed by some kind of load balancer with a buffer. The idea of N+1 redundancy is that the load balancer ensures all four hosts are equally utilized prior to any failover.

The idea is that none of the hosts should use more than 75% of their available capacity: the blue areas on the left side of Fig. 1. The total consumed capacity is assumed to be $4 \times 3/4 = 3$ or 300% of the total host configuration (rather than all 4 hosts or 400% capacity). Then, when any single host fails, its lost capacity is compensated by redistributing that same load across the remaining three available hosts (each running 100% busy after failover). As we shall show in the next section, this is a misconception.

Figure 1. N+1 redundant hosts before (left) and after failover (right)

The circles in Fig. 1 represent hosts and rectangles represent incoming requests buffered at the load-balancer. The blue area in the circles signifies the available capacity of a host, whereas white signifies unavailable capacity. When one of the hosts fails, its load must be redistributed across the remaining three hosts. What Fig. 1 doesn't show is the performance impact of this capacity redistribution.

Guerrilla Classes in March 2014

The upcoming GBoot and GCaP training classes are your fast track to enterprise performance and capacity management. You can now register entirely online using either your corporate or personal credit card.

New topics include:

The effect of think-time settings in load tests
Closed vs. open workloads in load testing

Classic topics include:

There are only 3 performance metrics
How performance metrics are related to each another
How to quantify scalability with the Universal Scalability Law (USL)
IT Infrastructure Library (ITIL) for Guerrillas
The Virtualization Spectrum from hyperthreads to hyperservices

Entrance Larkspur Landing hotel Pleasanton California

As usual, all classes are held at our lovely Larkspur Landing Pleasanton location in California. Attendees should bring their laptops to the class as course materials are provided on a flash drive. Larkspur Landing also provides free wi-fi Internet in their residence-style rooms as well as the training room.

Wednesday, June 19, 2013

Capacity Planning Classes in August 2013

Bookings are open for both Guerrilla Boot Camp (GBoot) and Guerrilla Capacity Planning (GCaP) classes in August 2013 at the Early Bird rate.

As usual, classes will be held at our lovely Larkspur Landing location. Click on the image for booking information. Here are some comments contributed by Guerrilla alumni.

Attendees should bring their laptops, as course materials are provided on CD or flash drive. The venue also offers free wi-fi to the internet.

Monday, April 22, 2013

Adding Percentiles to PDQ

Pretty Damn Quick (PDQ) performs a mean value analysis of queueing network models: mean values in; mean values out. By mean, I mean statistical mean or average. Mean input values include such queueing metrics as service times and arrival rates. These could be sample means. Mean output values include such queueing metrics as waiting time and queue length. These are computed means based on a known distribution. I'll say more about exactly what distribution, shortly. Sometimes you might also want to report measures of dispersion about those mean values, e.g., the 90th or 95th percentiles.

Percentile Rules of Thumb

In The Practical Performance Analyst (1998, 2000) and Analyzing Computer System Performance with Perl::PDQ (2011), I offer the following Guerrilla rules of thumb for percentiles, based on a mean residence time R:

80th percentile: p80 ≃ 5R/3
90th percentile: p90 ≃ 7R/3
95th percentile: p95 ≃ 9R/3

I could also add the 50th percentile or median: p50 ≃ 2R/3, which I hadn't thought of until I was putting this blog post together.

Visualizing Variance

The typical presentation of variance in textbooks often looks like this Wikipedia definition. Quite daunting for the non-expert. So, how would you explain the notion of variance to someone who has little or no background in statistics and couldn't easily digest all that gobbledygook?

The Mean

Let's drop back a notch. How would you explain the statistical mean? A common way to do that is to utilize the simple visual device of the "bell curve" belonging to the normal distribution (Fig. 1).

Figure 1. A normal distribution

The normal distribution, $N(x,\mu,\sigma^2)$, is specified by two parameters:

Mean, usually denoted by $\mu$
Variance, usually denoted by $\sigma^2$

that determine (1) the location and (2) the shape of the curve. In Fig. 1, $\mu = 4$. Being a probability, the curve must be normalized to enclose unit area. Also, since $N(x)$ is unimodal and symmetric about $\mu$, the mean, median and mode are all located at the same position on the $x$-axis. Therefore, it's easy to point to the mean as being the $x$-position of the peak. Anybody can see that immediately. Mission accomplished.

But what about the variance? Where is that in Figure 1?

Monday, June 25, 2018

Wednesday, June 20, 2018

Wednesday, February 21, 2018

Friday, August 4, 2017

Wednesday, June 8, 2016

Sunday, July 26, 2015

Monday, March 23, 2015

Friday, March 20, 2015

Monday, March 9, 2015

Wednesday, August 13, 2014

Wednesday, July 30, 2014

Thursday, July 17, 2014

Thursday, July 3, 2014

Thursday, June 26, 2014

Friday, June 6, 2014

Thursday, March 13, 2014

N+1 = 4 Redundancy

Tuesday, February 18, 2014

Wednesday, June 19, 2013

Monday, April 22, 2013

Percentile Rules of Thumb

Sunday, January 6, 2013

The Mean