Monday, March 23, 2015

Hadoop Scalability Challenges

Hadoop is hot, not because it necessarily represents cutting-edge technology, but because it's being rapidly adopted by more and more companies as a way to engage with the big data trend. It may be coming to your company sooner than you think.

The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search engine, it is now an open source project at Apache. Since Hadoop now has a broad range of corporate users, a number of companies offer commercial implementations of Hadoop.

However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:

  1. So-called flat development scalability
  2. Super scaling performance
  3. New TPC big data benchmark
Therefore, I've added a new module on Hadoop performance and capacity management to the Guerrilla Capacity Planning course material that also includes such topics as:
  • There are only 3 performance metrics you need to know
  • How performance metrics are related to one another
  • How to quantify scalability with the Universal Scalability Law
  • IT Infrastructure Library (ITIL) for Guerrillas
  • The Virtualization Spectrum from hyperthreads to hyperservices
  • Hadoop performance and capacity management
The course outline has more details.

Early bird registration ends in 5 days.

I'm also interested in hearing from anyone who plans to adopt Hadoop or has experience using it from a performance and capacity perspective.

Friday, March 20, 2015

Performance Analysis vs. Capacity Planning

This question came up in a (members only) LinkedIn discussion group:
Often found a misconception about these terms. I'm sure this must be written in a book, but for informal discussions is always preferable to cite sources from standardization institutes or IT industry referents.

Thanks in advance
Gian Piero

Here's how I answered it.
This is a very good question that few people ever ask, let alone try to answer correctly.

Don't quote me on this but, I view it as the difference between how long vs. how much. ;)

Most people who proffer an answer will tend to incorporate a lot of details that reflect their own history with the subject. In my classes, I try to boil it down to fundamentals that can then be elaborated on with the specifics related to your particular context.

  1. Performance analysis or performance management is fundamentally about time: how long does it take? (BTW, throughput is just an inverse-time metric.)
  2. Capacity planning or capacity management is fundamentally about size: how much resource is needed?
To make things a little more concrete, consider a freeway. The number of lanes (and length between ramps) represents capacity (bandwidth). The unstated assumption is that the freeway has enough capacity to allow the traffic to travel in the shortest time or near the speed limit (throughput), i.e., maximal performance. Of course, in California we know all about that ruse. At peak traffic hours the freeway often approximates a parking lot.

The point is that performance and capacity are intimately related: how much resource is available to achieve a specified performance goal or service level at a given load (like traffic)? The main reason we consider any distinction at all is mostly one of perspective.

  • If you're coming at it from a capacity management standpoint, you're usually assessing/measuring capacity under a set of assumptions about performance (current or projected).

  • If you're coming at it from a performance management standpoint, you're assessing/measuring performance under a set of assumptions about capacity.
The other important point to stress is that the relationship between capacity and performance metrics is generally nonlinear, e.g., the relationship between response time and resource utilization (an oft-used proxy for size) is nonlinear, although it can look linear at low loads. That's what makes the subject both interesting and difficult. And, as I say in the epigram to the 1st edition of my Perl::PDQ book: Common sense is the pitfall of all performance analysis.
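To put numbers on that nonlinearity, here is a minimal sketch. It assumes the simplest possible queueing model, a single M/M/1 queue, where the mean residence time is R = S / (1 − ρ) for service time S and utilization ρ. The choice of model, the function name `residence_time`, and the 1-second service time are illustrative assumptions, not something taken from the LinkedIn discussion.

```python
def residence_time(service_time, utilization):
    """Mean residence time R = S / (1 - rho) for an M/M/1 queue (assumed model)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    S = 1.0  # hypothetical service time in seconds
    for rho in (0.1, 0.2, 0.5, 0.8, 0.9, 0.95):
        print(f"rho = {rho:4.2f}   R = {residence_time(S, rho):6.2f} s")
```

Under these assumptions, going from 10% to 20% busy adds roughly 0.14 s to the response time, whereas going from 80% to 90% busy adds 5 s. The same 10-point change in utilization has a very different effect depending on where you start, which is why the curve only looks linear at low loads.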

To go back to the freeway example, the usual "solution" to the parking-lot effect is to simply add more capacity, in the form of more freeways, which we already know doesn't work because adding more freeways just creates more cars! Another unintuitive relationship. Mainframers call this unexpected capacity consumption latent demand.

Beyond that, it's all about trade-offs; including meeting budgetary constraints and so forth.


Postscript:
Doctor Gunther, it's hard not to quote your opinion if we consider that your book, Guerrilla Capacity Planning, was one of the first that I read as an introduction to the topic of IT capacity planning.

I appreciate your clear explanation.

GP

Monday, March 9, 2015

Guerrilla Training: New Location

Finally! We have a new location for our Guerrilla training classes in Pleasanton, California: Sheraton Four Points.

We had some complaints last year about noise from the car parks of surrounding restaurants during the night at the previous location. Four Points is much more secluded. It also has its own restaurant, which some of you will recognize if you've attended previous Guerrilla classes (more than likely, we did lunch and/or dinner there).

The current 2015 schedule and registration are now available. The classroom is intimate and only holds about 10-12 people, so book early, book often.

Monday, October 6, 2014

Tactical Capacity Management for Sysadmins at LISA14

On November 9th I'll be presenting a full-day tutorial on performance analysis and capacity planning at the USENIX Large Installation System Administration (LISA) conference in Seattle, WA.

The registration code is S4 in the System Engineering section.

Hope to see you there.

Wednesday, August 13, 2014

Intel TSX Multicore Scalability in the Wild

Multicore processors were introduced to an unsuspecting marketplace more than a decade ago, but really became mainstream circa 2005. Multicore was presented as the next big thing in microprocessor technology. No mention of falling off the Moore's law (uniprocessor) curve. A 2007 PR event—held jointly between Intel, IBM and AMD—announced a VLSI fabrication technology that broke through the 65 nm barrier. The high-κ Hafnium gate enabled building smaller transistors at 45 nm feature size and thus, more cores per die. I tracked and analyzed the repercussions of that event in these 2007 blog posts:
  1. Moore's Law II: More or Less?
  2. More on Moore
  3. Programming Multicores Ain't Easy

Intel met or beat their projected availability schedule (depending on how you count) for Penryn by essentially rebooting their foundries. Very impressive.

In my Guerrilla classes I like to pose the question: Can you procure a 10 GHz microprocessor? On thinking about it (usually for the first time), most people begin to realize that they can't, but they don't know why not. Clearly, clock frequency limitations have an impact on both performance and server-side capacity. Then, I like to point out that programming multicores (since that decision has already been made for you) is much harder than it is for uniprocessors. Moreover, there is not too much in the way of help from compilers and other development tools, at the moment, although that situation will continually improve, presumably. Intel TSX (Transactional Synchronization Extensions) for Haswell multicores offers assistance of that type at the hardware level. In particular, TSX instructions are built into Haswell cores to boost the performance and scalability of certain types of multithreaded applications. But more about that in a minute.

I also like to point out that Intel and other microprocessor vendors (of which there are fewer and fewer due to the enormous cost of entry) have little interest in how well your database, web site, or commercial application runs on their multicores. Rather, their goal is to produce the cheapest chip for the largest commodity market, e.g., PCs, laptops, and more recently mobile. Since that's where the profits are, the emphasis is on simplest design, not best design.

Fast, cheap, reliable: pick two.
Server-side performance is usually relegated to low man on the totem pole because of its relatively smaller market share. The implicit notion is that if you want more performance, just add more cores. But that depends on the threadedness of the applications running on those cores. Of course, there can also be side benefits, such as inheriting lower power servers from advances in mobile chip technology.

Intel officially announced multicore processors based on the Haswell architecture in 2013. Because scalability analysis can reveal a lot about limitations of the architecture, it's generally difficult to come across any quantitative data in the public domain. In their 2012 marketing build up, however, Intel showed some qualitative scalability characteristics of the Haswell multicore with TSX. See figure above. You can take it as read that these plots are based on actual measurements.

Most significantly, note the classic USL scaling profiles of transaction throughput vs. number of threads. For example, going from coarse-grain locking without TSX (red curve exhibiting retrograde throughput) to coarse-grain locking with TSX (green curve) has reduced the amount of contention (i.e., USL α coefficient). It's hard to say what is the impact of TSX on coherency delay (i.e., USL β coefficient) without being in possession of the actual data. As expected, however, the impact of TSX on fine-grain locking seems to be far more moderate. A 2012 AnandTech review summed things up this way:

TSX will be supported by GCC v4.8, Microsoft's latest Visual Studio 2012, and of course Intel's C compiler v13. GLIBC support (rtm-2.17 branch) is also available. So it looks like the software ecosystem is ready for TSX. The coolest thing about TSX (especially HLE) is that it enables good scaling on our current multi-core CPUs without an enormous investment of time in the fine tuning of locks. In other words, it can give developers "fined grained performance at coarse grained effort" as Intel likes to put it.

In theory, most application developers will not even have to change their code besides linking to a TSX enabled library. Time will tell if unlocking good multi-core scaling will be that easy in most cases. If everything goes according to Intel's plan, TSX could enable a much wider variety of software to take advantage of the steadily increasing core counts inside our servers, desktops, and portables.

With claimed clock frequencies of 4.6 GHz (i.e., nominal 5000 MIPS), Haswell with TSX offers superior performance at the usual price point. That's two. What about reliability? Ah, there's the rub. TSX has been disabled in the current manufacturing schedule due to a design bug.
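For readers who haven't seen a USL scaling profile before, the following sketch shows how the contention (α) and coherency (β) coefficients shape throughput as a function of thread count, using the USL form C(N) = N / (1 + α(N − 1) + βN(N − 1)). The coefficient values are purely hypothetical; they are not fitted to Intel's plots, since the underlying data are not available. Because the effect of TSX on β can't be judged from the plots, β is held fixed and only α is varied.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Thread count N* = sqrt((1 - alpha) / beta) where C(N) peaks (needs beta > 0)."""
    return math.sqrt((1.0 - alpha) / beta)


if __name__ == "__main__":
    # Hypothetical coefficients, NOT derived from any measurements.
    profiles = {
        "high contention (assumed alpha = 0.30)": (0.30, 0.010),
        "low contention  (assumed alpha = 0.05)": (0.05, 0.010),
    }
    for name, (alpha, beta) in profiles.items():
        print(f"{name}: throughput peaks near N = {usl_peak(alpha, beta):.1f} threads")
        for n in (1, 2, 4, 8, 16, 32):
            print(f"  N = {n:2d}   C(N) = {usl_capacity(n, alpha, beta):5.2f}")
```

With any nonzero β, the relative capacity eventually turns over and declines as threads are added, which is the retrograde behavior noted above. Lowering α raises the whole curve, but it is β that determines whether a peak exists at all.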

Wednesday, July 30, 2014

A Little Triplet

Little's law appears in various guises in performance analysis. It was known to Agner Erlang (the father of queueing theory) in 1909 to be intuitively correct but was not proven mathematically until 1961 by John Little. Even though you experience it all the time, queueing is not such a trivial phenomenon as it may seem. In the subsequent discussion, I'll show you that there is actually a triplet of such laws, where each version refers to a slightly different aspect of queueing. Although they have a common general form, the less than obvious interpretation of each version is handy to know for solving almost any problem in performance analysis.

To see the Little's law triplet, consider the line of customers at the grocery store checkout lane shown in Figure 1. Following the usual queueing theory convention, the queue includes not only the customers waiting but also the customer currently in service.

Figure 1. Checkout lane decomposed into its space and time components

As an aside, it is useful to keep in mind that there are only three types of performance metric:

  A. Time $T$ (the fundamental performance metric), e.g., minutes
  B. Count or a number $N$ (no formal dimensions), e.g., transactions
  C. Rate $N/T$ (inverse time dimension), e.g., transactions per minute
From this standpoint, Little's law has the simple general form
\begin{equation}
N = \dfrac{N}{T} ~\times~ T \label{eqn:dims}
\end{equation}
which says: a rate (type-C metric) multiplied by time (type-A metric) produces a number (type-B metric), because the $T$s cancel out. The three metric types (A, B, C) should not be confused with the three forms of Little's law being discussed here. That's just a coincidence because good things often come in threes. :-)
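As a quick numerical check of the general form, here is a minimal sketch applied to the checkout lane. The function name `littles_law_count` and the numbers (2 customers per minute, 5 minutes spent in the lane) are made up for illustration only.

```python
def littles_law_count(rate, time):
    """Return the number N = rate * time; rate and time must use the same time unit."""
    if rate < 0 or time < 0:
        raise ValueError("rate and time must be non-negative")
    return rate * time


if __name__ == "__main__":
    arrival_rate = 2.0     # hypothetical: customers per minute
    time_in_lane = 5.0     # hypothetical: minutes spent waiting plus being served
    n = littles_law_count(arrival_rate, time_in_lane)
    print(f"Average number of customers in the checkout lane: {n}")
```

The only thing to watch is the units: if the rate is per minute, the time must also be in minutes, otherwise the $T$s do not cancel.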

Sunday, July 20, 2014

Continuous Integration Gets Performance Testing Radar

As companies embrace continuous integration (CI) and fast release cycles, a serious problem has emerged in the development pipeline: Conventional performance testing is the new bottleneck. Every load testing environment is actually a highly complex simulation assumed to be a model of the intended production environment. System performance testing is so complex that the cost of modifying test scripts and hardware has become a liability for meeting CI schedules. One reaction to this situation is to treat performance testing as a checkbox item, but that exposes the new application to unknown performance idiosyncrasies in production.

In this webinar, Neil Gunther (Performance Dynamics Company) and Sai Subramanian (Cognizant Technology Solutions) will present a new type of model that is not a simulation, but instead acts like continuous radar that warns developers of potential performance and scalability issues during the CI process. This radar model corresponds to a virtual testing framework that precludes the need for developing performance test scripts or setting up a separate load testing environment. Far from being a mere idea, radar methodology is based on a strong analytic foundation that will be demonstrated by examining a successful case study.

Broadcast Date and Time: Tuesday, July 22, 2014, at 11 am Pacific