Wednesday, June 20, 2018

Chargeback in the Cloud - The Movie

If you were unable to attend the live presentation on cost-effective defenses against chargeback in the cloud, or simply can't get enough performance and capacity analysis for the AWS cloud (which is completely understandable), here's a direct link to the video recording on CMG's YouTube channel.

The details concerning how you can do this kind of cost-benefit analysis for your cloud applications will be discussed in the upcoming GCAP class and the PDQW workshop. Check the corresponding class registration pages for dates and pricing.

Sunday, May 20, 2018

USL Scalability Modeling with Three Parameters

NOTE: Annoyingly, the remote MathJax server often takes its sweet time rendering LaTeX equations (like, maybe a minute!!!). I don't know if this is deliberate on the part of Google or a bug. It used to be faster. If anyone knows, I'd be interested to hear; especially if there is a way to speed it up. And no, I'm not planning to move to WordPress.

The 2-parameter USL model

The original USL model, presented in my GCAP book and updated in the blog post How to Quantify Scalability, is defined in terms of two fitting parameters $\alpha$ (contention) and $\beta$ (coherency). \begin{equation} X(N) = \frac{N \, X(1)}{1 + \alpha (N - 1) + \beta N (N - 1)} \label{eqn: usl2} \end{equation}

Fitting this nonlinear USL model to data requires several steps:

  1. normalizing the throughput data, $X$, to determine the relative capacity, $C(N)$ (shown explicitly below).
  2. equation (\ref{eqn: usl2}) is then equivalent to $X(N) = C(N) \, X(1)$.
  3. if the $X(1)$ measurement is missing or simply not available, as is often the case with data collected from production systems, the GCAP book describes an elaborate technique for interpolating its value.
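
For reference, the normalized form that gets fitted follows directly from equation (\ref{eqn: usl2}): \begin{equation} C(N) = \frac{X(N)}{X(1)} = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)} \label{eqn: usl2c} \end{equation}
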
The motivation for a 2-parameter model arose out of a desire to meet the twin goals of:
  1. providing each term of the USL with a proper physical meaning, i.e., not treating the USL like a conventional multivariate statistical model (statistics is not math)
  2. satisfying the von Neumann criterion: minimal number of modeling parameters
Last year, I realized the 2-parameter constraint is actually overly severe. Introducing a third parameter would make the statistical fitting process even more universal, as well as simplify the overall procedure. For the USL in particular, the von Neumann criterion should not be taken too literally. It's really more of a guideline: fewer is generally better.
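
To make this concrete, here is a minimal sketch of a 3-parameter fit in Python using scipy.optimize.curve_fit. The third parameter, which I call $\gamma$ here, acts as a stand-in for the (possibly unmeasured) single-user throughput $X(1)$; the data below are synthetic and purely illustrative.

import numpy as np
from scipy.optimize import curve_fit

def usl3(N, gamma, alpha, beta):
    # 3-parameter USL: gamma stands in for the unmeasured X(1)
    return gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))

# Synthetic load points and throughputs, purely for illustration
N = np.array([1, 4, 8, 16, 32, 64], dtype=float)
rng = np.random.default_rng(1)
X = usl3(N, 1000, 0.02, 0.0001) * rng.normal(1.0, 0.02, size=N.size)

# Fit all three parameters at once; no separate normalization or
# X(1) interpolation step is needed
popt, pcov = curve_fit(usl3, N, X, p0=[X[0], 0.01, 0.0001], bounds=(0, np.inf))
gamma, alpha, beta = popt
print(f"gamma = {gamma:.1f}, alpha = {alpha:.4f}, beta = {beta:.6f}")

# Peak concurrency predicted by the fitted parameters
print(f"N* = {np.sqrt((1 - alpha) / beta):.1f}")

With $\gamma$ in the model, the regression estimates the single-user throughput along with $\alpha$ and $\beta$, which is precisely what simplifies the overall fitting procedure.
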

Sunday, April 22, 2018

The Geometry of Latency

... AKA hyperbolae.

Here's a mnemonic tabulation based on dishes and bowls:

Hopefully this makes amends for the more complicated explanation I wrote for CMG back in 2009, entitled "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ", which I'm pretty sure almost nobody understood.

Saturday, April 21, 2018

Virtual cloudXchange 2018 Conference

Our abstract has been accepted for presentation at the FREE cloudXchange online event to be held by CMG on June 19th at 10am Pacific (5pm UTC).

Exposing the Cost of Performance Hidden in the Cloud

Neil Gunther
Performance Dynamics, Castro Valley, California

Mohit Chawla
Independent Systems Engineer, Hamburg, Germany

10am Pacific Time on June 19, 2018

Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also reintroduces an old mainframe concept—chargeback—which rejuvenates the need for performance analysis and capacity planning. Combining production JMX data with an appropriate performance model, we show how to assess fee-based EC2 configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of various EC2 Auto Scaling policies.

Wednesday, March 14, 2018

WTF is Modeling, Anyway?

A conversation with performance and capacity management veteran Boris Zibitsker, on his BEZnext channel, about how to save multiple millions of dollars with a one-line performance model (at 21:50 minutes into the video) that has less than 5% error. I wish my PDQ models were that good. :/

The strength of the model turns out to be its explanatory power, rather than prediction, per se. However, with the correct explanation of the performance problem in hand (which also proved that all other guesses were wrong), this model correctly predicted a 300% reduction in application response time for essentially no cost. Modeling doesn't get much better than this.

Footnotes

  1. According to Computer World in 1999, a 32-node IBM SP2 cost $2 million to lease over 3 years. This SP2 cluster was about 6 times bigger.
  2. Because of my vain attempt to suppress details (in the interests of video length), Boris gets confused about the kind of files that are causing the performance problem (near 26:30 minutes). They're not regular data files and they're not executable files. The executable is already running but sometimes waits—for a long time. The question is, waits for what? They are, in fact, special font files that are requested by the X-windows application (the client, in X parlance). These remote files may also get cached, so it's complicated. In my GCAP class, I have more time to go into this level of detail. Despite all these potential complications, my 'log model' accurately predicts the mean application launch time.
  3. $\log_2$ assumes a binary tree organization of the font files, whereas $\log_{10}$ assumes a denary tree.
  4. A question for the astute viewer: since these geophysics applications were all developed in-house, how come the developers never saw these performance problems before the applications went into production? Here's a hint.
  5. Some people have asked why there's no video of me. This was the first time Boris had recorded a Skype session on video and he pushed the wrong button (or something). It's probably better this way. :P

Wednesday, February 21, 2018

CPU Idle Is Not Like White Space

This post seems like it ought to be too trite to write, but I see the following performance gotcha cropping up over and over again.

Under pressure to consolidate resources, usually driven by management and especially regarding processor capacity, there is often an urge to "use up" any idle processor cycles. Idle processor capacity tends to be viewed as if it were whitespace on a written page, just begging to be filled up.

The logical equivalent of filling up the "whitespace" is to absorb the idle processor capacity by migrating applications currently running on other servers, and then turning those excess servers off or using them for something else.
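
To see why this can backfire, here is a minimal sketch using a simple M/M/1 approximation (an illustrative assumption, not a claim about any particular workload): the mean residence time grows hyperbolically as the remaining idle capacity gets absorbed.

S = 0.010  # assumed mean service time per request: 10 ms

def residence_time(rho, service=S):
    # M/M/1 mean residence time: R = S / (1 - rho)
    return service / (1.0 - rho)

for rho in (0.50, 0.70, 0.90, 0.95):
    print(f"utilization {rho:.0%} -> mean residence time {residence_time(rho) * 1000:.0f} ms")

Pushing utilization from 50% to 95% in this toy example stretches the mean residence time from 20 ms to 200 ms, which is why those "free" idle cycles usually aren't free at all.
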

Friday, August 4, 2017

Guerrilla Training September 2017

This year offers a new 3-day class on applying PDQ in your workplace. The classic Guerrilla classes, GCAP and GDAT, are also being presented.

Who should attend?

  • architects
  • application developers
  • performance engineers
  • sys admins
  • test engineers
  • mainframe sysops
  • database admins
  • webops
  • anyone interested in getting beyond ad hoc performance analysis and capacity planning skills

Some course highlights:

  • There are only 3 performance metrics you need to know
  • How to quantify scalability with the Universal Scalability Law
  • Hadoop performance and capacity management
  • Virtualization Spectrum from hyper-threads to cloud services
  • How to detect bad data
  • Statistical forecasting techniques
  • Machine learning algorithms applied to performance data

Register online. Early-bird discounts run through the end of August.

As usual, Sheraton Four Points has rooms available at the Performance Dynamics discounted rate. The link is on the registration page.

Also see what graduates are saying about these classes.

Tell a colleague and see you in September!