The Pith of Performance

Wednesday, February 19, 2014

Facebook Meets Florence Nightingale and Enrico Fermi

Highlighting Facebook's mistakes and weaknesses is a popular sport. When you're the 800 lb gorilla of social networking, it's inevitable. The most recent rendition of FB bashing appeared in a serious study entitled, Epidemiological Modeling of Online Social Network Dynamics, authored by a couple of academics in the Department of Mechanical and Aerospace Engineering (???) at Princeton University.

They use epidemiological models to explain adoption and abandonment of social networks, where user adoption is analogous to infection and user abandonment is analogous to recovery from disease, e.g., the precipitous attrition witnessed by MySpace. To this end, they employ variants of an SIR (Susceptible Infected Removed) model to predict a precipitous decline in Facebook activity in the next few years.

Channeling Mark Twain^†, FB engineers lampooned this conclusion by pointing out that Princeton would suffer a similar demise under the same assumptions.

Irrespective of the merits of the Princeton paper, I was impressed that they used an SIR model. It's the same one I used, in R, last year to reinterpret Florence Nightingale's zymotic disease data during the Crimean War as resulting from epidemic spreading.

Another way in which FB was inadvertently dinged by incorrect interpretation of information—this time it was the math—occurred in the 2010 movie, "The Social Network" that tells the story of how FB (then called Facemash) came into being. While watching the movie, I noticed that the ranking metric that gets written on a dorm window (only in Hollywood) is wrong! The correct ranking formula is analogous to the Fermi-Dirac distribution, which is key to understanding how electrons "rank" themselves in atoms and semiconductors.

^†"The reports of my death have been greatly exaggerated."

Tuesday, February 18, 2014

Guerrilla Classes in March 2014

The upcoming GBoot and GCaP training classes are your fast track to enterprise performance and capacity management. You can now register entirely online using either your corporate or personal credit card.

New topics include:

The effect of think-time settings in load tests
Closed vs. open workloads in load testing

Classic topics include:

There are only 3 performance metrics
How performance metrics are related to each another
How to quantify scalability with the Universal Scalability Law (USL)
IT Infrastructure Library (ITIL) for Guerrillas
The Virtualization Spectrum from hyperthreads to hyperservices

Entrance Larkspur Landing hotel Pleasanton California

As usual, all classes are held at our lovely Larkspur Landing Pleasanton location in California. Attendees should bring their laptops to the class as course materials are provided on a flash drive. Larkspur Landing also provides free wi-fi Internet in their residence-style rooms as well as the training room.

Thursday, January 2, 2014

Monitoring CPU Utilization Under Hyper-threading

The question of accurately measuring processor utilization with hyper-threading (HT) enabled came up recently in a Performance Engineering Group discussion on Linked-in. Since I spent some considerable time looking into this issue while writing my Guerrilla Capacity Planning book, I thought I'd repeat my response here (slightly edited for this blog), in case it's useful to a broader audience interested in performance and capacity management. Not much has changed, it seems.

In a nutshell, the original question concerned whether or not it was possible for a single core to be observed running at 200% busy, as reported by Linux top, when HT is enabled.

This question is an old canard (well, "old" for multicore technology). I call it the "Missing MIPS" paradox. Regarding the question, "Is it really possible for a single core to be 200% busy?" the short answer is: never! So, you are quite right to be highly suspicious and confused.
You don't say which make of processor is running on your hardware platform, but I'll guess Intel. Very briefly, the OS (Linux in your case) is being lied to. Each core has 2 registers where inbound threads are stored for processing. Intel calls these AS (Architectural State) registers. With HT *disabled*, the OS only sees a single AS register as being available. In that case, the mapping between state registers and cores is 1:1. The idea behind HT is to allow a different application thread to run when the currently running app stalls; due to branch misprediction, bubbles in the pipeline, etc. To make that possible, there has to be another port or AS register. That register becomes visible to the OS when HT is enabled. However, the OS (and all the way up the food chain to whatever perf tools you are using) now thinks twice the processor capacity is available, i.e., 100% CPU at each AS port.

Response Time Percentiles for Multi-server Applications

In a previous post, I applied my rules-of-thumb for response time (RT) percentiles (or more accurately, residence time in queueing theory parlance), viz., 80th percentile: $R_{80}$, 90th percentile: $R_{90}$ and 95th percentile: $R_{95}$ to a cellphone application and found that the performance measurements were not completely consistent. Since the relevant data only appeared in a journal blog, I didn't have enough information to resolve the discrepancy; which is ok. The first job of the performance analyst is to flag performance anomalies but most probably let others resolve them—after all, I didn't build the system or collect the measurements.

More importantly, that analysis was for a single server application (viz., time-to-first-fix latency). At the end of my post, I hinted at adding percentiles to PDQ for multi-server applications. Here, I present the corresponding rules-of-thumb for the more ubiquitous multi-server or multi-core case.

Single-server Percentiles

First, let's summarize the Guerrilla rules-of-thumb for single-server percentiles (M/M/1 in queueing parlance): \begin{align} R_{1,80} &\simeq \dfrac{5}{3} \, R_{1} \label{eqn:mm1r80}\\ R_{1,90} &\simeq \dfrac{7}{3} \, R_{1}\\ R_{1,95} &\simeq \dfrac{9}{3} \, R_{1} \label{eqn:mm1r95} \end{align} where $R_{1}$ is the statistical mean of the measured or calculated RT and $\simeq$ denotes approximately equal. A useful mnemonic device is to notice the numerical pattern for the fractions. All denominators are 3 and the numerators are successive odd numbers starting with 5.

Guerrilla Training Schedule for 2014

With the newfound popularity of smaller sessions that offer highly personalized tuition—along the lines of the Oxbridge system—all of the 2014 Guerrilla classes held in California will be limited to a maximum of 6 students. Overflow will go onto a waiting list for the equivalent class that will be held on a date to be determined. Waitees will be notified accordingly. So book early, book often!

The Importance of 2 + ε Dimensions: Flat is the Name of the Game

In a 2010 email, I wrote the following about Steve Jobs:

Sent: Mon, May 31, 2010 5:02:32 PM Subject: Why Jobs has been vindicated on quality Observation: Jobs has finally been vindicated on his stand over high quality (and premium price, although not as premium as it used to be). Why? Theory: Jobs has made the computer 2-dimensional. Data: From the earliest days of the Mac, Jobs preached quality (there's even a video clip with him slagging Gates for failing to understood quality). For 2 decades Jobs was proven wrong, in the sense that customers were not willing to pay a premium for quality, so Apple never garnered more than 4-5% of the PC market.

What happened at HealthCare.gov?

On Oct. 6th Federal officials admitted the online marketplace needed design changes, as well as more server capacity to improve efficiency on the federally run exchange that serves 36 states. More details in this WSJ article.

And finally, from the PR horse's mouth on Oct 20th:

"Initially, we implemented a virtual 'waiting room,' but many found this experience to be confusing. We continued to add more capacity in order to meet demand and execute software fixes to address the sign up and log in issues, stabilizing those parts of the service and allowing us to remove the virtual 'waiting room.' "

Quite apart from the bizarre architectural description, a "virtual waiting room" implies a buffer or buffers where pending requests must wait for service because the necessary resources to complete those requests are not available due to being either busy or failed. A certain amount of waiting time can be tolerated by users (both applicants and providers) but if it becomes too long or simply fails to complete, that kind of poor performance points to grossly under-scaled capacity in the original design.