
Wednesday, December 25, 2013

Response Time Percentiles for Multi-server Applications

In a previous post, I applied my rules-of-thumb for response time (RT) percentiles (or, more accurately, residence time in queueing theory parlance), viz., the 80th percentile $R_{80}$, the 90th percentile $R_{90}$ and the 95th percentile $R_{95}$, to a cellphone application and found that the performance measurements were not completely consistent. Since the relevant data only appeared in a journal blog, I didn't have enough information to resolve the discrepancy, which is OK. The first job of the performance analyst is to flag performance anomalies and, more often than not, leave it to others to resolve them; after all, I didn't build the system or collect the measurements.

More importantly, that analysis was for a single-server application (viz., time-to-first-fix latency). At the end of that post, I hinted at adding percentiles to PDQ for multi-server applications. Here, I present the corresponding rules-of-thumb for the more ubiquitous multi-server or multi-core case.

Single-server Percentiles

First, let's summarize the Guerrilla rules-of-thumb for single-server percentiles (M/M/1 in queueing parlance): \begin{align} R_{1,80} &\simeq \dfrac{5}{3} \, R_{1} \label{eqn:mm1r80}\\ R_{1,90} &\simeq \dfrac{7}{3} \, R_{1}\\ R_{1,95} &\simeq \dfrac{9}{3} \, R_{1} \label{eqn:mm1r95} \end{align} where $R_{1}$ is the statistical mean of the measured or calculated RT and $\simeq$ denotes approximately equal. A useful mnemonic device is to notice the numerical pattern for the fractions. All denominators are 3 and the numerators are successive odd numbers starting with 5.
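
These fractions are convenient approximations to the exact percentiles of the M/M/1 residence time, which is exponentially distributed with mean $R_{1}$, so the $p$-th percentile is $R_{1} \ln[1/(1-p)]$. The following R snippet is a minimal sketch (not from the original post) that compares the exact values with the rules-of-thumb, using an invented value for $R_{1}$:

  # Compare exact M/M/1 residence-time percentiles with the rules-of-thumb.
  # R1 is a hypothetical mean residence time chosen purely for illustration.
  R1 <- 0.5                                # mean residence time (seconds)
  p  <- c(0.80, 0.90, 0.95)
  exact <- R1 * log(1 / (1 - p))           # exponential quantiles
  rot   <- R1 * c(5, 7, 9) / 3             # rules-of-thumb above
  data.frame(percentile = p, exact = exact, rule_of_thumb = rot)

The exact multipliers are $\ln 5 \simeq 1.61$, $\ln 10 \simeq 2.30$ and $\ln 20 \simeq 3.00$, so the rules-of-thumb err slightly on the high side.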

Friday, January 11, 2013

Oracle Java 7 Security Vulnerability

National Cyber Awareness System

US-CERT Alert TA13-010A
Oracle Java 7 Security Manager Bypass Vulnerability

Original release date: January 10, 2013

Systems Affected

Any system using Oracle Java 7 (1.7, 1.7.0) including
  • Java Platform Standard Edition 7 (Java SE 7)
  • Java SE Development Kit (JDK 7)
  • Java SE Runtime Environment (JRE 7)
All versions of Java 7 through update 10 are affected. Web browsers using the Java 7 plug-in are at high risk.

Wednesday, April 11, 2012

PostgreSQL Scalability Analysis Deconstructed

In 2010, I presented my universal scalability law (USL) at the SURGE conference. I came away with the impression that nobody really understood what I was talking about (quantifying scalability) or maybe DevOps types thought it was all too hard (math). Since then, however, I've come to find out that people like Baron Schwartz did get it and have since applied the USL to database scalability analysis. Apparently, things have continued to propagate to the point where others have heard about the USL from Baron and are now using it too.

Robert Haas is one of those people and he has applied the USL to Postgres scalability analysis. This is all good news. However, there are plenty of traps for new players and Robert has walked into several of them, to the point where, by his own admission, he became confused about what conclusions could be drawn from his USL results. In fact, he analyzed three cases:

  1. PostgreSQL 9.1
  2. PostgreSQL 9.2 with fast locking
  3. PostgreSQL 9.2 current release
I know nothing about Postgres but, thankfully, Robert tabulated the performance data he used on his blog, and that allows me to deconstruct what he did with the USL. Here, I am only going to review the first of these cases: PostgreSQL 9.1 scalability. I intend to return to the claimed superlinear effects in another blog post.
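
For readers who want to try this kind of deconstruction themselves, here is a minimal sketch of fitting the USL, $C(N) = N/[1 + \alpha (N-1) + \beta N (N-1)]$, with R's nls() function. The $(N, X)$ values below are invented purely for illustration; Robert's actual measurements are tabulated on his blog.

  # Fit the USL to hypothetical throughput data (N = concurrency, X = throughput).
  # These numbers are made up; substitute the measured values from the blog post.
  N <- c(1, 2, 4, 8, 16, 32)
  X <- c(955, 1878, 3548, 6531, 9897, 10004)
  C <- X / X[1]                            # normalize to single-client throughput
  usl <- nls(C ~ N / (1 + alpha * (N - 1) + beta * N * (N - 1)),
             start = list(alpha = 0.01, beta = 0.001))
  coef(usl)                                # alpha: contention, beta: coherency delay

The fitted coefficients also predict where throughput peaks, at $N^{*} = \lfloor \sqrt{(1-\alpha)/\beta} \rfloor$, which is exactly the kind of conclusion that has to be drawn carefully from noisy data.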

Tuesday, November 24, 2009

GCaP Class: Sawzall Optimum

In a side discussion during last week's class, now Guerrilla alumnus, Greg S. (who worked at Google a few years ago) informed me that Sawzall preprocessing-setup times typically lie in the range from around 500 ms to about 10 seconds, depending on such factors as cluster location, GFS chunkserver hit rate, borglet affinity hits, etc. This is the information that was missing from the original Google paper and prevented me from finding the optimal machine configuration in my previous post.

To see how these new numbers can be applied to estimating the corresponding optimal configuration of Sawzall machines, let's take the worst-case estimate of 10 seconds for the preprocessing time. First, we convert 10 s to 10/60 = 0.1666667 min (the original units) and plot that constant as the horizontal gray line in the lower part of the figure at left. Next, we extend the PDQ elapsed-time model (blue curve) until it intersects the horizontal line. That intersection is the optimum, as I explained in class, and it occurs at p = 18,600 machines (vertical line).

That's more than thirty times the number of machines reported in the original Google paper—those data points appear on the left side of the plot. Because of the huge scale involved, it is difficult to see the actual intersection, so the figure on the right shows a zoomed-in view of the encircled area. Increasing the number of parallel machines beyond the vertical line means that the elapsed time curve (blue) goes into the region below the horizontal line. The horizontal line represents the fixed preprocessing time, so it becomes the system bottleneck as the degree of parallelism is increased. Since the elapsed times in that region would always be less than the bottleneck service time, they can never be realized. Therefore, adding more parallel machines will not improve response time performance.

Conversely, a shorter preprocessing time of 500 ms (i.e., a shorter bottleneck service time) should permit a higher degree of parallelism.
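
For completeness, the crossover described above can also be located numerically rather than by eyeballing the plot. The sketch below uses a placeholder elapsed-time function and an invented single-machine time T1; in the real analysis, Telapsed() would be the PDQ model from the previous post, not the ideal-parallelism stand-in shown here.

  # Find where the modeled elapsed time first drops to the fixed setup time.
  # Telapsed() is a stand-in for the PDQ elapsed-time model; T1 is hypothetical.
  T1       <- 2500                         # single-machine elapsed time (min), invented
  Tsetup   <- 10 / 60                      # 10 s preprocessing time in minutes
  Telapsed <- function(p) T1 / p           # ideal-parallelism placeholder
  p_opt    <- uniroot(function(p) Telapsed(p) - Tsetup, c(1, 1e6))$root
  round(p_opt)                             # optimum machine count for this model

Re-running it with Tsetup <- 0.5 / 60 shows how much further out the crossover, and hence the useful degree of parallelism, moves.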

Friday, November 13, 2009

Scalability of Sawzall, MapReduce and Hadoop

This is a follow-up to a reader comment by Paul P. on my previous post about MapReduce and Hadoop. Specifically, Paul pointed me at the 2005 Google paper entitled "Parallel Analysis with Sawzall," which states:

"The set of aggregations is limited but the query phase can involve more general computations, which we express in a new interpreted, procedural programming language called Sawzall"

Not related to the portable reciprocating power saw manufactured by the Milwaukee Electric Tool Corporation.

More important for our purposes is Section 12, Performance. It includes the following plot, which tells us something about Sawzall scalability, but not everything.

Figure 1.

Tuesday, March 24, 2009

Slacker DBs in the Cloud Base

In my view, another reason Larry Ellison diss'd Cloud Computing last year (even though he promoted "Thin Clients" a decade ago, he completely overlooked the necessary infrastructure to support them: aka the cloud) is that he's afraid of how it might negatively impact sales of the ORACLE RDBMS. Why? Most of the world's data is not in relational form, and never will be. More importantly, Google knows this. (Think MapReduce.)

One of the first people to see this coming was relational database academic, Joe Hellerstein at UCB. In his 2001 talk entitled "We Lose" (PDF slides), slide 5 contains the gist of his prescient observations:

  • Grassroots use Filesystems, not DBs
  • Grassroots use App servers, not ORDBs
  • Grassroots write Java, PERL, Python, PHP, ... NOT SQL!

He defines Grassroots as: "Hackers. But also DBMS engineers, Berkeley grads, Physicists, etc."

Now, somewhere in between are Slacker Databases: "Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere, offering far greater simplicity than SQL, may have a better way of storing data for Web apps." Hellerstein was more right than he could've known.

Thursday, January 8, 2009

Review of R in NYT and GDAT

GDAT instructor, Jim Holtman, pointed me at this review of R in yesterday's New York Times. It definitely puts SAS on the defensive.

Update: Another piece in the tech section of NYT.

If you want to know how to apply R to performance data, sign up for the Guerrilla Data Analysis Techniques class scheduled for August 2009 and learn from Jim personally.

Friday, March 7, 2008

Hotsos 2008: Day 2

Tanel Poder continued his theme of better ways to collect Oracle performance data by demonstrating how his "Sesspack" (Oracle session level) data could be visualized using VBA calls to Excel charting functionality. He used Swingbench as a load generator for his demos. Afterwards, I spoke with him about my talk tomorrow and he said he was interested and would attend.

Saturday, May 5, 2007

ORACLE Scalability Oracles

For those of you concerned with ORACLE 10g performance, there are a couple of books by ORACLE oracles that you might find useful, especially with regard to ORACLE scalability.

Monday, February 26, 2007

Helping Amazon's Mechanical Turk Search for Jim Gray's Yacht

Regrettably, Jim Gray is still missing, but I thought Amazon.com deserved more kudos than they got in the press for their extraordinary effort to help in the search for Gray's yacht. Google got a lot of press coverage for talking up the idea of using satellite image sources, but Amazon did it. Why is that? One reason is that Amazon has a lot of experience operating and maintaining very large distributed databases (VLDBs). Another reason is that it's not just Google that has been developing interesting Internet tools. Amazon (somewhat quietly, by comparison) has also developed their own Internet tools, like the Mechanical Turk. These two strengths combined at Amazon and enabled them to load a huge number of satellite images of the Pacific into the Turk database, thereby allowing anyone (many eyes) to scan them via the Turk interface, and all that in very short order. Jim would be impressed.

I spent several hours on Sunday, Feb 4th, using Amazon's Mechanical Turk to help look for Gray's yacht. The images above (shown here at about one quarter the size displayed by the Turk) show one example where I thought there might have been an interesting object, possibly a yacht. Image A was captured by the satellite a short time before image B (t1 < t2). You can think of the satellite as sweeping down this page. Things like whitecaps on the ocean surface tend to dissipate and thus change pixels between successive frames, whereas a solid object like a ship will tend to remain invariant. The red circle (which I added) marks such a stable set of pixels, which also have approximately the correct dimensions for a yacht, i.e., about 10 pixels long (as explained by Amazon). Unfortunately, what appears to be an interesting object here has not led to the discovery of Gray's whereabouts.

Use of the Turk satellite images was hampered by the lack of any way to reference the images (about 5 per web page) by number, and there was no coordinate system within each image for expressing the location of any interesting objects. These limitations could have led to ambiguities in the follow-up search of flagged images by human experts. However, given that time was of the essence for any possible rescue effort, omitting these niceties was a completely understandable decision.