The Pith of Performance: May 2008

Saturday, May 24, 2008

Instrumentierung – die Chance für Linux? (Instrumentation: the opportunity for Linux?)

My latest article for the German publication Linux Technical Review appears in Volume 8 on "Performance und Tuning" and discusses a possible roadmap for future Linux instrumentation and performance management capabilities. The Abstract reads:

German: Linux könnte seine Position im Servermarkt ausbauen, wenn es dem Vorbild der Mainframes folgte und deren raffiniertes Performance-Management übernähme.
English: Linux could be in a position to expand its presence in the server market by looking to mainframe computer performance management as a role model and adapting its instrumentation accordingly.

Topics discussed include a comparison of time-share scheduling (TSS) with fair-share scheduling (FSS) and the Linux Completely Fair Scheduler (CFS), how to achieve a more uniform interface to performance and capacity planning measurements, and the kind of advanced system management capabilities available on IBM System z for both mainframes and clusters.
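To make the CFS side of that comparison concrete, here is a minimal Python sketch of its core idea, under simplifying assumptions (a single CPU, fixed 1 ms slices, always-runnable tasks, no sleeper credit). Each task's virtual runtime advances more slowly the larger its weight, and the scheduler always runs the task with the smallest virtual runtime. The function `cfs_shares` is hypothetical; the base weight of 1024 mirrors the weight Linux gives a nice-0 task.

```python
import heapq

BASE_WEIGHT = 1024  # weight of a nice-0 task in Linux; used here as the reference


def cfs_shares(weights, ticks, slice_ms=1.0):
    """Simplified CFS on one CPU: return the CPU time (ms) each task receives.

    Each tick, the task with the smallest virtual runtime runs for one slice,
    and its virtual runtime grows by slice_ms * BASE_WEIGHT / weight.
    """
    heap = [(0.0, i) for i in range(len(weights))]  # (vruntime, task index)
    heapq.heapify(heap)
    cpu_ms = [0.0] * len(weights)
    for _ in range(ticks):
        vruntime, i = heapq.heappop(heap)
        cpu_ms[i] += slice_ms
        heapq.heappush(heap, (vruntime + slice_ms * BASE_WEIGHT / weights[i], i))
    return cpu_ms


# Equal weights give an equal split, like plain time-sharing;
# doubling one task's weight gives it twice the CPU time.
print(cfs_shares([1024, 1024], 1000))
print(cfs_shares([1024, 2048], 3000))
```

Fair-share schedulers apply the same proportional idea at the level of users or groups rather than individual tasks.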

Thursday, May 15, 2008

Is BitTorrent Being Blocked on Your Block?

Ever since Comcast was sprung by the AP last fall for surreptitiously blocking or retarding BitTorrent (BT) traffic with forged packets, other ISPs, e.g., Road Runner, Charter, Bell Canada, and Cox (see its TOS), have also started to block P2P traffic, despite public criticism of their rationale. There are a number of anti-blocking tools and services available to help you determine whether your packets and ISP service are being tweaked. Two FOSS tools that are readily available are Glasnost, from Germany, and Pcapdiff, from the EFF.

Wednesday, May 14, 2008

Spam Still Pays

This interesting nugget appeared in a recent judgment against notorious spammerati Sanford Wallace and Walter Rines. According to court records, "Wallace and Rines sent 735,925 messages and earned over $500,000 in the process." That's about 68 cents per message, or better than a 67% hit rate at a dollar a hit! Of course, those ill-gotten gains pale when compared to the $230 million awarded to MySpace.com, which filed the lawsuit last year. But even getting a piece of the $500K might be a bit tricky, since the pair failed to appear in court and have since gone missing. Wallace already has an outstanding $4 million (peanuts?) fine from the FTC. All in all, it looks like crime still pays.
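The arithmetic behind that quip is easy to check. The sketch below uses the two figures from the court records; the hit-rate line rests on an assumption made only for illustration, namely that each successful response was worth exactly one dollar.

```python
messages = 735_925
earnings_usd = 500_000  # court records say "over $500,000", so this is a lower bound

revenue_per_message = earnings_usd / messages
print(f"${revenue_per_message:.2f} earned per message sent")

# Assumption for illustration: every successful response ("hit") was worth $1.
implied_hit_rate = revenue_per_message / 1.00
print(f"implied hit rate: {implied_hit_rate:.1%}")
```

Run as is, it prints $0.68 per message and an implied hit rate of 67.9%.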

Monday, May 12, 2008

Preposterously Parallel iPods

Here's a question: How many FLOPs in an iPod?

Climatologists up the road here at LBL claim that a supercomputer using about 20 million embedded microprocessors, such as those found in cellphones, iPods, and other consumer electronic devices, would deliver useful climate simulation results at a construction cost of $75 million. A comparable conventional supercomputer could cost closer to $1 billion. Based on a recent post, I'd want to see the front-end compiler system that can load code onto 20 million processors.
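A quick back-of-the-envelope comparison of the two figures quoted by the LBL group follows. Dividing the total construction cost by the processor count gives an all-in cost per processor (interconnect, memory, packaging and so on included), not the price of an individual chip.

```python
processors = 20_000_000
embedded_system_cost_usd = 75_000_000
conventional_system_cost_usd = 1_000_000_000

# All-in budget per processor, not a chip price.
cost_per_processor = embedded_system_cost_usd / processors
cost_ratio = conventional_system_cost_usd / embedded_system_cost_usd

print(f"all-in cost per embedded processor: ${cost_per_processor:.2f}")
print(f"conventional machine costs {cost_ratio:.1f}x as much")
```

That works out to $3.75 per processor and a cost ratio of about 13.3 to 1.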

Wednesday, May 7, 2008

Visual Tornadoes and Cyclones

Although physical tornadoes and cyclones are in the news at the moment, there are also the virtual kind or, more significantly for PerfViz, the visual kind.

For a long time, I've thought it would be cool to be able to visualize system performance as a shape, but was never quite sure what that meant. My role model has been SciViz, where complicated system dynamics, like the time-development of tornadoes, can be visualized in 3D animations. More recently, the cyclone paradigm has been used for textual analysis based on word repetition (the novel "Frankenstein" is shown above). The more often a word is used, the larger its cube. Blue cubes are words that are unique; red cubes are not. The diameter of the rings is determined by the size of the paragraphs. Who woulda thunk it?
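The mapping from text to cyclone attributes is simple enough to sketch. The Python below computes the attributes just described, assuming (since the original tool's details are not given here) that paragraphs are separated by blank lines, words are lowercase letter runs, a word is "unique" if it occurs exactly once in the whole text, a cube's size is the word's occurrence count, and a ring's diameter is its paragraph's length in words. The function name `cyclone_layout` is hypothetical, and the 3D rendering itself is left out.

```python
import re
from collections import Counter


def cyclone_layout(text):
    """Return one ring per paragraph, each with its diameter and word cubes.

    Each cube is (word, size, color): size is the word's count in the whole
    text; color is "blue" if the word occurs exactly once, else "red".
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    words_per_paragraph = [re.findall(r"[a-z']+", p.lower()) for p in paragraphs]
    counts = Counter(w for words in words_per_paragraph for w in words)
    rings = []
    for words in words_per_paragraph:
        cubes = [(w, counts[w], "blue" if counts[w] == 1 else "red") for w in words]
        rings.append({"diameter": len(words), "cubes": cubes})
    return rings


sample = "the monster woke\n\nthe doctor fled"
for ring in cyclone_layout(sample):
    print(ring["diameter"], ring["cubes"])
```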

The closest I've come to producing performance data as a "shape" is this:

which shows processor %user, %system, and %idle time for a 72-way SMP running a transaction workload on Oracle 10g over a 20-minute measurement period. Data supplied by Tim Cook of Sun Microsystems. The time-development of the data (not shown here) is not too far removed from the tornado animation in the first figure.
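Since %user, %system, and %idle sum to 100% for each processor sample, every sample can be placed inside a triangle using barycentric (ternary) coordinates, which is one natural way to give such data a shape. The sketch below shows that mapping; it is an illustrative assumption, not necessarily how the plot described above was made, and `ternary_xy` and `CORNERS` are hypothetical names.

```python
import math

# Triangle corners (a conventional but arbitrary choice):
#   all-user at (0, 0), all-system at (1, 0), all-idle at (0.5, sqrt(3)/2)
CORNERS = {"user": (0.0, 0.0), "system": (1.0, 0.0), "idle": (0.5, math.sqrt(3) / 2)}


def ternary_xy(user_pct, system_pct, idle_pct):
    """Map one CPU sample (%user, %system, %idle) to a point in the triangle."""
    total = user_pct + system_pct + idle_pct  # normalize in case of rounding
    weights = {
        "user": user_pct / total,
        "system": system_pct / total,
        "idle": idle_pct / total,
    }
    # The point is the weighted average of the three corners.
    x = sum(w * CORNERS[k][0] for k, w in weights.items())
    y = sum(w * CORNERS[k][1] for k, w in weights.items())
    return x, y


# One point per processor per sampling interval; a 72-way SMP sampled
# over time produces a cloud of points that can be animated.
print(ternary_xy(60, 25, 15))
```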

Monday, May 5, 2008

Microsoft Discovers the Dumpster Datacenter

OK, not exactly a dumpster, but something slightly bigger: a shipping container. Hello!? Google has been developing this concept for years, with Sun and IBM not far behind in adopting it. The new wrinkle is that Google has now been awarded a patent on it.

Supply Chain Factoid: There are so many more (full) shipping containers coming from Asia to the USA and Europe than going the other way, that it is less cost-effective to store the empties than to simply scrap them and make new containers as needed.

Saturday, May 3, 2008

Object, Time Thyself

For quite a while (six years, to be exact), I've thought that the only sane way to address the problem of response-time decomposition across multi-tier distributed applications is for the software components to be self-timing. In fact, I just found an old email showing that I first proposed this concept (publicly) at a CMG vendor session conducted by Rational Software (now part of IBM) in 2002.
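Here is a minimal, single-threaded Python sketch of what self-timing components could look like. Each component wraps its own work in a timer, and time spent in nested calls is subtracted from the caller, so every tier is charged only for its own share of the response time. The class and method names are hypothetical, and the `time.sleep` calls merely stand in for real work; a real multi-tier system would also need to carry a request identifier across process and network boundaries, which this sketch ignores.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SelfTimer:
    """Accumulates exclusive ('self') time per named component."""

    def __init__(self):
        self.self_time = defaultdict(float)  # component name -> seconds
        self._stack = []  # frames of [component, start time, time spent in children]

    @contextmanager
    def timed(self, component):
        frame = [component, time.perf_counter(), 0.0]
        self._stack.append(frame)
        try:
            yield
        finally:
            self._stack.pop()
            elapsed = time.perf_counter() - frame[1]
            # Charge this component only for time not spent in nested components.
            self.self_time[component] += elapsed - frame[2]
            if self._stack:
                self._stack[-1][2] += elapsed  # report this child's time to the caller


timer = SelfTimer()
with timer.timed("app server"):
    time.sleep(0.02)  # stand-in for business logic
    with timer.timed("database"):
        time.sleep(0.05)  # stand-in for a query
print({name: round(t, 3) for name, t in timer.self_time.items()})
```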

First Guerrilla Boot Camp Rookies Survive

The first "Guerrilla" class of the year, and the first Level I Boot Camp ever, seems to have gone swimmingly. Here is some example feedback:
Tim McCluskey wrote on 4/30/08 12:29 PM: The class was great! I'm glad that I didn't listen to the 2 people that gave me the impression that it or you were going to be over my head.
Vladimir Begun wrote on 4/30/08 20:33:36 -0700: ... it was great! A clear and horizon-expanding presentation of an actual experience in the capacity planning. About right for the jump-start! Eager to attend the level II class.
More people enrolled than we had hoped for in the shortened time-frame that was available to advertise it, and they all reported liking the hotel, sleeping rooms, seminar room, food, and especially the free wifi everywhere. They were also grateful for being able to use the ensuite bathroom across the hall instead of having to walk to the other end of the hotel corridor. :-)

We'll do it all again in a couple of weeks, but at Level II this time.

Pay per VPU Application Development

So-called "Cloud Computing" (aka Cluster Computing, aka Grids, aka Utility Computing, etc.) is even making the morning news these days, where it is being presented as having a supercomputer with virtual processor units just a click away on your home PC. I don't know too many home PC users who need a supercomputer, but even if they did and it was readily available, how competitive would it be given the plummeting cost of multicores for PCs?
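One way to frame the competitiveness question is a break-even calculation: how many hours of rented virtual processors it takes to spend as much as buying a multicore PC outright. The sketch below is deliberately crude (it treats one VPU as equivalent to one core and ignores power, administration, data transfer, and resale value), `breakeven_hours` is a hypothetical name, and the prices in the example are hypothetical placeholders, not quotes from any provider.

```python
def breakeven_hours(pc_price_usd, pc_cores, price_per_vpu_hour_usd):
    """Hours of renting `pc_cores` VPUs that cost as much as buying the PC.

    Crude model: one VPU is assumed equivalent to one core, and all other
    costs (power, administration, data transfer, resale) are ignored.
    """
    hourly_rental_usd = pc_cores * price_per_vpu_hour_usd
    return pc_price_usd / hourly_rental_usd


# Hypothetical prices, for illustration only.
hours = breakeven_hours(pc_price_usd=800.0, pc_cores=4, price_per_vpu_hour_usd=0.10)
print(f"break-even after about {hours:,.0f} hours ({hours / 24:.0f} days of continuous use)")
```

With those placeholder prices the break-even point is about 2,000 hours, roughly 83 days of continuous use, so the answer hinges on how steadily a home user would actually need the cycles.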

Ruby on de-Rails?

Here we go ... According to a recent post on TechCrunch, Twitter.com is planning to abandon Ruby on Rails after two years of fighting scalability issues. The candidates for replacing RoR are rumored to be PHP, Java, and/or just plain Ruby. Twitter, on the other hand, is categorically denying the rumor, saying that they use other development tools besides RoR. This is typical of the kind of argument one can get into over scalability issues when scalability is not analyzed in a QUANTIFIED way.

As I discuss in Chap. 9 of my Perl::PDQ book, the qualitative design rules for incrementally increasing the scalability of distributed apps go something like this:

Move code, e.g., business logic, from the App Servers to the RDBMS backend and repartition the DB tables.
Use load balancers between every tier. This step can accommodate many millions of page views per day.
Partition users into groups of 100,000 across replicated clusters with partitioned DB tables.
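The third rule can be sketched as a simple routing function. The Python below assumes, for illustration only, numeric user IDs grouped into ranges of 100,000 consecutive IDs, with groups assigned round-robin to replicated clusters; real sites may use hashing or directory lookups instead, and `cluster_for_user` is a hypothetical name.

```python
USERS_PER_GROUP = 100_000  # group size taken from the third rule


def cluster_for_user(user_id, num_clusters):
    """Return (group number, cluster index) for a numeric user ID.

    Range-based grouping plus round-robin placement is an illustrative
    assumption, not the scheme any particular site uses.
    """
    group = user_id // USERS_PER_GROUP
    return group, group % num_clusters


print(cluster_for_user(250_123, num_clusters=4))
```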

All the big web sites (e.g., eBay.com and EA.com) do this kind of thing. But these rules of thumb raise the question: how can I quantify the expected performance improvement for each of these steps? Here, I only hear silence. But there is an answer: the Universal Scalability Law. However, it needs to be generalized to accommodate the concept of homogeneous clustering, and I do just that in Section 4.6 of my GCaP book.
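For reference, the Universal Scalability Law for a single node is commonly written as C(N) = N / (1 + σ(N − 1) + κN(N − 1)), where σ measures contention and κ measures coherency delay. The short Python sketch below evaluates it, along with the load at which capacity peaks, obtained by setting dC/dN = 0. The function names and parameter values are hypothetical, and this is only the single-node form, not the generalized cluster version.

```python
import math


def usl_capacity(n, sigma, kappa):
    """Relative capacity C(n) = n / (1 + sigma*(n - 1) + kappa*n*(n - 1))."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak(sigma, kappa):
    """Load n* = sqrt((1 - sigma) / kappa) at which C(n) is maximal (kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)


# Illustrative parameter values, not measurements.
sigma, kappa = 0.03, 0.0005
for n in (1, 8, 16, 32, 64):
    print(n, round(usl_capacity(n, sigma, kappa), 2))
print("peak near n =", round(usl_peak(sigma, kappa), 1))
```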

The following slides (from 2001) give the basic idea from the standpoint of hardware scalability.

Think of each box as an SMP containing p processors or a CMP with p cores. These processors are connected by a local bus, e.g., a shared-memory bus: the intra-node bus. Therefore, we can apply the Universal Scalability model as usual, keeping in mind that the two model parameters (contention and coherency) refer to local effects only. The data for determining those parameters via regression could come from workload simulation tools like LoadRunner. To quantify what happens in a cluster with k nodes, an equivalent set of measurements has to be made using the global interconnect between cluster nodes: the inter-node bus. Applying the same statistical regression technique to those data gives a corresponding pair of parameters for global scalability.
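The regression mentioned above can be set up so that it is linear in the two parameters. If C(N) = X(N)/X(1) is the measured throughput relative to a single load unit, then with x = N − 1 and the deviation from linearity L = N/C(N) − 1, the USL rearranges to L = κx² + (σ + κ)x, a quadratic through the origin. The sketch below fits that quadratic by ordinary least squares in plain Python; `fit_usl` is a hypothetical name, it assumes the first measurement is at N = 1, and it does no outlier handling. The same fit would be applied separately to the local (intra-node) and global (inter-node) measurements.

```python
def fit_usl(loads, throughputs):
    """Least-squares estimate of (sigma, kappa) from throughput measurements.

    Assumes loads[0] == 1 so that throughputs[0] is the baseline X(1).
    Fits L = a*x**2 + b*x (no intercept), then kappa = a, sigma = b - a.
    """
    baseline = throughputs[0]
    xs = [n - 1 for n in loads]
    devs = [n / (thr / baseline) - 1 for n, thr in zip(loads, throughputs)]

    # Normal equations for the two-term, zero-intercept model.
    s4 = sum(x**4 for x in xs)
    s3 = sum(x**3 for x in xs)
    s2 = sum(x**2 for x in xs)
    t2 = sum(x**2 * d for x, d in zip(xs, devs))
    t1 = sum(x * d for x, d in zip(xs, devs))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - s3 * t1) / det
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a  # (sigma, kappa)


# Self-check on synthetic, noise-free data generated from known parameters.
loads = [1, 2, 4, 8, 16, 32]
true_sigma, true_kappa = 0.05, 0.002
throughputs = [100 * n / (1 + true_sigma * (n - 1) + true_kappa * n * (n - 1)) for n in loads]
print(fit_usl(loads, throughputs))
```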

The generalized scalability law for clusters is shown in the 5th slide. If (in some perfect world) all the overheads were exactly zero, then the clusters would scale linearly (slide 6). However, things get more interesting in the real world because the scaling curves can cross each other in unintuitive ways. For example, slide 7, "CASE 4", shows the case where the level of contention is lower on the global bus than on the local bus, but the (in)coherency is greater on the global bus than on the local bus. This is the kind of effect one might see with poor DB table partitioning causing more data than anticipated to be shared across the global bus. And it's precisely because it is so unintuitive that we need to do the quantitative analysis.
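The crossing behavior is easy to explore numerically, with one loud caveat: the sketch below does not reproduce the generalized law in the 5th slide. It assumes, purely for illustration, that local and global effects are independent, so cluster capacity is the product of a USL term for p processors per node and a USL term for k nodes. With all four overhead parameters at zero it reduces to linear scaling, p × k, and it lets you compare configurations with the same total processor count under CASE-4-style parameters (less contention but more coherency delay on the global bus). The function names and all parameter values are made up.

```python
def usl(n, sigma, kappa):
    """Single-level USL relative capacity."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def cluster_capacity(p, k, local, global_):
    """ASSUMED multiplicative model: the local (sigma, kappa) pair applies to
    p processors on the intra-node bus, the global pair to k nodes on the
    inter-node bus."""
    return usl(p, *local) * usl(k, *global_)


# Made-up CASE-4-style parameters: less contention but more coherency globally.
local = (0.05, 0.0001)
global_ = (0.01, 0.01)

# Same 64 processors in total, split different ways.
for p, k in [(64, 1), (32, 2), (16, 4), (8, 8), (4, 16)]:
    print(f"{k:2d} nodes x {p:2d} CPUs: C = {cluster_capacity(p, k, local, global_):.1f}")
```

With these made-up parameters, capacity peaks at the intermediate split of 4 nodes × 16 CPUs (about 31.4) rather than at either extreme, and changing any one parameter can move that peak.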

I intend to modify these slides to show how things scale with a variable number of users (i.e., software scalability) on a fixed hardware configuration per cluster node and present it in the upcoming GCaP class. If you have performance data for apps running on clusters of this type, I would be interested in throwing my scalability model at it.