The Pith of Performance: 2007

Thursday, December 20, 2007

Running PDQ Under SELinux

PDQ user Rodrigo Campos (Brazil) reports a gotcha when trying to use Perl::PDQ, in his case under SELinux (Security-Enhanced Linux, not Swedish Linux). The solution simply requires changing the security context of the pdq.so shared object. His detailed instructions for doing that have been posted to the PDQ download page. Thank you, Rodrigo!

Wednesday, December 19, 2007

CMG 2008: Call for PerfViz Papers

It's official! Performance visualization is a "focus area" within the Hot Topics Session Area track for CMG 2008 in Las Vegas, Nevada. The official CFP is now posted, and Jim Holtman and I are the Session Area Chairs (SACs) for Hot Topics. In an attempt to build on the recent success of the Barry007 presentations at CMG 2007, we would like to see many more diverse contributions on PerfViz (better computer performance and planning through better visualization tools) in 2008.

CMG 2007: After the Storm

CMG was a bit of a blur for me because I ended up doing 10 hours of presentations, including an interview for a future podcast. Our performance visualization ("PerfViz") session was very well attended and the demos went better than expected due to Mario finally getting his Java act together about 2 hours before we went live. Nothing like JIT capacity planning!

We also had the biggest BOF session attendance of CMG 2007. Stay tuned for more details about the significance of this turnout for CMG 2008, including a new online "PerfViz" forum.

As promised to the attendees of my various CMG sessions, you will find all the supporting materials posted on my web site at the CMG Materials page.

Friday, November 30, 2007

My Updated CMG 2007 Schedule

Here is an updated list of my sessions.

Sunday Workshop

"How to Move Beyond Monitoring, Pretty Damn Quick!"
Session 191, Sunday 1:00 PM - 4:30 PM, Room: Elizabeth D/E

CMG-T

"Capacity Planning Boot Camp"
Session 431 (three parts), Room: Elizabeth D/E
Wednesday 1:15 PM - 2:15 PM
Wednesday 2:45 PM - 3:45 PM
Wednesday 4:00 PM - 5:00 PM
Introductory CaP class for newbies.

Apdex Alliance Meeting

"Triangulating the Apdex Index with Barry-3"
Session 45A, Wednesday 4:00 PM - 5:00 PM
This talk is part of the Apdex mini-conference at CMG and will be presented by Mario Jauvin, since it overlaps with my CMG-T classes.

Hot Topics Paper 7050

"Seeing It All at Once with Barry"
Session 511 (Advanced), Hot Topics, Monday 4:00 PM - 5:00 PM, Room: Elizabeth D/E
Dr. Neil J. Gunther, Performance Dynamics
Mario Jauvin, MFJ Associates

Complete abstracts were blogged previously.

Wednesday, November 28, 2007

Apdex Meets Apex

The Apdex Alliance has defined a performance metric, called the Apdex index, which rates the measured response times of distributed applications from an Internet user's perspective. The Apdex index is constructed from three categories, which are defined by partitioning the total number of sample counts (C = S + T + F) according to an agreed-upon threshold time (τ) applied to each measured response time R:

Satisfied: S samples with 0 < R ≤ τ
Tolerating: T samples with τ < R ≤ 4τ
Frustrated: F samples with R > 4τ
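For readers who want to compute the index, here is a minimal Python sketch of the standard Apdex formula, Apdex = (S + T/2)/C, in which tolerating samples count half and frustrated samples count zero. The function name `apdex` and the sample response times are hypothetical; samples falling exactly on τ or 4τ are assigned to the better category, as in the Apdex specification.

```python
def apdex(response_times, tau):
    """Return the Apdex index (S + T/2) / C for threshold tau.

    Satisfied:  0 < R <= tau          (weight 1)
    Tolerating: tau < R <= 4 * tau    (weight 1/2)
    Frustrated: R > 4 * tau           (weight 0)
    """
    c = len(response_times)
    if c == 0:
        raise ValueError("at least one response-time sample is required")
    satisfied = sum(1 for r in response_times if r <= tau)
    tolerating = sum(1 for r in response_times if tau < r <= 4 * tau)
    return (satisfied + 0.5 * tolerating) / c


# Hypothetical response times (seconds) with a 0.5 s threshold:
# 2 satisfied, 2 tolerating, 2 frustrated
samples = [0.2, 0.4, 0.6, 1.5, 2.5, 3.0]
print(apdex(samples, tau=0.5))  # → 0.5
```

An index of 1.0 means every sample was satisfied; 0.0 means every sample was frustrated.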

Perl::PDQ Corrigenda Updated

It's been a while, but reader Peter Altevogt (Germany) spotted some numerical inconsistencies in Table D.2 on p. 403 of my Perl::PDQ book. This turns out to be the result of an Excel gotcha when copying cells and pasting them into a different spreadsheet location: the cell reference gets silently incremented. Sigh! The correct values are now available on the corrigenda page. Thank you, Peter!

Monday, October 29, 2007

Folsom: Not Just a Prison But A Cache

A nice update to my previous posts about the Intel Penryn microprocessor appears on a Dutch blog (in English, and damn good English, BTW). The blogger was apparently invited to Intel's geographical home for the development of Penryn: not HQ in Santa Clara, California, but Folsom (pronounced 'full sum'), California. Consistent with Intel's January 2007 announcement, he notes that November looks to be show time for their 45 nm technology.

Since the author was a visitor, he failed to appreciate certain local ironies in his report. What he missed was the fact that Penryn is a small town due north of Folsom, just off Interstate 80 on the way to Lake Tahoe. He refers to the huge Intel campus at the edge of town. At the other end of town is an even better-known campus: one of the state's major prisons, immortalized in the Johnny Cash (not Cache) song "Folsom Prison Blues". So, not only are criminals cached there, but so are some of Intel's best microprocessor designers (not as an intended punishment for the latter, presumably). OK, I'll stop there because I'm having way too much fun with this. Read the blog.

Sunday, October 28, 2007

Erlang's Collected Papers

In 1948, the collected papers of Agner Erlang (AKA the father of queueing theory) were translated from the original Danish and published in the Transactions of the Danish Academy of Technical Sciences. They were reissued as a book by Acta Polytechnica Scandinavica in 1960, but due to its underwhelming popularity, that book is now out of print. However, I just discovered that the chapters of the book are now available on the web. Kudos to the Academy!

Streeeeeeetch!

The October 2007 Linux Magazine (no. 10, issue 83, p. 62) is carrying the English version of my original German article about converting load averages to stretch factors. Unfortunately, there is no direct URL (Update, Sun Oct 28, 2007: as a reader, Metapost, pointed out in the comments, it is now available for viewing), but the cute visual hook is a picture of a stretch limo ... stretched across two pages.

I wish I'd thought of that.

Friday, September 28, 2007

SOA Scalability and Steady-State

Guerrilla alumnus Peter Lauterbach just brought to my attention an article in SOA World entitled "Load Testing Web Services". I have to commend these authors for performing their SOA load tests in steady state. Elsewhere, I've discussed how wrong things can go when you don't adhere to this procedure. In their online article, these authors show the response time (R) as a time-series plot, more or less as it would appear in a measurement tool like, say, LoadRunner. Although they don't show it, the throughput measurements would also look similar when plotted as a function of time (t).

Best Practices Are An Admission of Failure

Six Sigma: Quite a list.

ITIL: Best Practice is defined as "good working practice developed through consensus that helps organizations to achieve better performance."

Sounds good, but ...

Ludwig Wittgenstein: "Just because we all agree on something, doesn't make it true."

Therefore ...

Guerrilla Manual 1.21: Best Practice is tantamount to not trying to understand the problem. Merely copying someone else's apparent success is like cheating on a test. You might make the grade, but how far is the bluff going to take you?

So ...

Thomas Edison: "There's a better way. Find it!"

Sunday, September 23, 2007

Black Swans, Instantons, Hedge Funds and Network Collapse

On my flight to Europe last July, I read The Black Swan: The Impact of the Highly Improbable by N. Taleb. Unfortunately, I found the book irksome for several reasons:

I already knew the mathematical underpinnings of the metaphors used in the book (more on that below).
Taleb's writing style is unnecessarily condescending toward others mentioned in the book and to the reader.
Some rather obvious points are labored. The weirdest of these comes in the form of an entirely fictitious character to whom an entire chapter is devoted.
Many of his often poor and sometimes inaccurate examples kept reminding me of something a Stanford mathematician once told me: "Economists are mathematically unsophisticated."
He describes a general problem or syndrome related to how people assess risk incorrectly, but he doesn't really offer any solutions (or maybe I missed it in the chapter entitled, "How to Look for Bird Poop" ... seriously).

I must say this book was a disappointment because it stood in stark contrast to his interview on PBS months earlier, where he came across as more thoughtful and measured. My opinion notwithstanding, you might find the book worth reading because it's an easy read, it covers many topics (mostly with a financial slant, reflecting the author's background), and he warns the reader about the dangers of things like high-risk hedge funds. Moreover, as I shall try to demonstrate here, these same concepts also impinge on performance analysis (not that Taleb is aware of that), and whereas they might otherwise be impenetrable to the non-mathematician, possibly they are made a little more accessible in a book like this. In a nutshell, I believe he is saying: think wild, not mild. That's easy to say but hard to do, as I shall try to explain.

Virtualization Rootkit Wars

VMM malware is another side effect of creating illusions (see my previous blog entry on the danger of illusions). It turns out that still waters run very deep. Here's a potted summary of some recent events in the world of stealth that have impinged on both VMM security issues and performance analysis. (The following contains a lot of acronyms, for which I've provided a glossary at the end.)

Last year at BlackHat, some Polish security experts announced a proof-of-concept for a VME rootkit called "Blue Pill" (BP) that they claimed was undetectable. For BlackHat 2007, some U.S. security experts challenged the Polish team to a Detect-A-Thon (my term). This caused the Polish team to go into a defensive posture and draw up a list of run-rules (my term) for how the Detect-A-Thon was to be carried out. Since BP is only a virtual rootkit (if I can use that term), one of the proposed run-rules was payment (up front?) of almost $500,000 for development costs to make a real implementation of BP battle-ready. Nice work if you can get it.

Quite apart from all these claim-counter-claim machinations, what got my attention was one of the ways by which the U.S. team claimed that BP would be detectable (there are plausibly many), viz., counting execution cycles. The CPUID instruction, in particular, should take only about 200 cycles when executed natively (VMX root mode), not the roughly 5000 cycles it takes when executed in VMX non-root mode, where it traps to the hypervisor. I saw a certain irony in the fact that, although I've been complaining about VMM illusions masking correct performance analysis, performance analysis is one method for detecting HVM malware. The procedure is analogous to the analysis in Section 3.2.2 of my CMG 2006 paper "The Virtualization Spectrum from Hyperthreads to GRIDs", where I showed that the increase in thread execution time is due mostly to an inflation of the thread service time on a dual-core. There, I had to infer the effect from system-level measurements, whereas here they are talking about reading the actual cycle counter register directly. It turns out that this technique is not totally foolproof either, because the timings can be masked with the appropriate trap. Looking for changes in the TLB is another method that has been proposed. Naturally, in this kind of game, the beat goes on, and although rootkit detectors are already available, there will be many more as VMM stealth techniques evolve.

Glossary

BP: "Blue Pill". An HVM rootkit.
CPUID: x86 instruction to identify the CPU type.
Guest: VMware lingo for a native O/S that runs on a VMM.
HVM: Hardware-Assisted Virtual Machine.
Hyperjacker: Hypervisor hijacking.
Hypervisor: See VMM.
Malware: Malicious software. A stealthy rootkit in this context.
Rootkit: A set/kit of tools/executables with root access (highest privilege).
TLB: Translation Look-aside Buffer.
VME: Virtual Machine Emulators, e.g., "Blue Pill", "Vitriol".
VMM: Virtual Machine Monitor, e.g., VMware, Xen.

Monday, September 17, 2007

MVA, Upgrades, and Other Visitations Upon PDQ

Guerrilla alumnus Sudarsan Kannan asks:

"I'm trying to understand concepts behind performance modeling (analytical modeling) based on MVA algorithm. ...I'm also trying to understand WHAT IF scenarios such as: What if I increase my CPU speed or upgrade Disk I/O subsystem? What impact will the increase in CPU speed have on throughput, response time and more variables? What if I increase number of users. I have couple of questions to get a better picture of MVA algorithm:

How to find visit ratios for CPU?

Can I vary service time (S) for a resource (CPU or disk) if I increase/decrease the processor/disk speed to answer WHAT IF scenarios?"
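Both questions can be approached with the operational laws. The Python sketch below assumes that, over the same measurement interval, you have counted the completed transactions C_0 and, for each resource k, its completions C_k and busy time B_k. Then the visit ratio is V_k = C_k/C_0, the service time per visit is S_k = B_k/C_k, and the service demand is D_k = V_k S_k. A what-if speed upgrade is modeled, under the simplifying assumption that service time scales inversely with device speed, by dividing S_k by the speedup. The function names, resource names, and numbers are hypothetical.

```python
def demands(system_completions, resources):
    """Return {name: (visit ratio V_k, service time S_k, demand D_k)}.

    resources maps a name to (completions C_k, busy time B_k), both
    measured over the same interval as system_completions (C_0).
    """
    result = {}
    for name, (c_k, b_k) in resources.items():
        visits = c_k / system_completions       # V_k = C_k / C_0
        service = b_k / c_k                     # S_k = B_k / C_k
        result[name] = (visits, service, visits * service)  # D_k = V_k * S_k
    return result


def upgrade(model, name, speedup):
    """What-if: a device `speedup` times faster divides its service time."""
    visits, service, _ = model[name]
    new_service = service / speedup
    updated = dict(model)
    updated[name] = (visits, new_service, visits * new_service)
    return updated


# Hypothetical measurements: 1000 transactions completed in the interval
measured = {"cpu": (5000, 40.0), "disk": (3000, 60.0)}
base = demands(1000, measured)
faster_cpu = upgrade(base, "cpu", 2.0)

# Throughput can never exceed 1 / D_max (the bottleneck bound)
for label, model in (("base", base), ("2x CPU", faster_cpu)):
    d_max = max(d for _, _, d in model.values())
    print(label, {k: round(v[2], 4) for k, v in model.items()},
          "X_max =", round(1 / d_max, 2))
```

In this made-up example the disk remains the bottleneck, so doubling CPU speed does not raise the throughput bound at all. The resulting demands are what you would feed into an MVA solver (such as PDQ) to get response times as the number of users grows.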

Why Doesn't PDQ Have a GUI?

Recently I was asked if I planned to create a GUI (Graphical User Interface) for PDQ. I've thought a lot about this over the years and my answer (still) is negatory and here's why.

Darwin’s Dictum and Performance Monitoring

Darwin’s Dictum:

"All observation must be for or against some view if it is to be of any service." (Source: Scientific American magazine, April 2001)

Translation:

All performance monitoring must either agree or disagree with some performance model* if it is to be of any use.

* A performance model could be any or all of a SWAG, a back-of-the-envelope calculation, an Excel spreadsheet, a Mathematica notebook, an S-plus script, a PDQ model, a SAS model, a SimPy simulation, a LoadRunner script, etc.

I'll have a lot more to say about this during my CMG Sunday Workshop entitled Moving Beyond Monitoring, Pretty Damn Quick! (December 2, 2007).

To BEA or Not to BEA?

For all you WebLogic and Tuxedo lovers, here's another fine example of why I keep saying, we performance weenies can't afford to operate in a computing cloister. We have to keep a weather eye on the machinations of the marketplace.

Billionaire activist investor Carl Icahn (Wasn't he in the movie "Corporate Raiders of the Last Deutschmark"?) has called for the sale of BEA Systems Inc., whose stock price has sagged with the growth in open-source software and under pressure from larger competitors such as IBM Corp. and Oracle Corp. Analysts said BEA has failed to stir enthusiasm among investors. For example, Rob Enderle, principal analyst with Enderle Group, a technology consulting firm in San Jose, Calif. stated, "This class of software, because of open source, it's much harder to get people interested in it unless you're doing phenomenally well in sales...which BEA has not been."

Also on Friday, BEA said it has received an additional notice from the Nasdaq that it remains out of compliance because of the delayed filings and its shares remain in danger of being delisted.

Thursday, August 30, 2007

My CMG 2007 Presentation Schedule

This year, all CMG 2007 sessions will be held in the Manchester Grand Hyatt San Diego starting Sunday, December 2 and going through Friday, December 7. Currently, my sessions are scheduled as follows:

PDQ Gets Tickled

I recently stumbled across this reference to PDQ in Tcl. The author (Todd Coram) correctly notes that we use SWIG to generate the Perl and Python wrappers, and this also facilitates Tcl, apparently. I don't know if he has completed the Tcl port or plans to offer it to us for release to the world at large. Maybe he'll let me know. Surely the father of Tcl, John Ousterhout (a big fan of scripting tools), would approve.

Postscript: Todd Coram responded via email and stated that his attempt had gone into limbo some time ago. So, it looks like open season for anyone interested in doing a Tcl port of PDQ (or any other language for that matter).

Solaris to Shine on the Mainframe (Say what!?)

Quite apart from the surprise over what passes for physics these days, PhysOrg.com recently reported on a surprise deal that will enable Sun's Solaris operating system to run on IBM servers.

Initially, the agreement will involve only IBM's x86-based mid-range servers, which can also run the Windows and Linux operating systems, but eventually, so the report says, IBM hopes to bring Solaris to the mainframe. I assume this means it will run in its own LPAR on System z, as Linux does. If I take the view (and I do) that the mainframe is not a "dinosaur" but just another (excellent data processing) server on the network, one wonders where this leaves future Sun hardware platforms.

Add to this the growing emphasis by Sun to deploy Intel and AMD microprocessors for cost reasons and, as Jonathan Schwartz says, it "represents a tectonic shift in the market landscape." No kidding! I just wonder whether Schwartz will be riding the plate that stays on top or the plate that goes under.

Clearing Up Visual Chaos in Performance Tools

Guerrilla alum Paul Puglia pointed me at some work done by researchers at MIT who have developed two measures of visual clutter: the Feature Congestion measure, and the Subband Entropy measure. This is the sort of new paradigm that could be very useful in the context of performance visualization.

Linux Weather Forecast (Details at 11?)

Apropos my previous criticism about the lack of public design documentation for the Linux kernel, this Linux Weather Forecast page looks like a move in the right direction.

Section 2.1 even has some words about the CFS scheduler. I would still like to see a more detailed comparison of CFS with the well-known TSS (time share) scheduler and lesser-known FSS (fair share) scheduler.

Wednesday, July 11, 2007

Leistungsdiagnostik - Load Averages and Stretch Factors

My latest article for the German Linux-Magazin has just appeared in the August edition under the title "Leistungsdiagnostik" (Performance Diagnostics). The abstract, translated from the German, reads:

Shell commands such as "uptime" always report three numbers as the load average. However, only a few people know how these numbers come about and what exactly they mean. This article explains that and, at the same time, introduces an extension: the stretch factor.
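As a rough illustration of how the three numbers arise, the sketch below applies the exponentially damped moving average used by the Linux kernel: about every 5 seconds it samples the number of runnable (and uninterruptible) tasks n and updates each average as L ← L·e^(−5/(60m)) + n·(1 − e^(−5/(60m))) for m = 1, 5 and 15 minutes. The kernel does this in fixed-point arithmetic; the floating-point version and the function name `load_averages` are simplifications for illustration only.

```python
import math


def load_averages(samples, interval=5.0, windows_min=(1, 5, 15)):
    """Exponentially damped moving averages of run-queue samples.

    samples: task counts taken every `interval` seconds.
    Returns one average per damping window (in minutes).
    """
    decay = [math.exp(-interval / (60.0 * m)) for m in windows_min]
    loads = [0.0] * len(windows_min)
    for n in samples:
        # Each window keeps a fraction `d` of its old value and blends in the new sample
        loads = [load * d + n * (1.0 - d) for load, d in zip(loads, decay)]
    return loads


# Two tasks kept runnable for 10 minutes (120 samples), starting from idle
print([round(x, 2) for x in load_averages([2] * 120)])  # → [2.0, 1.73, 0.97]
```

The 15-minute value lags well behind the 1-minute value, which is why the three numbers together indicate whether the load is rising or falling.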

The main theme is about how to extend absolute load averages to relative stretch factor values.
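The step from an absolute load average to a relative stretch factor can be sketched with two operational laws, assuming the load average is a fair estimate of the mean number of tasks Q in the system (waiting plus running) in steady state. Little's law gives Q = XR, and the utilization law gives mρ = XS for m CPUs that are each busy a fraction ρ of the time, so the stretch factor is R/S = Q/(mρ). The function below is a hypothetical illustration, not code from the article.

```python
def stretch_factor(load_average, cpus, utilization):
    """Stretch factor R/S = Q / (m * rho).

    Assumes load_average approximates Q, the mean number of runnable plus
    running tasks, and utilization is the mean per-CPU busy fraction rho.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return load_average / (cpus * utilization)


# A fully busy 4-CPU box with a load average of 6:
# each task takes 1.5 times as long as its bare service time.
print(stretch_factor(6.0, 4, 1.0))  # → 1.5
```

A stretch factor of 1 means no waiting at all. The raw load average of 6 says little until you know m, whereas the relative value says directly that every task is being stretched by 50%.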

Waiting Snakes

I'm in Europe this month, and a colleague here pointed me at this site, which contains quite a good summary of books, papers, and tools on queueing networks. The German word for queues, "Warteschlangen", translates literally as "waiting snakes."

Thursday, June 28, 2007

Unnatural Numbers

What is the next number in the sequence 180, 90, 45, ...? Since each number appears to be obtained by halving the previous number in the sequence, it should be 22.5. This shows that even if you start randomly with a large natural number (which is not a power of 2) and keep halving the successive values, you will reach an odd natural number, and when you halve that, you end up with a fraction (no longer a natural number). Here's a little Python code that demonstrates the point:

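#!/usr/bin/env python3
import random

# Pick a random even starting number in the range 100..998
r = random.randrange(100, 1000, 2)
print(r)

# Halve with integer division until the result is odd;
# halving an odd number would leave a fraction
while True:
    r //= 2
    print(r)
    if r % 2 != 0:
        break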

Run it several times and you will see that the sequence, although generally not long, contains an unpredictable number of iterates. This is easily understood by noting that division by 2 is equivalent to a bitwise right-shift (i.e., >> operator in C, Python and Perl), so each successive division simply marches the bit-pattern to the right until a '1' appears in the unit column. That defines an odd number and another right-shift produces '1' as a remainder, which I can either ignore as with integer division or carry as a fraction. It also means that you can predict when you will hit an odd number by first representing the starting number as binary digits, then counting the number of zeros from the rightmost position to the first occurrence of a 1-digit.
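The prediction in the last sentence can be checked in a few lines of Python. The hypothetical helper below counts the zeros to the right of the lowest 1-digit in the binary string, and a second helper gets the same answer with the bit trick n & -n, which isolates that lowest 1-bit.

```python
def halvings_until_odd(n):
    """Number of halvings before a positive integer n becomes odd.

    Equals the count of trailing zeros in n's binary representation.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    bits = bin(n)[2:]                     # e.g. 180 -> '10110100'
    return len(bits) - len(bits.rstrip("0"))


def halvings_until_odd_bits(n):
    """Same count via the lowest set bit: n & -n is 2**k for k trailing zeros."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return (n & -n).bit_length() - 1


for start in (180, 96, 100):
    print(start, bin(start), halvings_until_odd(start), halvings_until_odd_bits(start))
```

For 180 (binary 10110100) both report 2, matching 180 → 90 → 45.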

In the manufacture of microprocessors and memory chips, the shrinking of feature sizes under Moore's law looks like a halving sequence similar to the one above. Therefore, we see natural numbers like 90 nanometers (nm), 45 nm, and so on. For reference, the diameter of the HIV virus (a molecular machine) is about 100 nm. As I've discussed elsewhere, 45 nm is the next-generation technology that IBM, Intel and AMD will be using to fabricate their microprocessors. But the current smallest technology being used by AMD and Intel is not 90 nm; it's 65 nm. And the next generation after 45 nm will be 32 nm, not 20-something. So, where do these 'unnatural' numbers come from?

Each progression in shrinking the silicon-based photolithographic process is identified by a waypoint or node on a roadmap defined by the Semiconductor Industry Association or SIA. The current SIA roadmap looks like this:

Year        2004   2007   2010   2013   2016   2020
Node (nm)     90     65     45     32     22     14

A technology node is defined primarily by the minimum metal pitch used on any product. Here, pitch refers to the center-to-center spacing between adjacent metal "wires" or interconnects. In a microprocessor, for example, the node figure corresponds to the half-pitch of the first layer of metal used to connect the actual terminals of one transistor to another. Notice that IBM and Intel technology is moving faster than SIA expectations (45 nm parts are due this year, not in 2010), so the roadmap already needs to be updated again.
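One way to see the pattern in the roadmap numbers is to look at the ratio between successive nodes. The short Python sketch below uses only the values in the SIA roadmap table; each linear ratio comes out roughly 0.7 (close to 1/√2) rather than 1/2, which is consistent with halving the area of a feature at each node rather than halving its linear size, since a square whose side shrinks by 1/√2 has half the area.

```python
import math

# Year -> technology node (nm), taken from the SIA roadmap table
roadmap = {2004: 90, 2007: 65, 2010: 45, 2013: 32, 2016: 22, 2020: 14}

nodes = list(roadmap.values())
for bigger, smaller in zip(nodes, nodes[1:]):
    linear = smaller / bigger          # shrink factor in one dimension
    print(f"{bigger:>3} -> {smaller:>3} nm: linear x{linear:.3f}, area x{linear ** 2:.3f}")

print(f"1/sqrt(2) = {1 / math.sqrt(2):.3f}")
```

The last step (22 to 14 nm) is noticeably more aggressive than the others, so treat the 1/√2 pattern as a rule of thumb rather than a law.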

Wednesday, June 20, 2007

Housekeeping My List of Pubs

I've been out of the office more than in, this week, so I have a lot of catching up to do. The only thing I've managed to achieve here is to update my list of publications.

Friday, June 15, 2007

Linux Instrumentation: Is Linus Part of the Problem?

I was interested to read in a LinuxWorld article entitled "File System, Power and Instrumentation: Can Linux Close Its Technical Gaps?", that Linus Torvalds believes the current kernel instrumentation sufficiently addresses real-world performance problems.

This statement would be laughable if he weren't serious. Consider the following:

How can current Linux instrumentation be sufficient when older UNIX performance instrumentation is still not sufficient?
UNIX instrumentation was not introduced to solve real-world performance problems. It was a hack by and for kernel developers to monitor the performance impact of code changes in a light-weight O/S. We're still living with that legacy. It might've been necessary, but that doesn't make it sufficient.
The level of instrumentation in Linux (and UNIX-es) is not greatly different from what it was 30 years ago. As I discuss in Chap. 4 of my Perl::PDQ book, the idea of instrumenting an O/S goes back (at least) to c. 1965 at MIT.
Last time I checked, this was the 21st century. By now, I would have expected (foolishly, it seems) to have at my fingertips a common set of useful performance metrics, together with a common means for accessing them across all variants of UNIX and Linux.
Several attempts have been made to standardize UNIX performance instrumentation. One was called the Universal Measurement Architecture (UMA), and another was presented at CMG in 1999.
The UMA spec arrived DOA because the UNIX vendors, although they helped to design it, didn't see any ROI where there was no apparent demand from users/analysts. Analysts, on the other hand, didn't demand what they had not conceived was missing. Such a Mexican standoff was cheaper for the UNIX vendors. (How conveeeeenient!) This remains a potential opportunity for Linux, in my view.
Rich Pettit wrote a very thoughtful paper entitled "Formalizing Performance Metrics in Linux", which was resoundingly ignored by Linux developers, as far as I know.

Elsewhere, I've read that Linus would like to keep Linux "lean and mean". This reflects a naive PC mentality. When it comes to mid-range SMP servers, it is not possible to symmetrize the kernel without making the code paths longer. Longer code paths are necessary for control and scalability. That's a good thing. And improved performance instrumentation is needed a fortiori for the layered software architectures that support virtual machines. Since Linus is the primary gate-keeper of Linux, I can't help wondering if he is part of the solution or part of the problem.

Wednesday, June 13, 2007

Linux CFS: Completely Fair or Completely Fogged?

A while ago, I saw what looked like an interesting blog entry announcing a new task scheduler for Linux called CFS: Completely Fair Scheduler. I wanted to compare CFS with Fair Share Scheduling (FSS), something that took off in the 1990s for UNIX operating systems and something I've looked into from a performance perspective, especially because FSS provides the fundamental resource allocation mechanism for most VMM hypervisors.
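For readers unfamiliar with FSS, the core allocation idea can be sketched in a few lines: each group (or VM) is assigned shares, and a busy group is entitled to a CPU fraction equal to its shares divided by the total shares of the groups that are currently active, so the shares of idle groups are effectively redistributed to the busy ones. This is a simplified, hypothetical illustration of proportional-share allocation, not the algorithm of any particular FSS implementation or hypervisor.

```python
def entitlements(shares, active):
    """Proportional-share CPU entitlements.

    shares: mapping of group name -> number of shares.
    active: set of groups that currently have runnable work.
    Idle groups get nothing; their shares are redistributed.
    """
    total = sum(s for g, s in shares.items() if g in active)
    if total == 0:
        return {g: 0.0 for g in shares}
    return {g: (s / total if g in active else 0.0) for g, s in shares.items()}


shares = {"web": 3, "batch": 1, "backup": 4}             # hypothetical groups
print(entitlements(shares, {"web", "batch", "backup"}))  # web 3/8, batch 1/8, backup 1/2
print(entitlements(shares, {"web", "batch"}))            # backup idle: web 3/4, batch 1/4
```

Real schedulers enforce these entitlements over time windows and decay past usage, which is where the interesting performance behavior comes from.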

My Message to Virtualization Vendors

Virtualization is about creating illusions (see Chapter 7 in Guerrilla Capacity Planning). However, vendors need to recognize that virtualization is a double-edged sword.

Constructing illusions by hiding physical information from users is one thing, propagating that illusion to the performance analyst or capacity planner is quite another, and considered harmful.

Presumably, it's also potentially bad for business, in the long run. This unfortunate situation has arisen for one of the following reasons:

The performance data that is available is incorrect.

Example: Enabling hyperthreading on a Xeon processor misleads the operating system, and thereby performance tools, into treating the single core as 2 virtual processors. This means that many performance management tools will conclude and report that your system has 200% processor capacity available. But this is an illusion; you will never get 200% of a single core's processor capacity.

The correct performance data is not made available.

Example: With hyperthreading enabled, there should be a separate register or port that allows performance management tools to sample the actual utilization of the single physical core (AKA the execution unit). The IBM POWER5 has something called the PURR (Processor Utilization of Resources Register) that performs this role.

There are many examples of this kind of mangled performance shenanigans in the virtualized world, especially at what I call the meso-VM level, such as VMware and XenSource. The good news there is that, since it's software, it's easier to modify and therefore more likely that actual performance data will become exposed to the analyst.

In other words:

Fewer bells, more whistles

should be the watchword for virtualization vendors.

Wednesday, May 23, 2007

Greek for Geeks

When we come to discuss queueing models in my training classes, I emphasize the fact that my approach to the subject comes from wanting to avoid the blizzard of Greek (mathematical notation) that usually makes the whole subject so obscure to the very people who could use it most: performance analysts.

More on Moore

In an opinion piece in this month's CMG MeasureIT e-zine, I examine what is possibly going on behind the joint IBM-Intel announcement and the imminent release of 45 nm 'penryn' parts in CMOS.


Tuesday, May 22, 2007

How to Extend Load Tests with PDQ

Suppose you want to assess the scalability of a web application based on measurements from a test rig using a load-test tool like LR, WAS or Grinder. There's one slight problem. The load-test tools are limited to running a finite number (N) of client-side load generators, and that number will always be much smaller than the number of actual web users submitting transactions when the application goes live. How can you bridge that gap?
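One way to bridge the gap with an analytic model is to measure per-transaction service demands on the test rig at modest loads, then solve a closed queueing model for much larger N. The sketch below is a minimal exact mean value analysis (MVA) for a single workload class with think time Z; it is not PDQ's own code or API, and the demands and think time used are hypothetical.

```python
def mva(demands, users, think_time=0.0):
    """Exact MVA for a closed, single-class network of queueing resources.

    demands: service demand D_k (seconds) at each resource.
    users: number of users N (>= 1); think_time: Z in seconds.
    Returns (throughput X, response time R) at N users.
    """
    queue = [0.0] * len(demands)
    throughput = response = 0.0
    for n in range(1, users + 1):
        # Arrival theorem: an arriving user sees the queues left at n - 1 users
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (response + think_time)
        queue = [throughput * r for r in residence]  # Little's law per resource
    return throughput, response


demands = [0.04, 0.06]  # hypothetical CPU and disk demands (seconds)
for n in (10, 100, 1000):
    x, r = mva(demands, n, think_time=5.0)
    print(f"N={n:>4}: X={x:6.2f} tx/s  R={r:7.3f} s")
```

Beyond the knee of the curve, X flattens at 1/D_max (about 16.7 tx/s here) and R grows roughly linearly with N, which is exactly the region a small test rig usually cannot reach.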

ORACLE Scalability Oracles

For those of you concerned with ORACLE 10g performance, there are a couple of books by ORACLE oracles that you might find useful, especially with regard to ORACLE scalability.

Programming Multicores Ain't Easy

MIT researchers are exploring a way to make parallel programming easier in order to take full advantage of the computing potential available in multicore-based computers. Many experts believe that unless parallel programming is made easier, computing progress will stall. I discussed this point in my CMG 2006 paper.

In single-core systems, software code basically runs sequentially, with each task occurring one after another. In multicore systems, tasks get split up among the cores, and when different tasks need to access the same piece of memory and fail to synchronize properly, the data can become corrupted and cause the program to crash. MIT has designed StreamIt, a computer language and compiler that basically hides parallel-programming challenges while still allowing full use of multicore processors.
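The synchronization problem described above can be illustrated with Python's standard threading module. In this hypothetical sketch, several threads add to one shared counter. The read-modify-write `counter += 1` is not atomic, so without a lock updates can be lost (whether you actually observe lost updates in CPython depends on the interpreter version and thread scheduling). Holding a `threading.Lock` around the update makes the final count deterministic. This illustrates shared-memory synchronization in general; it has nothing to do with how StreamIt works.

```python
import threading


def count_with_lock(n_threads=4, increments=100_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:          # serialize the read-modify-write on the shared value
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(count_with_lock())  # → 400000 every time
```

The price of correctness is serialization: while one thread holds the lock, the others wait, which is one reason why naive locking limits multicore scalability.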

Friday, April 20, 2007

PyDQ (PDQ in Python) Web Pages Get Updated

For all you Python fans, I've updated the PyDQ web page with more online examples to get you started. For example, the communications network model has been revised to use NumPy inline to solve the traffic equations for the internal flows. Previously, I had solved those simultaneous equations separately using Mathematica.
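For readers curious what solving traffic equations inline with NumPy looks like, here is a minimal sketch (not the PyDQ example itself). For an open network with external arrival rates γ_j and routing probabilities P[i, j] from node i to node j, the total flow into each node satisfies λ_j = γ_j + Σ_i λ_i P[i, j], i.e., (I − P^T)λ = γ, which `numpy.linalg.solve` handles directly. The three-node routing matrix and arrival rates are hypothetical.

```python
import numpy as np

# Hypothetical 3-node network: P[i, j] = probability that a job leaving node i
# goes next to node j; 1 minus each row sum is the probability of leaving the network.
P = np.array([
    [0.0, 0.6, 0.4],
    [0.2, 0.0, 0.3],
    [0.1, 0.0, 0.0],
])
gamma = np.array([1.0, 0.5, 0.0])  # external arrival rates (jobs/s)

# Traffic equations: lam = gamma + P^T lam  =>  (I - P^T) lam = gamma
lam = np.linalg.solve(np.eye(len(gamma)) - P.T, gamma)
print(lam)  # total arrival rate into each node
```

The internal flows λ then give each node's utilization as λ_j S_j, which is what the network model needs.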

The online PDQ Manual is also being revised to include PyDQ functions and code examples.

Wednesday, April 18, 2007

How Long Should My Queue Be?

A simple question; there should be a simple answer, right? Guerrilla alumnus Sudarsan Kannan asked me if a rule of thumb could be constructed for quantitatively assessing the load average on both dual-core and multicore platforms. He had seen various remarks, from time to time, alluding to optimal load averages.

More On Penryn

In a previous blog entry, I noted that Intel was planning to release "penryn" in the final quarter of this year (2007). During a conference call Monday morning, Intel executives provided an overview of the more than twenty new products and initiatives being announced later today at the Intel Developer Forum in Beijing, including new performance specs for the company's next generation Penryn processor family.

Intel said that early Penryn performance tests show a 15 percent increase for imaging-related applications, a 25 percent performance increase for 3D rendering, and more than a 40 percent performance increase for gaming. The tests, according to Intel executive Maloney, were based on pre-production 45 nm Intel quad-core processors running at 3.33 GHz with a 1333 MHz front-side bus and 12 MB cache, versus a 2.93 GHz Intel Core 2 Extreme (QX6800) processor, just announced last week. Intel said that for high-performance computing, users can expect gains of up to 45 percent for bandwidth-intensive applications, and a 25 percent increase for servers using Java. Those tests were based on 45 nm Xeon processors with 1600 MHz front-side buses for workstations and HPC systems, and a 1333 MHz front-side bus for servers, versus current quad-core X5355 processors, the company said.

During the call, Intel execs also took the opportunity to reveal a few more details on Project Larrabee, a new "highly parallel, IA-based programmable" architecture that the company says it is now designing products around. While details were scant, Maloney did say that the architecture is designed to scale to trillions of floating point operations per second (teraflops) of performance and will include enhancements to accelerate applications such as scientific computing, recognition mining, synthesis, visualization, financial analytics, and health applications.

Monday, April 16, 2007

Forget Multicores, Think Speckles

Prototypes of the so-called "speckled computers" will be presented at the forthcoming Edinburgh International Science Festival.

"Speckled Computing offers a radically new concept in information technology that has the potential to revolutionise the way we communicate and exchange information. Specks will be around 1 mm³ semiconductor grains; that's about the size of a matchhead, that can sense and compute locally and communicate wirelessly. Each speck will be autonomous, with its own captive, renewable energy source. Thousands of specks, scattered or sprayed on the person or surfaces, will collaborate as programmable computational networks called Specknets.

Computing with Specknets will enable linkages between the material and digital worlds with a finer degree of spatial and temporal resolution than hitherto possible; this will be both fundamental and enabling to the goal of truly ubiquitous computing.

Speckled Computing is the culmination of a greater trend. As the once-separate worlds of computing and wireless communications collide, a new class of information appliances will emerge. Where once they stood proud – the PDA bulging in the pocket, or the mobile phone nestling in one’s palm, the post-modern equivalent might not be explicit after all. Rather, data sensing and information processing capabilities will fragment and disappear into everyday objects and the living environment. At present there are sharp dislocations in information processing capability – the computer on a desk, the PDA/laptop, mobile phone, smart cards and smart appliances. In our vision of Speckled Computing, the sensing and processing of information will be highly diffused – the person, the artefacts and the surrounding space, become, at the same time, computational resources and interfaces to those resources. Surfaces, walls, floors, ceilings, articles, and clothes, when sprayed or “speckled” with specks will be invested with a “computational aura” and sensitised post hoc as props for rich interactions with the computational resources."

I have absolutely no idea what that last sentence means in English, but it sounds like an interesting research goal.

Saturday, April 14, 2007

Top 10 Computer Jobs to Get Offshored

A Princeton economist has published a study showing how many jobs he considers to be at risk of being offshored over the next 10 years. The following table (extracted by me) shows his ranking of computer-related categories having a high chance of being offshored (shown as a percentage):

You can interpret "computer system analyst" (an accurate but slightly archaic term in row 4) as performance analyst or capacity planner.

Links to his report, as well as another by the ACM, are available at the Pulse.

Tuesday, March 20, 2007

Overview of Virtualization Performance

As Guest Editor for this month's MeasureIT e-zine on the topic of virtualization, I have put together a compilation of articles from earlier MeasureIT authors, as well as some papers from the CMG conference proceedings. Titles include:

Visualizing Virtualization

It May Be Virtual - But the Overhead is Not

A Realistic Assessment of the Performance of Windows Guest Virtual Machines

Measuring CPU Time from Hyper-Threading Enabled Intel Processors

Hyperthreading - Two for the Price of One?

To V or Not to V: A Practical Guide To Virtualization

The Virtualization Spectrum from Hyperthreads to GRIDs

This issue of MeasureIT is unique in my mind because it is rare to find, in one place, such a broad collection of performance perspectives centered on the intensely hot topic of virtualization.

Friday, March 16, 2007

Pushing the Wrong End of the Performance Pineapple

In case you're wondering why you, as a performance analyst or product manager, don't get much traction in your shop when you try to proselytize the otherwise highly rational notion of designing performance into the product (as opposed to patching it in after it has been released), contemplate this:

The performance of the production process trumps the performance of the product it produces.

If you don't read and heed this concept, you are going to find yourself perpetually frustrated. What I mean is that we (performance weenies) focus on the speeds and feeds of the bits and bytes associated with the technology being produced, whereas most companies these days are forced by Wall Street to focus more on the performance and cost of their internal management processes.

I said the same thing on page XX of the Preface to my 1998 book, The Practical Performance Analyst:

"Management no longer focuses on the speed of the product as much as the speed of producing the product. Nowadays, production performance matters more than product performance."

I believe this is truer today than it was 10 years ago. Here are some reasons why:

Many hi-tech companies have offshored their engineering while keeping middle and upper management local. Latest example: Google goes to Poland

There are substantial tax incentives for USA companies to go offshore, and then ask Congress for more H-1B visas

There is a financial incentive to charge the customer (possibly you) for performance upgrades

The Dilbertization of the IT workplace

Is it any wonder then, that you don't get a warm reception from upper management? The odds are stacked against you, and the stack goes all the way back to Wall St. Don't even think about fighting that war. The only battles worth fighting, in my view, are the ones that employ guerrilla-style tactics.

Ironically, as I explain in my classes, this is where performance analysis really started, viz., with Frederick Winslow Taylor, the original performance analyst (anal-ist?), who introduced "time and motion" studies of human production on assembly lines and in office environments of the early 20th century.

Oh, in case you're wondering about the title, it's an Aussie-ism. There is no good or smooth end of a pineapple.

Tuesday, March 13, 2007

PDQ geht Deutsch

PDQ goes German! Well, perhaps PDQ gets discussed in German, is a slightly more accurate description. The latest issue of Linux Technical Review (a German publication) discusses performance monitoring, in general, and how Perl::PDQ, in particular, can be used to go beyond monitoring to performance prediction, without breaking the bank.

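The article is entitled "Berechenbare Performance" ("Calculable Performance") and the Abstract says: "Wo man mit Statistik allein nicht weiterkommt, erlaubt es die Performance-Simulation beliebige Was-wäre-wenn-Szenarien durchzuspielen. Ein Perl-Modul hält den Aufwand in Grenzen." In English, roughly: "Where statistics alone won't get you any further, performance simulation lets you play through arbitrary what-if scenarios. A Perl module keeps the effort within bounds."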

How's your German? Kaput? Maybe Google can help? The appearance of the word 'simulation' in the Abstract is inaccurate, since PDQ solves queueing models analytically rather than by simulation, but maybe something got lost in translation.

PDQ Version 4.2 Released

PDQ (Pretty Damn Quick) finally made it out the door as version 4.2 and is now available for immediate download. The PDQ models included in the /examples/ directory correspond to those discussed in each of my books, but PDQ is primarily associated with the Perl::PDQ book.

The main features of PDQ 4.2 are:

Java version of PDQ contributed by Peter Harding in Australia

PHP version of PDQ contributed by Italian student Samuel Zallocco

Threaded-server model and errata corrections contributed by Neil Gunther

Better organization of the Perl and Python models

The Java and PHP packages are released stand-alone at this time

Complete installation instructions are available on the download page. Make sure you also read the various README files. Tom Becker (USA) and Christof Schmalenbach (Germany) have kindly provided separate installation notes for ActivePerl on Windows. This also indicates how international PDQ has become.

If you would like to be notified by email about future releases, please fill out this form or subscribe to this blog.

Tuesday, March 6, 2007

Hotsos 2007 Sizzled!

Just returned from Dallas where I was an invited speaker at the Hotsos 2007 Symposium on ORACLE performance. This symposium was a class operation: great hotel, great people, great food, great presentations, etc. and, as a newbie, I was treated very well. It seems that Cary Milsap (the energy behind the symposium) had already greased the runway for me, so I found myself to be a known quantity to many of the attendees, even though I had never met them before. This was way cool (Thanks, Cary).

Although ostensibly a group of very enthusiastic ORACLE performance people (about 450 attendees), they are not bigots, so they are interested in all aspects of performance. Moreover, Oracle performance gets critiqued. Capacity planning is one aspect that is new for many of them and I was a member of a panel session on that topic. During the 24 hours I was there, I attended a very interesting session on the measured limitations of RAC 10g scalability for parallel loads (ETL) and queries against a large-scale data warehouse (DWH), and a talk on how data skew can impact the kind of performance conclusions you tend to draw.

But perhaps the most interesting things that I learnt came out of several spontaneous discussions I had with various folks, including some conversations that went into the wee hours of Monday morning. My only regret was that I couldn't stay longer. I definitely plan to attend Hotsos 2008.

Saturday, March 3, 2007

More Visualization Toys

Here are some more visual paradigms that might find their way into performance visualization tools:

Computer system visuals

Digg Labs swarm view of blog tracing

Wall St. treemap

Google-based newsmap

Dynamic treemaps

Hiperwall

Map of the Internet

Hyperbolic fisheye view

Thursday, March 1, 2007

Disk Storage Myth Busters

Interesting myth-busting synopsis on disk drive technologies entitled: "Everything You Know About Disks Is Wrong" over at StorageMojo. Some of the key myths busted by extensive research at Google and CMU include:

Disk drives have a field failure rate 2 to 4 times greater than shown in vendors' specs.

Reliability of both cheap and expensive drives is comparable.

Disk drive failures are highly correlated, thus violating one of the assumptions behind RAID Level-5.

Disk drive failure rates rise steadily with age rather than following the so-called bathtub curve.
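To see why the independence assumption matters for RAID-5, here is a minimal Python sketch that estimates the chance of a second disk failing while a degraded array rebuilds. It assumes independent, exponentially distributed disk lifetimes, which is exactly the assumption the studies call into question; the disk count, rebuild time and MTBF are hypothetical numbers chosen for illustration.

```python
import math

def p_second_failure(n_disks: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """Probability that at least one of the n-1 surviving disks fails during
    the rebuild window, ASSUMING independent exponential lifetimes."""
    survivors = n_disks - 1
    # Under independence, the chance that no survivor fails within T hours
    # is exp(-(n-1) * T / MTBF); subtract it from 1 for "at least one fails".
    return 1.0 - math.exp(-survivors * rebuild_hours / mtbf_hours)

# Hypothetical example: 8-disk set, 24-hour rebuild, datasheet MTBF of 1,000,000 hours.
p = p_second_failure(8, 24.0, 1_000_000.0)
print(f"P(second failure during rebuild) = {p:.2e}")
```

If failures are positively correlated, as reported, the real probability is higher than this independence-based estimate, which is why the violated assumption matters.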

Storage vendors such as NetApp and EMC have responded. David Morgenstern asks in eWeek, why would anyone have trusted the MTBFs that appear in vendor glossies in the first place?
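One reason datasheet MTBFs look so reassuring is the arithmetic behind them. The sketch below converts an MTBF into an annualized failure rate (AFR), assuming a constant failure rate (exponential lifetimes). That assumption ignores both the bathtub curve and the age-related rise described above, and the 1,000,000-hour MTBF is a hypothetical datasheet value.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours

def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF, ASSUMING a constant
    (exponential) failure rate, i.e. no infant mortality and no wear-out."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

spec_afr = afr_from_mtbf(1_000_000)  # hypothetical datasheet MTBF
print(f"datasheet-implied AFR: {spec_afr:.2%}")
# The field studies summarized above report failure rates 2 to 4 times higher.
print(f"2x to 4x that: {2 * spec_afr:.2%} to {4 * spec_afr:.2%}")
```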

Background on failure rate analysis can be found in Chapter 1 of my Perl::PDQ book entitled: Time - The Zeroth Performance Metric.

Tuesday, February 27, 2007

Moore's Law II: More or Less?

For the past few years, Intel, AMD, IBM, Sun, et al., have been promoting the concept of multicore processors, i.e., no more single CPUs. A month ago, however, Intel and IBM made a joint announcement that they will produce single CPU parts using 45 nanometer (nm) technology. Intel says it is converting all its fab lines and will produce 45 nm parts (code-named "Penryn") by the end of this year. What's going on here?

We fell off the Moore's law curve, not because photolithography collided with limitations due to quantum physics or anything else exotic, but more mundanely because it ran into a largely unanticipated thermodynamic barrier. In other words, Moore's law was stopped dead in its tracks by old-fashioned 19th-century physics.
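A standard first-order approximation for dynamic CMOS switching power helps make that thermodynamic barrier concrete. With activity factor a, switched capacitance C, supply voltage V and clock frequency f:

$$P_{\mathrm{dyn}} \approx a\,C\,V^{2} f$$

If, as an idealization, V has to rise in proportion to f, then P_dyn ∝ f³. Under the same idealization, two cores each clocked at f/2 deliver the same aggregate cycle rate for roughly a quarter of the power, which is one way to see the appeal of multicores:

$$2 \times \left(\tfrac{f}{2}\right)^{3} = \tfrac{1}{4}\,f^{3}$$

Real parts only approximate the V ∝ f assumption, so treat this as a scaling argument rather than a prediction.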

CMG 2007 Hot Topics Call

I am the Session Area Chairperson for Hot Topics at CMG 2007 this year. Proposed topics include, but are not limited to: SOA, Web Services, Virtualization, RFID, Server Consolidation, Gaming performance, Blade Servers, Grid Computing, Clustering, Performance visualization, and Emerging Technologies.

If you have a hot topic you'd like to present, or know of someone else who might, please let me know about it either by posting here or by contacting me via email. Thank you.

Monday, February 26, 2007

Helping Amazon's Mechanical Turk Search for Jim Gray's Yacht

Regrettably, Jim Gray is still missing, but I thought Amazon.com deserved more kudos than they got in the press for their extraordinary effort to help in the search for Gray's yacht. Google got a lot of press coverage for talking up the idea of using satellite image sources, but Amazon did it. Why is that? One reason is that Amazon has a lot of experience operating and maintaining very large distributed databases (VLDBs). Another reason is that it's not just Google that has been developing interesting Internet tools. Amazon (somewhat quietly, by comparison) has also developed its own Internet tools, like the Mechanical Turk. These two strengths combined at Amazon and enabled them to load a huge number of satellite images of the Pacific into the Turk database, thereby allowing anyone (many eyes) to scan them via the Turk interface, and all at very short notice. Jim would be impressed.

I spent several hours on Sunday, Feb 4th, using Amazon's Mechanical Turk to help look for Gray's yacht. The images above (here about one quarter the size displayed by the Turk) show one example where I thought there might have been an interesting object; possibly a yacht. Image A was captured by the satellite a short time before image B (t₁ < t₂). You can think of the satellite as sweeping down this page. Things like whitecaps on the ocean surface tend to dissipate, and thus change pixels between successive frames, whereas a solid object like a ship will tend to remain invariant. The red circle (which I added) marks such a stable set of pixels, which also has approximately the correct dimensions for a yacht, i.e., about 10 pixels long (as explained by Amazon). Unfortunately, what appears to be an interesting object here has not led to the discovery of Gray's whereabouts.
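The whitecap-versus-ship reasoning is essentially frame differencing: keep pixels that are bright in both frames and barely change between them. Here is a minimal Python sketch of that idea. The tiny 3×4 grayscale frames and the thresholds (bright ≥ 200, change ≤ 20) are made up for illustration, and this is not something the Turk interface actually computed; volunteers did the comparison by eye.

```python
def stable_bright_pixels(frame_a, frame_b, bright=200, max_change=20):
    """Return (row, col) positions that are bright in both frames and changed
    by at most max_change between them. Thresholds are illustrative guesses."""
    hits = []
    for r, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        for c, (pa, pb) in enumerate(zip(row_a, row_b)):
            if pa >= bright and pb >= bright and abs(pa - pb) <= max_change:
                hits.append((r, c))
    return hits

# Hypothetical grayscale frames (0 = dark sea, 255 = bright): frame_a at t1, frame_b at t2.
frame_a = [
    [10, 10, 230, 10],
    [10, 240, 10, 10],
    [10, 250, 250, 10],
]
frame_b = [
    [10, 10, 15, 10],    # whitecap at (0, 2) has dissipated
    [10, 235, 10, 220],  # new whitecap at (1, 3); (1, 1) persists
    [10, 248, 252, 10],  # (2, 1) and (2, 2) persist
]
print(stable_bright_pixels(frame_a, frame_b))
```

A fuller search would then check whether a cluster of stable pixels has roughly the expected size of a yacht before flagging it.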

Use of the Turk satellite images was hampered by the lack of any way to reference the images (about 5 per web page) by number, and there was no coordinate system within each image to express the location of any interesting objects. These limitations could have led to ambiguities in the follow-up search of flagged images by human experts. However, given that time was of the essence for any possible rescue effort, omitting these niceties was a completely understandable decision.

Sunday, February 25, 2007

PDQ e-Biz Code

The complete Perl code for the 3-tier e-commerce example described in Chapter 10 of my Perl::PDQ book is as follows.
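For orientation, and independent of the book's listing, here is a minimal Python sketch of the textbook open-queueing calculation that underlies a multi-tier model. It does not use the PDQ library. Each tier is treated as a single FCFS server with Poisson arrivals, so its residence time is R = D/(1 − λD), where D is the service demand and λ the arrival rate. The tier names, service demands and arrival rate are hypothetical.

```python
def open_tiers(arrival_rate, demands):
    """Per-tier utilization and residence time for an open network of
    single-server queues, using R_k = D_k / (1 - lambda * D_k)."""
    results = {}
    for tier, demand in demands.items():
        util = arrival_rate * demand  # utilization law: U = lambda * D
        if util >= 1.0:
            raise ValueError(f"{tier} is saturated (U = {util:.2f})")
        results[tier] = (util, demand / (1.0 - util))
    return results

# Hypothetical service demands in seconds per transaction.
demands = {"web": 0.02, "app": 0.03, "db": 0.04}
tiers = open_tiers(10.0, demands)  # 10 transactions per second (hypothetical)
for name, (u, r) in tiers.items():
    print(f"{name:>3}: U = {u:.2f}, R = {r * 1000:.1f} ms")
total_r = sum(r for _, r in tiers.values())
print(f"total response time = {total_r * 1000:.1f} ms")
```

The database tier, with the largest demand, dominates the response time here; raising the arrival rate toward 25 per second would saturate it first.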

ITIL and Beyond in 2008

ITIL for Guerrillas is the title of Chapter 2 in my new book Guerrilla Capacity Planning (GCaP). That chapter attempts to give some perspective on where GCaP methodologies fit into the ITIL framework.

I am thinking about presenting a new Guerrilla class on this topic for 2008, which would go well beyond the material in chapter 2 to compare what ITIL offers with what is actually needed to provide proper IT service and capacity planning. I'm working with Steve Jenkin, an IT consultant in Australia, who is currently being ITIL certified. Check out his blog ITIL Utopia - Or Not? as he develops his own unique perspective on ITIL.

Please post your comments and suggestions here so we can get some idea of the level of interest in this topic (possible title: Going Guerrilla on ITIL). Assuming there is interest, I will provide more information about the course content and requirements, as things progress.

PDQ Version 4.0

Version 4.0 of PDQ (Pretty Damn Quick) is in the pipeline; it's been there for quite some time, actually (blush). The current holdup is related to getting both the Perl and the new Java versions through the QA suite designed by Peter Harding. As soon as that is sorted out, we'll release it, probably in two pieces, keeping the Java PDQ separate initially. Also included will be updates to PyDQ (Python) and a new PHP version.

If you would like to be notified by email, please fill out this form.

Guerrilla Certification?

During some recent discussions with a large TelCo client, the issue of certification for the various Guerrilla training classes came up. If I understood correctly, the idea would be to augment each class with some kind of ranking (e.g., Levels I, II, III) to denote the proficiency level achieved by a student taking a particular class. This would be useful to managers who would like to better categorize the level of competency of each employee they send to Guerrilla training. The Guerrilla classes are not currently organized along those lines, but they could be.

There are some possible complicating factors that could creep in. Questions that immediately spring to mind are:

Would the Guerrilla levels just be an internal ranking provided by Performance Dynamics?

Is there any need to have such levels certified by an outside institution, e.g., a university?

Is there a need to have such levels associated with continuing education (CEU) credits?

I would be interested to hear from both managers and Guerrilla alumni on this idea.

Virtualization Spectrum

My CMG 2006 paper on virtualization was recently blogged at HP Labs in the context of hyperthreading being considered harmful to processor performance. The paper actually provides a general unified framework in which to understand hyperthreading, hypervisors (e.g., VMware and Xen), and hyperservices (e.g., P2P virtual networks like BitTorrent), the latter being an outgrowth of something I wrote in response to an online analysis of Gnutella.

The VM-spectrum concept is based on my observations that (i) disparate types of virtual machines lie on a discrete spectrum bounded by hyperthreading at one extreme and hyperservices at the other, and (ii) poll-based scheduling is the common architectural element in most VM implementations. The associated polling frequency (from GHz to μHz) positions each VM in a region of the VM-spectrum. Several case studies are analyzed to illustrate how this framework could make VMs more visible to performance management tools.
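The range from GHz to μHz spans 15 orders of magnitude, so a logarithmic axis is the natural way to lay out polling frequencies. The Python sketch below places a frequency on such an axis, 0.0 at 1 GHz (hyperthreads) and 1.0 at 1 μHz (hyperservices). The linear-in-log10 placement and the 1 kHz hypervisor tick are illustrative assumptions, not the paper's definition of the spectrum's regions.

```python
import math

GHZ = 1e9        # hyperthread end of the spectrum
MICRO_HZ = 1e-6  # hyperservice end of the spectrum

def spectrum_position(poll_hz: float) -> float:
    """Place a polling frequency on a log scale: 0.0 at 1 GHz, 1.0 at 1 uHz.
    The linear-in-log10 scale is an illustrative assumption."""
    decades = math.log10(GHZ) - math.log10(MICRO_HZ)  # 15 orders of magnitude
    return (math.log10(GHZ) - math.log10(poll_hz)) / decades

examples = [
    ("hyperthread", 1e9),
    ("hypervisor (hypothetical 1 kHz tick)", 1e3),
    ("hyperservice", 1e-6),
]
for label, hz in examples:
    print(f"{label}: position {spectrum_position(hz):.2f}, polling period {1 / hz:.3g} s")
```

The periods make the spread tangible: a nanosecond at one end versus a million seconds (about 11.6 days) at the other.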

Performance Visualization

Perf viz for computers (PVC) is both an abiding interest and a pet peeve of mine. I first wrote about this topic in a 1992 paper entitled: "On the Application of Barycentric Coordinates to the Prompt and Visually Efficient Display of Multiprocessor Performance Data." Along with that paper, I also produced what IMHO is an interesting example of an efficient performance tool for visualizing multiprocessor CPU utilization, called barry (for barycentric or triangular coordinates; get it?), written in C using the ncurses library.
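The coordinate mapping behind a triangular display is simple: three fractions that sum to 1 become the weights on the three corners of a triangle. Here is a minimal Python sketch, assuming for illustration that the plotted quantities are per-CPU user, system and idle percentages. It is not barry's C source, and the unit equilateral triangle layout is an arbitrary choice.

```python
import math

# Corners of a unit equilateral triangle: user, system, idle (an assumed layout).
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))

def barycentric_to_xy(user: float, system: float, idle: float):
    """Map three non-negative shares (any scale) onto a point inside the triangle."""
    total = user + system + idle
    weights = (user / total, system / total, idle / total)  # normalize to sum to 1
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return x, y

# Hypothetical per-CPU samples in percent: (user, system, idle).
samples = [(70, 20, 10), (5, 5, 90), (100 / 3, 100 / 3, 100 / 3)]
for cpu, sample in enumerate(samples):
    x, y = barycentric_to_xy(*sample)
    print(f"cpu{cpu}: ({x:.3f}, {y:.3f})")
```

A busy CPU lands near the user or system corner, an idle one near the top, and a balanced one at the centroid, so a whole multiprocessor reads as a cloud of points at a glance.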

RESEARCH: My role models for PVC are scientific visualization and statistical data visualization. But why should physicists and statisticians have all the fun!? I maintain that similar CHI techniques could be applied to help people do a better job of performance analysis and capacity planning. I consider a central goal to be finding visual paradigms that offer the best impedance match between the digital-electronic computer under test and the cognitive computer doing the analysis (aka your brain). This is a very deep problem because we have a relatively poor understanding of how our visual system works (both the optical and the neural circuits), although this is improving all the time. So it's very difficult to quantify "best match" in general, and even more difficult to quantify it for individuals. One thing we do know is that vision is based on neural computation, which also explains why the moon looks bigger on the horizon than it does when you take a photograph of it in that same location. (How then was the following shot taken?)

Scalability Parameters

In a recent Guerrilla training class given at Capital Group in Los Angeles, Denny Chen (a GCaP alumnus) suggested a way to prove my Conjecture 4.1 (p.65) that the two parameters α and β are both necessary and sufficient for the scalability model:

$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

developed in Section 4.4 of the Guerrilla Capacity Planning book.

Basically, Denny observes that 2 parameters (a and b) are needed to define an extremum in a quadratic function (e.g., a parabola passing through the origin with c = 0), so a similar constraint should hold (somehow) for a rational function with a quadratic denominator. This is both plausible and cool. I don't know why I didn't think of it myself.
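Denny's analogy can be checked numerically. For the parabola y = ax² + bx, the extremum sits at x = −b/(2a), which needs both a and b. For the scalability model C(N) = N / (1 + α(N − 1) + βN(N − 1)), setting dC/dN = 0 reduces to 1 − α − βN² = 0, so the peak is at N* = √((1 − α)/β), which likewise needs both α and β. The Python sketch below compares that closed form with a brute-force search over integer N; the α and β values are hypothetical. This illustrates the plausibility of the conjecture; it is not a proof.

```python
import math

def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) = N / (1 + alpha (N - 1) + beta N (N - 1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

def parabola_vertex(a: float, b: float) -> float:
    """Extremum of y = a x^2 + b x (c = 0): both a and b are needed."""
    return -b / (2 * a)

def usl_peak(alpha: float, beta: float) -> float:
    """dC/dN = 0 gives 1 - alpha - beta N^2 = 0, so N* = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - alpha) / beta)

alpha, beta = 0.05, 0.001  # hypothetical fitted values
n_star = usl_peak(alpha, beta)
best_n = max(range(1, 101), key=lambda n: usl_capacity(n, alpha, beta))
print(f"closed form N* = {n_star:.2f}; best integer N on 1..100 = {best_n}")
print(f"vertex of y = -x^2 + 4x at x = {parabola_vertex(-1.0, 4.0)}")
```

With β = 0 there is no interior maximum at all, which is another way of seeing that the second parameter is what produces the peak.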