The Pith of Performance: Modern Microprocessor MIPS

The question of how modern microprocessors compare with the mainframe processors of yore arises from time to time. The vernacular rate metric that has persisted for a long time (long in the history of computers, that is) is MIPS. Whether or not you approve of MIPS as a valid performance metric is a different (philosophical) question. Since the mainframe has not gone away---it's just another server on the network today---even mainframers still talk about MIPS ratings. Nonetheless, the meaning of "instructions" does vary significantly across architectures, so one has to exercise caution when making inter-architectural comparisons and not endow any conclusions with more credibility than they deserve.

With that caveat in mind, estimating MIPS ratings can be very useful in certain circumstances. One such circumstance arose recently for a colleague at HP Labs, where they were trying to ascertain "ballpark" MIPS for a processor being used in one of their projects. To avoid divulging any proprietary information, let's suppose that the modern microprocessor of interest was a 3.0 GHz Xeon Dual Core, such as can be found in an HP ProLiant ML350 server. Their original estimate was 20,000 MIPS. You can find support for this kind of estimate on the web (see e.g., this wiki table). But is it believable? Fortunately, my colleague asked me to sanity check their estimate.

The key to resolving this question is to find a bridge (or bridges) between IBM mainframe MIPS and the throughput metrics used for modern microprocessors. In Chap. 7 of The Practical Performance Analyst, I describe how I sized an Amdahl 5995-1400 series mainframe against a Unix SMP box of up to 24 ways, built from MIPS (the company) R3000 microprocessors. To do this, I found a TPS bridge between IBM's LSPR MIPS ratings and throughput measurements from a TP1 database benchmark running ORACLE. Luckily, I was able to do this because the Amdahl mainframe also ran a version of Unix (UTS) and therefore could also run Oracle's RDBMS. This meant the workloads were the same, to first order. The conclusion there, as I remember it, was something like this: one R3000 MIPS was equivalent to only one-third of an IBM MIPS for TP1 database transactions. The reason is that mainframes have separate (channel) processors to handle physical I/O to DASD, whereas a Unix SMP has to use the same processors for executing both user transactions and the device-driver I/Os (kernel code).

Things are less ambiguous if the workload is compute-intensive. The one clear accomplishment of the SPEC benchmark organization is that they have redefined nominal "MIPS" for modern microprocessors (whether or not that was their intention). The relevant industry-standard microprocessor benchmarks are SPEC CINT (integer-only CPU workloads) and SPEC CFP (floating-point CPU workloads). The current benchmark release is collectively known as SPEC CPU2006.

The unfortunate downside is that this comparison only works for releases prior to CPU2006. Up until the 2006 release, it was possible to compare IBM MIPS, VAX VUPs and SPEC CINTs, because they were very similar, to first order. For example, MIPS Corporation rated the R3000 @ 40 MHz = 32 VUPs. The SPEC CINT1992 rating was about 27, which is very close to the VUPs value. Amdahl Corporation rated the 5995-1400 @ 100 MHz = 120 IBM MIPS. But the Amdahl mainframe was a 4-way SMP. Therefore, each CPU engine was approximately 120/4 = 30 IBM MIPS, and that matches the VUPs rating numerically for CPU-bound workloads. A simple check we can do at this point is to wind up the clock (theoretically) and estimate the SPEC rating for the R3000 RISC processor @ 3 GHz, the same clock frequency as the Intel Dual Core processor. That is, we assume the rating scales linearly with clock frequency and solve 40/3000 = 32/x for x. Doing this in R, we find:


> # Solve 40/3000 = 32/x for x, the R3000 rating scaled from 40 MHz to 3000 MHz
> r <- uniroot(function(x) (40/3000) - (32/x), c(0, 5000))
> r$root
[1] 2400

which suggests that the SPEC rating corresponds to something on the order of 2,000 MIPS; not 20,000 MIPS.
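
The root-finding call above solves a simple proportion, so the same figure can be cross-checked with direct arithmetic. The following is a minimal sketch that assumes, as the text does, that a rating scales linearly with clock frequency. The variable names are illustrative only and do not come from the original analysis.

```r
# Ratings quoted in the text
r3000_mhz  <- 40     # R3000 clock frequency in MHz
r3000_vups <- 32     # R3000 rating in VUPs at that clock
target_mhz <- 3000   # 3 GHz, the clock of the Xeon under discussion

# Assumption: the rating scales linearly with clock frequency
scaled_rating <- r3000_vups * target_mhz / r3000_mhz
print(scaled_rating)            # 2400

# Amdahl 5995-1400: 120 IBM MIPS spread across a 4-way SMP
amdahl_mips_per_engine <- 120 / 4
print(amdahl_mips_per_engine)   # 30
```

Linear clock scaling ignores memory latency, cache sizes and microarchitectural differences, so the result is only an order-of-magnitude sanity check.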

Without getting too sidetracked by all these details, here is a table of SPEC CPU2000 (the previous benchmark release) with CINT metrics for some Intel single and dual core microprocessors:

If we take it as read that the CINT numbers are close to older MIPS ratings, we get a picture that looks like this:

In summary, we can conclude:

The original guess of 20,000 MIPS appears to be off by an order of magnitude.
Mainframe and CISC processors take fewer instructions to do the same work as a RISC processor.
If the workload is entirely compute-bound, we may assume IBM MIPS ~ VAX VUPs ~ SPEC CINT2000, to first order.
If the workload involves physical I/O, then MIPS estimates should be scaled down to between one-half and one-third of the compute-bound value.
A 3 GHz core can be safely assumed to be equivalent to 1000-1500 MIPS.
Therefore, a 3 GHz dual-core should be roughly equivalent to 2000-3000 MIPS.
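
The rules of thumb in this summary can be captured in a few lines of R. This is a rough sketch rather than a sizing tool: the function name `estimate_mips` and its arguments are hypothetical, the per-core default of 1000-1500 MIPS is the 3 GHz figure quoted above, and pairing the one-third factor with the low end of the range (and one-half with the high end) is an assumption made here to give the widest interval.

```r
# Hypothetical helper: ballpark MIPS range for a multi-core 3 GHz processor.
#   cores    - number of cores, assumed to scale linearly
#   per_core - low/high MIPS per core (1000-1500 for a 3 GHz core, per the summary)
#   io_bound - if TRUE, scale down to between one-third and one-half
estimate_mips <- function(cores,
                          per_core = c(low = 1000, high = 1500),
                          io_bound = FALSE) {
  est <- cores * per_core
  if (io_bound) {
    # Assumption: widest range, one-third of the low end to one-half of the high end
    est <- est * c(1/3, 1/2)
  }
  est
}

dual_core    <- estimate_mips(2)                    # compute-bound dual core
dual_core_io <- estimate_mips(2, io_bound = TRUE)   # same chip with physical I/O
print(dual_core)
print(dual_core_io)
```

For the compute-bound dual-core case this reproduces the 2000-3000 MIPS range stated above. The I/O-bound range is only as good as the one-half to one-third rule itself.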

What are the important lessons one can take away from this exercise?

MIPS, shmips! It doesn't really matter what you call it. It's vernacular for processor throughput performance. It's not going away, so learn to deal with it.
Trying to come up with some estimate (even if it turns out to be wrong) is better than not having any clue at all.
Without an estimate, how are you going to know if you are wrong?
Numbers are not the only way, or even the best way, to make estimates. It is often more insightful to use a combination of numerical and graphical techniques.

Thursday, April 2, 2009
