Showing posts with label SPEC. Show all posts

Sunday, January 22, 2012

Throughput-Delay Curves

A colleague of mine at Yahoo.com asked me if I'd ever seen curves like this:

Not only is the answer yes (it's a throughput-delay plot, or XR plot in my notation), but that particular plot comes from my GCaP course notes. There, I use it to compare the performance of a functional multiprocessor (NS6000) and a symmetric multiprocessor (SC2000). Note how the two curves cross at around 1500 OPS. Ask yourself why; if you can't come up with an explanation, you should be registering for a Guerrilla class. :)

The above XR plot also serves as a useful reminder that the throughput and response-time metrics are not only dependent on one another, but are generally dependent in a nonlinear way, despite what some experts may claim.
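To make the nonlinearity concrete, here is a minimal sketch that assumes the simplest possible model: a single open M/M/1 queue, where the response time is R = S / (1 - X·S) for throughput X and service time S. This is an illustrative assumption, not the model behind the NS6000 and SC2000 curves, and the 1 ms service time is hypothetical.

```python
def response_time(throughput, service_time):
    """Response time R = S / (1 - X*S) for an open M/M/1 queue (assumed model)."""
    utilization = throughput * service_time
    if not 0 <= utilization < 1:
        raise ValueError("throughput must keep utilization in [0, 1)")
    return service_time / (1.0 - utilization)


S = 0.001  # hypothetical service time: 1 ms per operation

# Equal steps in throughput produce increasingly large steps in response time.
for x in (100, 500, 900, 990):
    r_ms = response_time(x, S) * 1000
    print(f"X = {x:4d} ops/s  ->  R = {r_ms:7.2f} ms")
```

Under this assumption, going from 500 to 900 ops/s multiplies the response time by five, so no straight line relates the two metrics.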

Monday, September 14, 2009

Anti Log Plots

A sure sign that somebody doesn't know what they're doing is when they plot their data on logarithmic axes. Logarithmic plots are almost always the wrong thing to use. The motivation to use log axes often arises from misguided aesthetic considerations, e.g., trying to "compress" a lot of data points into the available plot width on a page, rather than from any attempt to enhance technical understanding. The temptation to use log plots is also too easily facilitated by tools like Excel, where re-scaling the plot axes is just a button-click away.

Thursday, April 2, 2009

Modern Microprocessor MIPS

The question of how modern microprocessors compare with mainframe processors of yore arises from time to time. The vernacular rate metric that has persisted for a long time (long in the history of computers, that is) is MIPS. Whether you approve of MIPS as a valid performance metric or not is a different (philosophical) question. Since the mainframe has not gone away (it's just another server on the network today), even mainframers still talk about MIPS ratings. Nonetheless, the meaning of "instructions" does vary significantly across architectures, so one has to exercise caution when making inter-architectural comparisons and not endow any conclusions with more credibility than they deserve.

Wednesday, November 19, 2008

What the Harmonic Mean Means

As I discuss in Chapter 1 of The Practical Performance Analyst, time is the fundamental performance metric. Computer system performance metrics are therefore either direct measures of time, e.g., seconds, minutes, hours, or they are rates. All rate metrics have their units of time in the denominator, e.g., GB/s, MIPS, TPS, IOPS.

A conceptual difficulty can arise when we try to summarize a set of performance numbers as a single number, especially if they're rates.
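A small sketch of where the difficulty shows up, using made-up numbers: two programs with the same instruction count run at 100 MIPS and 300 MIPS. The rates and the workload size are hypothetical, and the equal-work assumption is what makes the harmonic mean the right summary here.

```python
from statistics import harmonic_mean, mean

rates_mips = [100.0, 300.0]  # hypothetical per-program rates
work = 600.0                 # hypothetical: millions of instructions per program

# Time is the fundamental metric, so go back to it: total work / total time.
total_time = sum(work / rate for rate in rates_mips)  # seconds
true_rate = (work * len(rates_mips)) / total_time     # aggregate MIPS

print(f"arithmetic mean: {mean(rates_mips):.1f} MIPS")
print(f"harmonic mean:   {harmonic_mean(rates_mips):.1f} MIPS")
print(f"work / time:     {true_rate:.1f} MIPS")
```

With equal work per program, the harmonic mean agrees with total work divided by total time, while the arithmetic mean overstates the aggregate rate.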

Tuesday, March 25, 2008

Hickory, Dickory, Dock. The Mouse Just Sniffed at the Clock

Following the arrival of Penryn on schedule, Intel has now announced its "tock" phase (Nehalem) successor to the current 45 nm "tick" phase (Penryn). This is all about more cores (up to 8) and the return of 2 threads per core (SMT), not increasing clock speeds. That game does seem to be over. For example:

  • Tick: Penryn XE, Core 2 Extreme X9000 45 nm @ 2.8 GHz
  • Tock: Bloomfield, Nehalem micro-architecture 45 nm @ 3.0 GHz

Note that Sun is already shipping 8 cores × 8 threads = 64 VPUs @ 1.4 GHz in its UltraSPARC T2.

Nehalem also signals the replacement of Intel's aging frontside bus architecture with its new QuickPath chip-to-chip interconnect (QPI), the long-overdue competitor to AMD’s HyperTransport bus. A 4-core Nehalem processor will have three DDR3 channels and four QPI links.

What about performance benchmarks besides those previously mentioned? I have no patience with bogus SPECxx_rate benchmarks which simply run multiple instances of a single-threaded benchmark. Customers should be demanding that vendors run the SPEC SDM to get a more reasonable assessment of scalability. The TPC-C benchmark results are perhaps a little more revealing. Here's a sample:

  • An HP ProLiant DL380 G5 server 3.16GHz
    2 CPU × 4 cores × 1 thread/core = 8 VPU
    Pulled 273,666 tpmC on Oracle Enterprise Linux running Oracle 10g RDBMS (11/09/07)

  • HP ProLiant ML370G5 Intel X5460 3.16GHz
    2 CPU × 4 cores × 1 thread/core = 8 VPU
    Pulled 275,149 tpmC running SQL Server 2005 on Windows Server 2003 O/S (01/07/08)

  • IBM eServer xSeries 460 4P
    Intel Dual-Core Xeon Processor 7040 - 3.0 GHz
    4 CPU × 2 cores × 2 threads/core = 16 VPU
    Pulled 273,520 tpmC running DB2 on Windows Server 2003 O/S (05/01/06)

Roughly speaking, within this grouping, the 8-way Penryn TPC-C performance now matches a 16-way Xeon of 2 years ago. Note that the TPC-C Top Ten results, headed up by the HP Integrity Superdome-Itanium2/1.6GHz at 64 CPUs × 2 cores × 2 threads/core = 256 VPUs, are in the 1-4 million tpmC range.
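The comparison can be made explicit by normalizing each result to its VPU count. The sketch below uses only the tpmC figures and VPU totals listed above; the `vpus` helper and the per-VPU ratio are my own illustrative conveniences, not an official TPC metric.

```python
def vpus(cpus, cores_per_cpu, threads_per_core):
    """Virtual processing units = CPUs x cores per CPU x threads per core."""
    return cpus * cores_per_cpu * threads_per_core


# The VPU arithmetic quoted in the text.
print("UltraSPARC T2:", vpus(1, 8, 8), "VPUs")
print("Superdome:    ", vpus(64, 2, 2), "VPUs")

# (system, VPU total, tpmC) as listed in the sample above.
results = [
    ("HP ProLiant DL380 G5", 8, 273_666),
    ("HP ProLiant ML370G5", 8, 275_149),
    ("IBM eServer xSeries 460 4P", 16, 273_520),
]

per_vpu = {name: tpmc / n for name, n, tpmc in results}
for name, value in per_vpu.items():
    print(f"{name:28s} {value:10.1f} tpmC per VPU")
```

On these numbers, each of the 8-VPU systems delivers roughly twice the tpmC per VPU of the 16-VPU system.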

The next step down is from 45 nm to 32 nm technology (code named Westmere), which was originally scheduled for 2013. Can't accuse Intel of not being aggressive.