Tuesday, March 25, 2008

Hickory, Dickory, Dock. The Mouse Just Sniffed at the Clock

Following the on-schedule arrival of Penryn, Intel has now announced its "tock" phase successor (Nehalem) to the current 45 nm "tick" phase (Penryn). This is all about more cores (up to 8) and the return of two threads per core (SMT), not increasing clock speeds. That game does seem to be over. For example:

  • Tick: Penryn XE, Core 2 Extreme X9000 45 nm @ 2.8 GHz
  • Tock: Bloomfield, Nehalem micro-architecture 45 nm @ 3.0 GHz

Note that Sun is already shipping 8 cores × 8 threads = 64 VPUs @ 1.4 GHz in its UltraSPARC T2.

Nehalem also signals the replacement of Intel's aging front-side bus architecture by its new QuickPath Interconnect (QPI), a chip-to-chip link that is the long-overdue competitor to AMD’s HyperTransport bus. A 4-core Nehalem processor will have three DDR3 channels and four QPI links.

What about performance benchmarks besides those previously mentioned? I have no patience with bogus SPECxx_rate benchmarks which simply run multiple instances of a single-threaded benchmark. Customers should be demanding that vendors run the SPEC SDM to get a more reasonable assessment of scalability. The TPC-C benchmark results are perhaps a little more revealing. Here's a sample:

  • An HP ProLiant DL380 G5 server, 3.16 GHz
    2 CPU × 4 cores × 1 thread/core = 8 VPU
    Pulled 273,666 tpmC on Oracle Enterprise Linux running Oracle 10g RDBMS (11/09/07)

  • HP ProLiant ML370 G5, Intel X5460 3.16 GHz
    2 CPU × 4 cores × 1 thread/core = 8 VPU
    Pulled 275,149 tpmC running SQL Server 2005 on Windows Server 2003 O/S (01/07/08)

  • IBM eServer xSeries 460 4P
    Intel Dual-Core Xeon Processor 7040 - 3.0 GHz
    4 CPU × 2 cores × 2 threads/core = 16 VPU
    Pulled 273,520 tpmC running DB2 on Windows Server 2003 O/S (05/01/06)

Roughly speaking, within this grouping, the 8-way Penryn TPC-C performance now matches that of a 16-way Xeon of two years ago. Note that the TPC-C Top Ten results, headed up by the HP Integrity Superdome-Itanium2/1.6 GHz at 64 CPUs × 2 cores × 2 threads/core = 256 VPUs, are in the 1-4 million tpmC range.
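
To make the "8-way matches 16-way" comparison concrete, here is a rough sketch that recomputes the VPU counts and divides each tpmC figure by them. The class and field names are my own invention for illustration, and tpmC per VPU is only a back-of-the-envelope normalization, not a metric that TPC defines. The IBM box is entered as 4 sockets of dual-core parts, going by its "4P" and "Dual-Core" labels; the product is 16 VPUs either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TpccResult:
    """One TPC-C result from the list above (hypothetical container)."""
    label: str
    cpus: int
    cores_per_cpu: int
    threads_per_core: int
    tpmc: int

    @property
    def vpus(self) -> int:
        # VPU = CPUs x cores per CPU x threads per core
        return self.cpus * self.cores_per_cpu * self.threads_per_core

    @property
    def tpmc_per_vpu(self) -> float:
        # Informal normalization: throughput divided by hardware threads
        return self.tpmc / self.vpus


RESULTS = [
    TpccResult("HP ProLiant DL380 G5", 2, 4, 1, 273_666),
    TpccResult("HP ProLiant ML370 G5", 2, 4, 1, 275_149),
    TpccResult("IBM eServer xSeries 460", 4, 2, 2, 273_520),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.label:26s} {r.vpus:3d} VPU  {r.tpmc_per_vpu:10.1f} tpmC/VPU")
```

On these numbers the two Penryn-era boxes deliver about twice the tpmC per VPU of the 2006 Xeon system, which is just another way of saying that half the hardware threads now do the same work.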

The next step down is from 45 nm to 32 nm technology (code named Westmere), which was originally scheduled for 2013. Can't accuse Intel of not being aggressive.
