Tuesday, February 27, 2007

Moore's Law II: More or Less?

For the past few years, Intel, AMD, IBM, Sun, et al., have been promoting the concept of multicores, i.e., no more single CPUs. A month ago, however, Intel and IBM made a joint announcement that they will produce single CPU parts using 45 nanometer (nm) technology. Intel says it is converting all its fab lines and will produce 45 nm parts (code named "Penryn") by the end of this year. What's going on here?

We fell off the Moore's law curve, not because photolithography collided with limitations due to quantum physics or anything else exotic, but more mundanely because it ran into a largely unanticipated thermodynamic barrier. In other words, Moore's law was stopped dead in its tracks by old-fashioned 19th-century physics.

Computing at the Speed of Heat

As CMOS feature sizes are reduced and switching speed increased, thermal power dissipation becomes a serious problem because it is directly proportional to the clock or chip frequency. A 3.6 GHz Pentium 4 (Prescott) can generate on the order of 100 Watts, equivalent in power to a large household light bulb. And don't forget the huge power transients at the pads, which are essentially scale-invariant. Moreover, you end up trying to push that heat through a pinhole--the small die size. In other words, it's really power density that kills you. A major consequence is that chip failure rates increase dramatically due to thermal degradation. The Apple Mac G5 dual processor (IBM Power 5 chip) has a freon cooling system (a la Cray) together with a spectrophotometer that detects freon leaks. If such a leak is detected, the system shuts down immediately. This is another reason Apple went Intel.
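
The proportionality between power and clock frequency can be made explicit with the standard first-order model of CMOS switching power. What follows is the usual textbook form, offered as a sketch rather than as the derivation referred to in the updates; the symbols (activity factor \alpha, switched capacitance C, supply voltage V_{dd}, clock frequency f, die area A) are notation assumed here, not taken from the post.

```latex
\begin{align}
E_{\text{cycle}} &= C\,V_{dd}^{2}
  && \text{energy dissipated per charge/discharge cycle of a node} \\
P_{\text{dyn}} &= \alpha\, C\, V_{dd}^{2}\, f
  && \text{average switching power at clock frequency } f \\
\frac{P_{\text{dyn}}}{A} &= \frac{\alpha\, C\, V_{dd}^{2}\, f}{A}
  && \text{power density over die area } A
\end{align}
```

At a fixed supply voltage, switching power grows linearly with f. Pushing f higher generally also requires a higher V_{dd}, so in practice the growth tends to be steeper than linear, and shrinking A only concentrates the same heat into a smaller area.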

As I understand it, the thermal problem is the main reason why multicore has been promoted by the CPU vendors. Keep the cores running at about the same clock speeds as currently available, i.e., admit that Moore's law has been thermally defeated, and compensate for the lack of higher clock frequency, which lazy programmers have become used to, by adding more aggregate MIPS via more cores per module. Of course, this looks good on paper, but it comes with its own set of problems, which those of us who have been involved with the development of SMPs have seen before. Welcome to the return of concurrent programming as a major performance issue for applications in the foreseeable future. Been there, done that 15 years ago. This time it's worse, because the multicores are a true hardware black box. Therefore, any serious performance tuning will have to be accomplished in software, mostly at the application level, I'm guessing.
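
One way to see why extra cores do not simply substitute for extra clock frequency is Amdahl's law, which bounds the speedup of a program that is only partly parallelizable. The snippet below is a minimal sketch of that well-known bound, not a model taken from this post; the function name amdahl_speedup and the 10% serial fraction are illustrative assumptions.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `serial_fraction` of the
    work must run sequentially (Amdahl's law).

    Hypothetical helper for illustration only; it ignores contention,
    cache effects, and synchronization overhead, all of which make
    real multicore scaling worse than this bound.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 10% of the run time cannot be parallelized.
    serial = 0.10
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores: speedup {amdahl_speedup(serial, n):.2f}x")
```

With any nonzero serial fraction s, the speedup can never exceed 1/s no matter how many cores are added, which is why the tuning burden shifts to reducing the serial part of the application.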

M is for Metal

In an ironic twist of fate, the new CMOS transistor technology actually hearkens back to the earliest transistor implementations. When I was involved with VLSI design tools at Xerox PARC, we used the Mead-Conway design rule: "poly over silicon" produces a transistor. The word poly refers to polysilicon or amorphous Si. The Mead-Conway rule was actually shorthand: a layer of poly-Si over an implied silicon dioxide insulator over a doped Si substrate, with implied source and drain, leads to a transistor being produced from the photolithographic masks generated by the VLSI CAD tool. But the acronym CMOS means "complementary metal oxide semiconductor", which is a throwback to the days when the transistor gates were made by depositing metal (aluminum, usually) rather than poly-Si over the Si substrate. Ironically, the solution to gate leakage at 45 nm is to go back to metal-based gates. The claims for the new hafnium oxide-metal gate technology include:
  • ~2x improvement in transistor density, for either
    smaller chip size or increased transistor count
  • ~30% reduction in transistor switching power
  • >20% improvement in transistor switching speed or >5x
    reduction in source-drain leakage power
  • >10x reduction in gate oxide leakage power

It remains to be seen whether these attributes really translate into the so-called Moore's law II curve and lead to the production of faster, cooler single CPU parts or just somewhat faster and somewhat cooler multicores.

Updates of July 24, 2010
  • Added derivation of the dependency between CMOS power and switching frequency
  • An expanded version of this blog post appears in the May 2007 issue of MeasureIT
  • "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software," Herb Sutter, Dr. Dobb's Journal, March 2005 (updated online August 2009)
  • "The Trouble With Multicore: Chipmakers are busy designing microprocessors that most programmers can't handle," Dave Patterson, IEEE Spectrum, July 2010
  • Intel Core i7-880 (Lynnfield, 45 nm process), with a clock frequency of 3 GHz, dissipates up to 95 Watts
  • If you really want your CMOS processor to run faster, pour liquid nitrogen over it
  • Tim Bray, co-inventor of XML, pushes functional programming for concurrent programming on multicores, e.g., Erlang and Clojure


SteveJ said...

You couldn't be more right!!

See http://itilopia.blogspot.com/2007/03/end-of-silicon-revolution.html for some related comments.

Neil Gunther said...

Added a derivation of the dependency between CMOS power and switching frequency (the reason for multicore) and also included some updated links to related material. See list at end.