The Pith of Performance: June 2007

Thursday, June 28, 2007

Unnatural Numbers

What is the next number on the sequence: 180, 90, 45, ... ? Since each number appears to be obtained by halving the previous number in the sequence, it should be 22.5. This shows that even if you start randomly with a large natural number (which is not a power of 2) and you keep halving the successive values, you will reach an odd natural number and when you halve that, you end up with a fraction (a real number). Here's a little Python code that demonstrates the point:

#!/usr/bin/python
import random

r = random.randrange(100,1000,2)
print r

while True:
r /= 2
print r 
if r % 2 != 0: break

Run it several times and you will see that the sequence, although generally not long, contains an unpredictable number of iterates. This is easily understood by noting that division by 2 is equivalent to a bitwise right-shift (i.e., >> operator in C, Python and Perl), so each successive division simply marches the bit-pattern to the right until a '1' appears in the unit column. That defines an odd number and another right-shift produces '1' as a remainder, which I can either ignore as with integer division or carry as a fraction. It also means that you can predict when you will hit an odd number by first representing the starting number as binary digits, then counting the number of zeros from the rightmost position to the first occurrence of a 1-digit.

In the manufacture of microprocessors and memory chips, the increase in speed follows Moore's law, which is also a halving sequence like the above. Therefore, we see natural numbers like 90 nanometer (nm), 45 nm, and so on. For reference, the diameter of the HIV virus (a molecular machine) is about 100 nm. As I've discussed elsewhere, 45 nm is the next-generation technology that IBM, Intel and AMD will be using to fabricate their microprocessors. But the current smallest technology being used by AMD and Intel is not 90 nm, it's 65 nm. And the next generation after 45 nm will be 32 nm, not 20-something. So, where do these 'unnatural' numbers come from?

Each progression in shrinking the silicon-based photolithographic process is identified by a waypoint or node on a roadmap defined by the Semiconductor Industry Association or SIA. The current SIA roadmap looks like this:

Year  2004    2007    2010    2013    2016    2020 
nm     90      65      45      32      22      14

A technology node is defined primarily by the minimum metal pitch used on any product. Here, pitch refers to the spacing between the metal "wires" or interconnects. In a microprocessor, for example, the minimum metal pitch is the half-pitch of the first layer of metal used to connect the actual terminals of one transistor to another. Notice that IBM and Intel technology is moving faster than SIA expectations, so the roadmap already needs to be updated again.

Wednesday, June 20, 2007

Housekeeping My List of Pubs

I've been out of the office more than in, this week, so I have a lot of catching up to do. The only thing I've managed to achieve here is to update my list of publications.

Friday, June 15, 2007

Linux Instrumentation: Is Linus Part of the Problem?

I was interested to read in a LinuxWorld article entitled "File System, Power and Instrumentation: Can Linux Close Its Technical Gaps?", that Linus Torvalds believes the current kernel instrumentation sufficiently addresses real-world performance problems.

This statement would be laughable, if he weren't serious. Consider the following:

How can current Linux instrumentation be sufficient when older UNIX performance instrumentation is still not sufficient?
UNIX instrumentation was not introduced to solve real-world performance problems. It was a hack by and for kernel developers to monitor the performance impact of code changes in a light-weight O/S. We're still living with that legacy. It might've been necessary, but that doesn't make it sufficient.
The level of instrumentation in Linux (and UNIX-es) is not greatly different from what it was 30 years ago. As I discuss in Chap. 4 of my Perl::PDQ book, the idea of instrumenting an O/S goes back (at least) to c.1965 at MIT.
Last time I checked, this was the 21st century. By now, I would have expected (foolishly, it seems) to have at my fingertips, a common set of useful performance metrics, together with a common means for accessing them across all variants of UNIX and Linux.
Several attempts have been made to standardize UNIX performance instrumentation. One was called the Universal Measurement Architecture (UMA), and another was presented at CMG in 1999.
The UMA spec arrived DOA because the UNIX vendors, although they helped to design it, didn't see any ROI where there was no apparent demand from users/analysts. Analysts, on the other hand, didn't demand what they had not conceived was missing. Such a Mexican standoff was cheaper for the UNIX vendors. (How conveeeeenient!) This remains a potential opportunity for Linux, in my view.
Rich Pettit wrote a very thoughtful paper entitled "Formalizing Performance Metrics in Linux", which was resoundingly ignored by Linux developers, as far as I know.

Elsewhere, I've read that Linus would like to keep Linux "lean and mean". This reflects a naive PC mentality. When it comes to mid-range SMP servers, it is not possible to symmetrize the kernel without making the code paths longer. Longer code paths are necessary for control and scalability. That's a good thing. And improved performance instrumentation is needed a fortiori for the layered software architectures that support virtual machines. Since Linus is the primary gate-keeper of Linux, I can't help wondering if he is part of the solution or part of the problem.

Wednesday, June 13, 2007

Linux CFS: Completely Fair or Completely Fogged?

A while ago, I saw what looked like an interesting blog entry announcing a new task scheduler for Linux called CFS: Completely Fair Scheduler. I wanted to compare CFS with Fair Share scheduling (FSS); something that took off in the 1990's for UNIX operating systems and something I've looked into from a performance perspective, especially because FSS provides the fundamenal resource allocation mechanism for most VMM hypervisors.