"16 multicores perform barely as well as 2 for complex applications."
What they in fact discovered (using various algorithmic workload simulations), and the reason Paul sent me the link, is that multicore applications are subject to my universal scalability law (USL), precisely as discussed in Chapters 4-6 of my Guerrilla Capacity Planning book. Moreover, they are confronted with the worst-case situation of retrograde throughput (i.e., α > 0, β > 0 in the USL) above 8 cores. Although the Sandia press release of January 13, 2009 does not explicitly show this retrograde effect on throughput scalability, it is present in a disguised form.
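To make the retrograde case concrete, here is a minimal Python sketch of the USL in its usual form, C(N) = N / (1 + α(N − 1) + βN(N − 1)). The function names and the parameter values are my own illustrative assumptions, picked only so that the peak lands at 8 cores; they are not fitted to the Sandia data.

```python
import math


def usl_throughput(n, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Core count where C(N) is maximal: N* = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients (assumption, not measured values):
# alpha is the contention term, beta the coherency term.
# beta is chosen so that the peak falls at exactly 8 cores.
ALPHA = 0.05
BETA = (1.0 - ALPHA) / 64.0

print("peak at N* =", usl_peak(ALPHA, BETA))
for cores in (1, 2, 4, 8, 16, 32):
    # Beyond the peak, adding cores lowers the relative throughput.
    print(cores, round(usl_throughput(cores, ALPHA, BETA), 3))
```

With β = 0 the curve only flattens toward 1/α; it is the β > 0 term that makes throughput actually fall once N exceeds N*.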
The bowl-shaped curve in their graph (above left) is actually a beautiful rendition of the theoretical plot (above right) discussed in Item 3 ("One-on-One Conversations") of my November 4, 2007 blog entry, in which I show the connection between the USL and Brooks' law (AKA The Mythical Man-Month). The retrograde productivity curve (i.e., throughput) can be seen in Item 4 ("Combined Productivity") of the same entry. The other "horizontal" curves in their graph would seem to be variants of the other USL cases in that blog post: "Ideal Schedule Contraction" and "Round-Table Meetings."
The difference in approach comes from the fact that supercomputer jockeys worry about latency as the primary determinant of throughput. Click on their graph to enlarge it and you'll see that their y-axis is labeled "Seconds," i.e., time, not a rate. In Section 5 ("Extrema and Universality") of my more formal arXiv preprint, I prove that such a bowl curve in the latency is equivalent to retrograde throughput in the USL.
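The equivalence can be sketched in a few lines. This is only an outline and not the proof in the preprint; it assumes a fixed total amount of work, so that elapsed time scales as T(N) = T(1)/C(N), with T(1) denoting the single-core time (my notation).

```latex
% Assumption: fixed-size workload, so T(N) = T(1) / C(N).
\begin{align}
C(N) &= \frac{N}{1 + \alpha (N-1) + \beta N (N-1)} \\
T(N) &= \frac{T(1)}{C(N)}
      = T(1)\left[\frac{1-\alpha}{N} + (\alpha - \beta) + \beta N\right] \\
\frac{dT}{dN} &= T(1)\left[-\frac{1-\alpha}{N^{2}} + \beta\right] = 0
\quad\Longrightarrow\quad
N^{*} = \sqrt{\frac{1-\alpha}{\beta}}
\end{align}
```

Since the second derivative, 2 T(1)(1 − α)/N³, is positive for α < 1, N* is a minimum of T(N): the bottom of the bowl. Because C(N) is proportional to 1/T(N), the same N* is where the throughput peaks, and the rising side of the bowl for N > N* is the retrograde throughput region.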