Comments on The Pith of Performance: "My Year in Review 2011" by Neil Gunther (9 comments)

Baron (2012-02-05 17:53):
Sounds like we agree. No little green men in sight!

Neil Gunther (2012-02-04 09:35):
Baron, it seems to me you want it both ways: "I see this kind of thing all the time in the real world," but you don't provide data because "I'm just really busy."

With data, we don't need thought experiments. Your example, btw, is pretty much the <a href="http://en.wikipedia.org/wiki/Speedup" rel="nofollow">canonical example</a> of how superlinear effects arise.

Baron (2012-01-16 18:56):
Neil,
I'm not trying to avoid accountability; I'm just really busy. However, it seems pretty clear to me that the USL isn't adequate to model nonlinear cache-dependent behavior. Here is a simple thought experiment. You have a clustered system with a read-only workload, where each node contains 10GB of RAM. The total dataset is 50GB and is uniformly accessed (for simplicity; in another paragraph or two I could extend this to a realistically accessed dataset, but it just gets confusing in an unhelpful way that doesn't change anything material). The data is stored on some kind of hard drives; it doesn't really matter what kind. The only thing that matters is that the drives are much slower than RAM.

Now, after the system warms up, a one-node cluster will have a 20% cache hit rate. Four fifths of the accesses will go to disk.

Let's add another node and let the cluster warm up. A smartly designed clustering system explicitly creates locality of cache reference: operations on half the dataset go to one node, and the rest go to the other node. The data may even be partitioned instead of duplicated; it doesn't matter. Each node is now effectively dealing with half the data, so the cache hit rate has improved nonlinearly, and you can easily calculate what happens to system throughput as a result.

Add a few more nodes, and the workload becomes entirely in-memory and I/O is nonexistent. Suddenly each node is performing a few orders of magnitude faster than a single node did. We multiplied the node count by 5 and got a 1000-fold improvement.

I see this kind of thing all the time in the real world.

There is no violation of the laws of physics, no UFO, no green man. The USL just isn't the complete story here.

To define scalability on such a system, we have to hold a lot of things constant.
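The arithmetic in this thought experiment is easy to sketch. The access times below are hypothetical placeholders (the argument only requires that disk is much slower than RAM), and the uniform-partitioning assumption follows the setup above:

```python
# Sketch of the cache thought experiment (all numbers hypothetical):
# per-node cache = 10 GB, dataset = 50 GB, uniformly accessed.
RAM_GB_PER_NODE = 10
DATASET_GB = 50
T_RAM = 100e-9   # assumed RAM access time, seconds
T_DISK = 5e-3    # assumed disk access time, seconds

def hit_rate(nodes):
    """Warmed-up cache hit rate, assuming the cluster partitions
    (or routes) the dataset evenly across nodes."""
    return min(1.0, nodes * RAM_GB_PER_NODE / DATASET_GB)

def throughput(nodes):
    """Aggregate accesses/sec: each node's mean access time is the
    hit/miss weighted average of RAM and disk times."""
    h = hit_rate(nodes)
    t_mean = h * T_RAM + (1.0 - h) * T_DISK
    return nodes / t_mean

base = throughput(1)
for n in range(1, 6):
    # speedup exceeds n for n >= 2: superlinear scaling
    print(n, hit_rate(n), round(throughput(n) / base, 1))
```

For comparison, the USL capacity function C(N) = N / (1 + σ(N−1) + κN(N−1)) has a denominator ≥ 1 whenever σ, κ ≥ 0, so C(N) ≤ N; a curve like the one above, where speedup exceeds N, lies outside that form unless the coefficients go negative, which is exactly the point under dispute in this thread.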
The USL says we hold the workload per node constant, add nodes, and see what happens. But "workload" is much more than just the number of client programs. In addition to that, we need to hold the data per node constant, so we need to scale the data size. But we also need to look at query complexity, because the interrelationships between bits of data commonly grow nonlinearly too; so in the end it becomes very complex to understand what constitutes an equal workload per node as the cluster grows.

If the USL is capable of modeling this, I would like to understand how, but I frankly don't think it is. The USL is a simpler model, just as Newton's laws are a pretty close approximation as long as you don't move very fast.

I don't think you need real datasets and benchmark results to agree that this is possible and doesn't represent bad data, bad measurements, etc. I'm frankly quite surprised you haven't run into this type of thing yourself.

I'm not arguing that a single node can magically do more work than it can really do, and scale superlinearly, or that we're getting something for nothing; I'm saying that our definition of linear is not adequate.

Neil Gunther (2012-01-16 14:16):
Baron: It all starts and ends with data, but treating data as divine is a sin.

Don't get me wrong, however. I'm not saying that the so-called "superlinear" effect is not observed, or that you or other people are making it up. People see all sorts of things in data.
People also see UFOs and often have the video to "prove" it.

On the other hand, immediately jumping to the conclusion that aberrant data trumps otherwise successful models is like believing UFOs not only exist but are flown by little green men.

Presenting the data (especially in public commentary) is vital to making your case, either way. Otherwise, it's like saying you've seen a UFO without even having a video on YouTube; a mere hit-and-run tactic.

There is a nice role model for this situation in particle physics at the moment. You've probably seen the incessantly repeated reports that Einstein is toast because <a href="http://www.symmetrymagazine.org/breaking/2011/09/23/opera-experiment-sees-neutrinos-seem-to-beat-speed-of-light/" rel="nofollow">superluminal neutrinos</a> have apparently been observed (twice) in the CERN/OPERA detector.

Although the press and bloggers were all over this (and CERN probably didn't mind the publicity), the physics team involved, to their credit, presented their data both orally and in a peer-reviewed journal. And their data can't be dismissed easily because it's accurate to 6 sigma (i.e., about a 1-in-a-billion chance that it's a fluke). Basically, they are saying: "Help! This is what our data reveals when we interpret it, but we can't explain why it contradicts more broadly tested and known physics." They did not push the idea that SRT is dead; that was bloggers.

The jury is still out on OPERA (I believe), but I'm putting my money on Albert. I'm betting there's a bug in the measurement apparatus or in the way the data analysis is being executed. Because of the immense complexity of their measurements, however, it will probably take a long time to sort out.

Thankfully, superlinearity is a little bit simpler than superluminality.
:)

Neil Gunther (2012-01-16 13:26):
Dear Unknown: Nice question, and curiously enough it dovetails with item 8 in my list, although I'm not about to go into that here.

Nevertheless, without going into a lot of detail, your question can be approached from a queue-theoretic point of view in the following way.

Think of two extremes in time scales. (1) If DB locking is very "short," you don't care (unless it gets longer in the future) and you wouldn't be asking your question. (2) If it's very "long," more than likely something is seriously out of whack or possibly even pathologically broken. In that case there's no point considering performance until the problem is renormalized. A good DBA is worth their weight in gold to help solve problems like that.

For more intermediate cases, the locking can be accommodated as part of the service time. Moreover, you could make the service time load-dependent so as to observe where it becomes a scalability issue that needs to be addressed. See Chap. 12 of my <a href="http://www.perfdynamics.com/iBook/ppa_new.html" rel="nofollow">Perl::PDQ book</a> for more on that.

Keep in mind that queueing models see system performance in a <b>time-averaged</b> way, rather than the <b>instantaneous</b> view you get when you're staring at perf stats collected by whatever O/S or DBMS tools.
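Folding lock time into the service time of a simple queue can be sketched in a few lines. This is a minimal M/M/1 illustration of the idea, not a PDQ result; the base service time, per-request lock cost, and the linear load dependence are all assumptions made up for the example:

```python
# Fold DB lock-wait into service time and make it load-dependent,
# then watch the mean (time-averaged) residence time grow with load.

def mm1_residence_time(arrival_rate, service_time):
    """Mean residence time R = S / (1 - rho) for an M/M/1 queue."""
    rho = arrival_rate * service_time  # utilization
    if rho >= 1.0:
        raise ValueError("unstable: utilization >= 100%")
    return service_time / (1.0 - rho)

def service_time_with_locking(base_s, lock_s_per_req, load):
    """Load-dependent service time: lock-wait assumed to grow
    linearly with offered load (a deliberately simple assumption)."""
    return base_s + lock_s_per_req * load

for lam in (10, 50, 90):  # offered load, requests/sec (hypothetical)
    s = service_time_with_locking(0.005, 0.00002, lam)
    print(lam, round(mm1_residence_time(lam, s) * 1000, 2), "ms")
```

Plotting where the residence time turns sharply upward is one way to see the point at which locking stops being noise and becomes the scalability issue Neil alludes to.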
Think of that time-averaged view as a kind of "flow," and if you're not flowing, you need to explain why not.

Henry Steinhauer (2012-01-16 12:20):
Neil, I would agree very strongly with your statements about making the model simple rather than including the kitchen sink. Part of my problem has been how to include the software locking processes that appear to be involved in the DB system, and how to know when to add that level of detail. That is often needed to explain why the real world does not scale like the model. There are other artificial limiting factors that are not known at first.

In reading over your blog, it would appear you have been all over the map. Most of all, it sounds like you are still having fun with your work.

Baron (2012-01-05 05:33):
I'm honored to have become a link in your list. I can provide supporting data if you like, but I doubt you need it.
I look forward to learning more about how the USL can explain this effect, because I haven't thought of any way to model it without increasing the complexity of the USL by adding terms, and its simplicity is what makes it so powerful in the "real world" for me.

Neil Gunther (2012-01-01 19:32):
In the brevity of the statement (and not wanting to give too much away *grin*), I didn't intend to convey that there would be nothing left for the diligent modeler, such as yourself, to do.

Rather, let's say the "automagic" bit would get you to a first-order model, which you could then tweak to your heart's content. The biggest barrier to entry for most people is knowing where to start at all. And usually, they start (too) big.

Without naming names, I just went through a group exercise in a recent training class where the initial PDQ model was declared (by those in the know, not me) to contain everything including the proverbial kitchen utensils. This went on for some time and nothing seemed to jibe. Then I took over and showed how, from a performance perspective, the initial model inevitably got winnowed down to about 3 queues, whereupon it was immediately observed to align with the available performance data. My objective, on the other hand, was not necessarily to simplify per se, but to reach a point where the various queueing metrics became consistent.

Instead of building down, we could build up.

icrushservers (2012-01-01 19:06):
#8 would be most handy indeed!
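A model winnowed down to about three queues, as described above, is small enough to write by hand. This is a generic sketch of an open network of M/M/1 queues under a product-form assumption; the tier names and service demands are hypothetical placeholders, not the figures from the training class:

```python
# Minimal three-queue open network: total response time is the sum
# of per-queue residence times R_i = D_i / (1 - lambda * D_i).
SERVICE_DEMANDS = {"web": 0.002, "app": 0.004, "db": 0.007}  # sec/request

def system_response_time(arrival_rate):
    """Mean system response time (seconds) at a given arrival rate,
    assuming independent M/M/1 queues visited once per request."""
    total = 0.0
    for name, demand in SERVICE_DEMANDS.items():
        rho = arrival_rate * demand
        if rho >= 1.0:
            raise ValueError(f"{name} queue saturates at this load")
        total += demand / (1.0 - rho)
    return total

# The largest demand ("db" here) is the bottleneck; it caps
# throughput at 1/0.007, roughly 143 requests/sec.
for lam in (50, 100, 140):
    print(lam, round(system_response_time(lam) * 1000, 2), "ms")
```

Checking that metrics like these stay mutually consistent with the measured utilizations and throughputs is the kind of sanity check the comment above is driving at.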
But I have to say that I've learned more from building the PDQ models by hand than I ever would have if it had been an automagical system.

PyDQ has been an important part of my 2011 and will be so in 2012 as well.