The way my mind processes new technical information might best be described as applying rigorous intuition. Most people are comfortable if something simply makes intuitive sense to them: "Yeah, I get it!" That's necessary, but not sufficient for me; it also needs to fit with other things I know, in a rigorous way. Not only do I have to get it, I have to be able to put it! This processing can also make me appear very slow to people who don't know me well. They think I'm not getting it, but actually I'm probably having more trouble putting it. I must be wired up this way, because even my school teachers told my parents that I was slow and "not academic material." Today, with my background in theoretical physics, that rigor is usually encapsulated in mathematics because, for me, it's much easier to recall and understand math than a pile of words; that's why we create mathematical symbols. So, when someone tells me something important, it has to fit like a "brick" in my "wall" of performance knowledge. If it doesn't, I become confused and anxious, and I have to get to the bottom of it. Either the brick is faulty or my wall needs adjustment. I need to find out which it is.
This is what happened to me at slide 3 in Alistair's presentation. First, I want to say that I was impressed that he actually tried to formalize his point in a simple equation; most people don't even do that. Here's his formula expressed in slightly modified words:
Delay = Request rate / number of machines.   (1)
I hadn't seen this formula before, so it looked like a candidate for another brick for my wall. It also seemed to make intuitive sense: More requests, longer delays. More machines to service those requests, shorter delays. But at the same time I had a problem: it didn't fit into my existing wall. I could see immediately that this formula violates an iron law of performance:
Delay is inversely proportional to throughput.
Example: If a freeway carries more cars per hour (throughput), your commute time (delay) should be shorter, not longer. Freeways are funded and built on this principle.
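To put made-up numbers on that (mine, not Alistair's): suppose a fixed stretch of freeway holds 100 cars at steady state. At a flow of 1,000 cars per hour, each car spends 100/1,000 = 0.1 hours on that stretch; raise the flow to 2,000 cars per hour at the same density and the time drops to 0.05 hours. Same number of cars, double the throughput, half the delay.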
The 'Request rate' in Alistair's equation is tantamount to a throughput (what goes in must come out), so eqn. (1) can be restated as:
Delay = Throughput / number of machines.   (2)
Expressed this way, it says that latency is proportional to throughput, which not only conflicts with the iron law but now also seems counterintuitive. That's the trouble with relying solely on intuition.
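To see the clash starkly, write X for the throughput (my notation) and hold the number of machines m fixed. Then eqn. (2) claims Delay ∝ X, while the iron law demands Delay ∝ 1/X. The two point in opposite directions.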
That's where my mind was after Alistair's Velocity presentation, and quite honestly, I was so fixated on his formula that I probably missed most of his message. Anyway, I went up to him afterwards and asked about his equation. We didn't get anywhere in that first discussion, but later we bumped into each other again and he told me he was keen to get the equation correct. This was a good sign. Eventually, we exchanged emails and he sent me the relevant portion of his Velocity slides. That led me to write the following July 9th entry in my notebook.
But now the problem became: how do I convey these scribbles to someone who is not a professional performance analyst? Ultimately, I created my own slides. I was planning on just 3 slides to match my notebook entry, but because of all the subtleties involved, it turned into a 12-slide PDF tome! Then I became concerned that Alistair would drop the whole thing like a rock because it was too hard. To his credit, he didn't. As you can see in my slides, I end up with a new formula:
Delay = Number of requests / (Throughput × number of machines),   (3)
which is very close in spirit to Alistair's original formula, but with the important distinction that it now obeys the iron law of performance.
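A quick consistency check (again writing X for throughput): at a fixed number of requests Q and a fixed machine count m, eqn. (3) says Delay = Q / (X × m), so doubling the throughput halves the delay. Delay ∝ 1/X, exactly what the iron law demands.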
Of course, I prefer my bricks to be mathematical, and the following multiserver (M/M/m) queueing diagram helps me to get there.
Here, 1, 2, ..., m machines are available to service requests arriving into the system from the left. The aggregate arrival rate is mλ. The incoming stream of requests is split equally across the m machines, and therefore each machine produces λ throughput. The total number of requests in the system is Q. I use this symbol deliberately to remind myself that it is quite literally the number of requests enqueued in the system. Here, as in my books, the queue length means the sum of requests in service at the machines, as well as those that are waiting for service. The waiting line is indicated by the blocks in the diagram. The delay in eqn. (3), or more technically the response time (R), can now be written very simply as:
R = Q / (mλ).   (4)
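(To spell out the step behind eqn. (4), under the steady-state assumption: the number of requests in the system equals the system throughput, mλ, multiplied by the average time each request spends there, i.e., Q = mλ × R. Dividing both sides by mλ gives eqn. (4).)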
Returning to the freeway description, it says: more cars (e.g., bumper-to-bumper traffic), longer average delay; higher speed limit (throughput per lane), shorter delay; more lanes (machines), shorter delay. Moreover, this formula is capable of generating the convex latency curves that Alistair shows in his plots. In fact, it's a variant of Little's law. Finally, I do have another brick that can be neatly cemented into my wall, and Alistair's too. For me, it's one of the great synergistic outcomes of attending Velocity 2009.
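Queueing formulas are easier to trust when you can run them, so here is a minimal Python sketch (mine, not from Alistair's slides; the parameter values are made up) that computes the mean queue length Q for an M/M/m queue using the standard Erlang C formula, then recovers the response time R from eqn. (4):

    # Minimal M/M/m sketch: compute the mean queue length Q with the
    # standard Erlang C formula, then get R from eqn. (4): R = Q/(m*lam).
    # Parameter values below are illustrative only.
    import math

    def erlang_c(m, rho):
        # Probability that an arriving request has to wait, for m machines
        # each running at utilization rho = lam/mu (requires rho < 1).
        a = m * rho  # total offered load
        num = a**m / (math.factorial(m) * (1.0 - rho))
        den = sum(a**k / math.factorial(k) for k in range(m)) + num
        return num / den

    def response_time(lam, mu, m):
        # lam: arrival rate per machine, mu: service rate per machine.
        rho = lam / mu
        assert rho < 1.0, "unstable at or above 100% utilization"
        c = erlang_c(m, rho)
        q = m * rho + c * rho / (1.0 - rho)  # requests in service + waiting = Q
        return q / (m * lam)                 # eqn. (4)

    mu, m = 10.0, 4  # each machine services 10 requests/sec; 4 machines
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        r = response_time(rho * mu, mu, m)
        print(f"utilization {rho:4.2f} -> R = {r:.4f} sec")

Driving the per-machine utilization from 0.5 toward 0.99 takes R from roughly 0.1 seconds to a couple of seconds in this example, which is exactly the hockey-stick shape of Alistair's latency plots.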
I suspect this isn't the last word, either. Eqn. (4) still says nothing about chargeback, but I think Alistair is trying to say something important, and this version of a cloud capacity formula may just be the first step along the way to understanding the full implications for capacity planning and chargeback in cloud computing.
Very nice. The neat thing about this formula is that it has nothing to do with the cloud. If you have a load balancer in front of a bunch of web/app servers, it works the same. The only caveat I see is that I doubt one can ever achieve equal distribution of the requests across the m servers.