## Friday, July 18, 2008

### Mr. Amdahl Calls the Repairman

Back in January, I reported that my Universal Scalability Law (USL) had been proven to be equivalent to the synchronous throughput of the load-dependent machine repairman queueing model. Although I wasn't lying, what I really had was what we call a "sketch" for a proof. The details needed to be filled in, but that was just a question of finding the time to do it. In May, I found time to write the first draft and thought I would be finished shortly thereafter. That attempt ground to a halt for two reasons (you might want to refer to Appendix A in my GCaP book for the background):

1. Simulations supporting the theorem involved a barrier synchronizer and, for this to work continuously, my colleague Jim Holtman discovered that the distribution of parallel execution times (with mean 'think' time Z) could only be deterministic (D). That corresponds to a special case of the repairman, D/M/1//N, whereas the equations in my proof show quite clearly that it must hold for M/M/1//N.
2. There's a theorem in queueing theory (Bunday and Scraton, 1980) which states that what holds for the repairman (M/M/1//N) also holds for a more general source of requests, viz., G/M/1//N. Since the class G includes both D and M as special cases, this theorem suggests that our simulations probably shouldn't be restricted to D-distributed think/execution times, as observed in (1). Or should they?

These two stumbling blocks also apply to the simpler case of Amdahl's law (the non-load-dependent repairman in Appendix A) and can be summarized in the following paradox:
The proofs for both USL and Amdahl use standard MVA equations. That means everything is expressed in steady state, and our interest is in an analytic solution for the synchronous bound on throughput. But synchronization and steady state are not compatible concepts in queueing theory, because the former is an instantaneous effect, whereas most queue-theoretic analytic solutions only exist in the long-run or steady-state condition.
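For concreteness, here is a minimal sketch of the standard exact MVA recursion for the machine-repairman model M/M/1//N, of the kind used in those proofs. The function name and parameterization are mine, purely for illustration; see Appendix A of GCaP for the actual derivation.

```python
def mva_repairman(N, S, Z):
    """Exact MVA for the closed M/M/1//N repairman queue.

    N: number of machines (requests), assumed N >= 1
    S: mean service (repair) time at the single server
    Z: mean think (parallel execution) time
    Returns the steady-state throughput X(N).
    """
    Q = 0.0  # mean queue length seen with n-1 requests in the system
    for n in range(1, N + 1):
        R = S * (1.0 + Q)   # residence time at the repairman
        X = n / (R + Z)     # throughput via Little's law on the whole loop
        Q = X * R           # queue length via Little's law at the server
    return X
```

For N = 1 this reduces to X = 1/(S + Z), and as N grows X saturates at the server capacity 1/S, which is the usual way the steady-state bound on throughput emerges.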

In plain English, think of it this way:
I have a closed queue with one server (i.e., M/M/1//N). Suppose all N requests have the same deterministic Z (think) time. At the end of the first Z period, all N requests will enqueue simultaneously, i.e., all machines break down at the same time, in the lingo of the repairman model. But since, by definition, they are serviced serially, they will return to the parallel execution phase (think nodes) separately and thereafter will always return to the repairman at different times. In other words, even if I start with synchronized visits to the repairman, that synchronization is immediately lost after the first tour because it is destroyed by the queueing process.
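You can see this destruction of synchronization in a few lines of code. This is my own toy illustration (not Jim's simulator): all N requests arrive together at time Z, but because the single server repairs them one at a time with random (exponential) service times, their second-tour arrival epochs are all distinct.

```python
import random

def second_tour_arrivals(N=4, Z=1.0, mean_S=0.5, seed=1):
    """Toy illustration: deterministic think time Z, serial exponential service.

    All N requests hit the repairman together at time Z (first tour),
    but are repaired one at a time; each then thinks for another Z.
    Returns the times at which they arrive back for the second tour.
    """
    random.seed(seed)
    clock = Z  # everyone enqueues simultaneously at the end of the first Z
    next_arrivals = []
    for _ in range(N):
        clock += random.expovariate(1.0 / mean_S)  # serial repair
        next_arrivals.append(clock + Z)            # back to think, returns later
    return next_arrivals
```

The returned times are strictly increasing, so even a perfectly synchronized start is desynchronized after one tour, exactly as described above.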

In fact, this paradox has bugged me ever since I came up with the original proof for Amdahl's law in 2002. But the MVA equations are correct, so I continued to ignore it. Last week, Jim came up with a new way of implementing the synchronization in his simulations, viz., time slicing.

The previous barrier synchronizer acts like a separate waiting room for each serviced request or repaired machine. Only when all N machines have been repaired (serially) and collected in the waiting room does the barrier drop and permit them to go back into parallel execution mode. In order for them to continue to require service from the repairman simultaneously, the Z time must be D-distributed, consistent with (1) above. What Jim discovered with the time-sliced synchronizer was this.
Suppose we allow the Z time to be G-distributed. But instead of requiring all N requests to enqueue simultaneously, we allow any number (less than N) to "break down," with the new condition that when any request invokes service at the repairman, all other, non-broken machines, or still-executing processors (in the Amdahl interpretation), must stop execution as well. Jim suggests thinking of this form of synchronization as representing a global lock on execution. The upshot is that Z can now be G-distributed, which is consistent with the repairman theorem in (2).
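A global-lock synchronizer of this kind is easy to sketch in code. Again, this is my own illustrative version, not Jim's actual simulator, and it uses exponential (M) think times as a stand-in for the general G case: whichever machine has the least residual think time breaks down first, and while it is being repaired every other machine's residual think time is frozen.

```python
import random

def global_lock_throughput(N=8, mean_Z=1.0, mean_S=0.1, tours=10000, seed=42):
    """Toy time-sliced (global-lock) repairman: exponential Z and S.

    All N machines execute in parallel until one breaks down; during its
    (serial) repair, every other machine halts, i.e., residual think times
    do not advance. Returns the measured throughput (repairs per unit time).
    """
    random.seed(seed)
    remaining = [random.expovariate(1.0 / mean_Z) for _ in range(N)]
    clock = 0.0
    completions = 0
    for _ in range(tours):
        i = min(range(N), key=lambda k: remaining[k])  # first breakdown
        dt = remaining[i]
        clock += dt
        for k in range(N):  # all machines advanced in parallel until breakdown
            remaining[k] -= dt
        clock += random.expovariate(1.0 / mean_S)      # repair under global lock
        remaining[i] = random.expovariate(1.0 / mean_Z)  # fresh think period
        completions += 1
    return completions / clock
```

Note that the "global lock" shows up simply as the fact that the clock advances during repair while no residual think time does.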

Although this may seem to be the same as case (1), it is subtly different. The synchronization now occurs with much higher frequency and for shorter times, on average, so the effect of the G tails in the Z distribution is lost. While Jim was doing this in his simulations, I got a new idea: I can think of the synchronization as a 2-state Markov process (parallel and non-parallel) for which I can write down steady-state solutions, and this resolves the paradox described above.
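To indicate what such a steady-state solution looks like, here are the balance equations for a generic 2-state Markov process. The rate symbols are my own labels for illustration, not necessarily those in the draft proof: say the system leaves the parallel state P at rate \(\lambda\) (a breakdown acquires the global lock) and leaves the locked state L at rate \(\mu\) (a repair completes).

```latex
% Two-state Markov process: P = parallel execution, L = global lock (serial repair)
% \lambda = rate of entering L from P, \mu = rate of returning to P from L
\pi_P \,\lambda = \pi_L \,\mu, \qquad \pi_P + \pi_L = 1
\quad\Longrightarrow\quad
\pi_P = \frac{\mu}{\lambda + \mu}, \qquad \pi_L = \frac{\lambda}{\lambda + \mu}
```

The point is that these probabilities exist in the long run, which is how an instantaneous-looking synchronization effect can nonetheless be captured in a steady-state framework.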

Jim and I will talk about these new results in the upcoming GDAT class. I will also post the proofs to the arXiv preprint server, once I clean up the draft.