Think about observing a real checkout for just 30 seconds. In that time, the current customer might well complete service (i.e., finish having their groceries rung up) and depart for the car park. But, during that same period, 3 new customers might have arrived at the checkout and joined the other waiting customers. That's not good, because the number of arrivals is greater than the number of departures or the number coming in does not equal the number going out. Therefore, it looks like a transient condition and transient systems are more difficult to solve than those in steady state. Moreover, reliable statistics only make sense in steady state. Performance engineers are faced with the same problem of detecting steady state throughput when load testing an application. One way to address the problem is simply to look at the system for a longer time. If the shorter term observations are close to the longer term observations, then the system must be close to steady state. At the grocery store, you might need to observe the checkout of 30 minutes, instead of 30 seconds, to see the same number customers departing as arrived during the observation period. In PDQ, however, you don't have to observe steady state because the solutions can only be produced by assuming the system is already in steady state.
One side-effect of solving queueing systems in steady state is that every customer has to visit every service facility, whether or not they actually do so in the real world. This follows from the fact that in steady state, we don't know where each customer is at any instant because we can't resolve instants when everything is averaged out over the long run. In other words, in a PDQ model of a grocery store, every customer visits every checkout aisle! That's very different from the way things are set up in an event-based simulator or LoadRunner. In PDQ, and similar modeling tools, the solutions are derived by assuming that every customer is sequenced through every queue. Let me clarify: there is no parallelism in PDQ. You may well ask, "If there's no parallelism in PDQ, how can I make a performance model of things like a disk array or Hadoop?" That's a very good question and one that had better have an answer because disk arrays and Hadoop utilize parallelism by their very definition. The answer is, there's a trick to doing it in PDQ.
The trick is to recognize that 2 parallel disks, for example, servicing IO requests (incoming at rate λ) that are split 50/50 between the disks (λ/2), is identical to servicing all the IO requests (λ) using 2 double-speed disks (S/2) in sequence. Diagrammatically, it looks like this:
where S is the average disk service time (including seek, xfer, etc. times) and S/2 means the disk services IOs in half the time or runs twice as fast. This trick (it's really a theorem) generalizes to an m-way JBOD or array.
Let's take a simple example of a 7200 RPM SATA disk servicing IOs arriving at an average rate of λ = 50 IOPS. According to this blog, the average service time for a single random IO is S = 14.1 ms. We can usefully compare 3 cases:
- Singe disk with buffering, using the simple M/M/1 formula:
R1 = S
1 − λ S
gives an average IO latency of 0.047619 s or 47.6 ms. The waiting time in the buffer accounts for the 3× increase in latency relative to the single IO case.
- Twin parallel disks using the simple M/M/1 formula (left side of the above diagram):
R|| = S
1 − (λ/2) S
The 50/50 split in the arriving IO requests shows up in the denominator of eqn. (2), which predicts an average latency of 0.0217391 s or 21.7 ms. That is closer to the best case time of 14.1 ms for a single random IO, and demonstrates that parallelism is a good thing.
- Sequential double-speed disks (right side of the above diagram):
R2 = S/2
1 − λ (S/2)
1 − λ (S/2)
This time there is no split in the arrival stream but the factor 2 now shows up in both the numerator and the denominator because it is associated with S. Eqn. (3) produces a response time identical to that obtained from eqn. (2), thereby corroborating the theorem.
It's eqn. (3) that tells us how to implement parallel disks in Perl PDQ, for example. Create two PDQ nodes, each with half the service time (twice the speed) of a single disk:
$m = pdq::CreateNode("disk1", $pdq::CEN, $pdq::FCFS);
pdq::SetDemand("disk1", "IOrequests", 0.0141/2);
$m = pdq::CreateNode("disk2", $pdq::CEN, $pdq::FCFS);
pdq::SetDemand("disk2", "IOrequests", 0.0141/2);
If there are m-disks in parallel, just write a loop.
What I like about this trick is that it provides an unusual insight into how parallelism works. It tells you that the reason parallel disks are a good thing, is because m-disks effectively run m-times faster. The PDQ model cluster.pl, in Chap. 7 of my Perl::PDQ book, is a generalization of this twin-disk example that utilizes up to 240 disks.
The above trick, also tells you how you can beat parallel disks. Instead of going through all m-sequential fast stages, just go through the first one. That turns out to be a version of Amdahl's law, which I'll discuss in my upcoming Guerrilla Capacity Planning class.