## Thursday, August 13, 2009

### A Funny Thing Happened on the Way to the Priority Queue

Suppose two workloads $W_a$ and $W_b$ access a common resource, e.g., the CPU. They each have response times $R_a$ and $R_b$, respectively. The response time $R_a$ is longer than you would like. A common way to try and improve $R_a$ is to give $W_a$ a higher priority at the CPU. That's what the Unix nice command is all about. For example, if $W_a$ were a Unix process then: nice -15 Wa, would give a higher priority than the default already assigned by the Unix scheduler. But how much better will it's response time be? Nice can't tell you that.

In simple situations, like your own Linux laptop or PC, you can experiment with different nice values and observe the effects. In more complex situations, like big servers running many workloads or carefully monitored production systems, it is preferable to estimate the performance impact of adjusting priorities on workloads or processes, rather than just jumping to conclusions. That's where PDQ can be useful.

### O/S Scheduler Example

In Section 3.9.3 of my Perl PDQ book, there is an example PDQ model that compares the improvement in response-time performance for a production (interactive) workload when it is given priority over a development (batch-like) workload. The results are summarized in this table:

 Workload Default Time Prioritized Time Prod 2.69 0.62 Devs 8.19 8.66

where the times shown are in cpu-seconds. The important conclusion is that "Prod" response time is improved by better than 75% with very little degradation to the response time of the "Devs" workload. Very unintuitive and yet another reason why performance models are important.

### How It's Done

Although it's explained in my book, let me very briefly summarize the steps here, because I want to compare the same procedure in two different scenarios.

An O/S scheduling model is represented by a closed queueing circuit in PDQ, because there's a finite number of processes (N) being executed or a finite number of users sharing the system. The procedure for reflecting different priorities in PDQ is based on something called the shadow-server technique, which I first learned from the late Ken Sevcik. It goes like this:

1. Decide which of the workloads will get higher priority, e.g., $W_a$.
2. Determine the CPU utilization of $W_a$ alone on the CPU: $\rho^{cpu}_a$.
3. Take the lower priority work ($W_b$) with its service time $S_b$, and assign it to a virtual or shadow CPU (that doesn't actually exit). In PDQ, this literally means creating a separate PDQ node.
4. Next, we need to parameterize the service time for $W_b$ on the shadow CPU. Normally, it would just be $S_b$, but now we need to express the fact that $W_b$ has been given lower priority and its performance will suffer.
5. The service time $S_b$ will appear to be longer (inflated) because real CPU-cycles are being consumed by $W_a$ at the expense of $W_b$. We capture this effect with an inflation factor for $S_b$: \begin{equation} S^*_b = \dfrac{S_b}{1-\rho^{cpu}_a} \end{equation} If $W_a$ was not present at the CPU, i.e., $\rho^{cpu}_a= 0$, then we would have the uninflated case $S^*_b = S_b$.
6. Use this inflated service time $S^*_b$ as the argument in SetDemand() at the shadow-CPU queueing node in the PDQ model.

The inflation factor for $W_b$ in eqn.(1) comes from the utilization of the CPU by workload Wa (which now has higher priority). The source of the time-penalty for $W_b$ is clear, viz., CPU utilization by $W_a$. Running the PDQ model produces the results in the above Table. This same procedure can be extended hierarchically to more than two workloads. Simple, and to my way of thinking, very intuitive.

That's how I learnt it, that's how I've always done it, and that's how it appears in every other textbook that discusses the shadow-server technique. However, I recently had occasion to consider prioritizing workloads on a web apps-server. The only difference is that there is a potentially infinite source of requests coming in from the Internet so, in the PDQ model I simply replaced CreateClosed with the CreateOpen function. No problem. Well ... not quite.

Here are the results from my PDQ-R models. The left-hand plot shows the impact of changing priorities in the O/S scenario described earlier. Blue is "before," pink is "after" priority has been given to the Prod workload. Prod times are reduced dramatically without increasing the Devs response time by too much.  But now look at the situation for the web-apps server. The response time values are numerically different but that's not important. Giving priority to Prod (Wa) shows a win, as expected; about 50% decrease in response time, but the Devs workload ($W_b$) response time remains unchanged!

That's weird because the "cycles" come from the same CPU, so the lower priority workload must be the victim and that should be reflected in a longer response time (due to the inflation by $W_a$ work). You may be thinking that there is a change but we just can't see it with this set of parameters. That's what I also thought, initially, so I tried various other numerical values, but it made no difference.

The response time for $W_b$ on the shadow-CPU is the same as when $W_b$ is running concurrently with $W_a$ on the shared physical CPU.

WTF! It's as though the effect of transforming the low priority work from the shared physical-CPU to its own shadow-CPU, somehow got cancelled out. This unexpected and paradoxical result was so shocking, it caused me to drop everything and look into it more deeply. (Goggle up. Math ahead.)

I won't bore you with the details, but it turns out that this is a fundamental truth. The reason is that the response time for $W_b$ on the shadow-CPU is given by: \begin{equation} R^*_b = \dfrac{S^*_b}{1-\lambda_b S^*_b} \end{equation} where $S^*_b$ is the inflated service time defined in eqn.(1). If we just substitute that definition for $S^*_b$ into eqn.(2), it produces: \begin{equation} R^*_b = \dfrac{S_b}{1- \rho_a - \lambda_b S_b} \end{equation} and applying Little law (in the form, $\rho = \lambda S$) to eqn.(3), we get: \begin{equation} R^*_b = \dfrac{S_b}{1- (\rho_a + \rho_b)} \end{equation} Notice that there are no starred variables on the right-hand-side of eqn.(4) now. Moreover, the expression on the right is precisely the response time (or residence time) seen by workload $W_b$ when running on the physical server concurrently with workload $W_a$ prior to priority shifting. The denominator contains the total utilization due to both workloads. Therefore, eqn.(4) simplifies to: \begin{equation} R^*_b = R_b \end{equation} For the low priority work ($W_b$), the virtualized response time is the same as the shared response time. Wow!
The response time, $R_b$, is invariant under the shadow-server transformation.