Wednesday, June 13, 2007

Linux CFS: Completely Fair or Completely Fogged?

A while ago, I saw what looked like an interesting blog entry announcing a new task scheduler for Linux called CFS: Completely Fair Scheduler. I wanted to compare CFS with Fair Share scheduling (FSS); something that took off in the 1990's for UNIX operating systems and something I've looked into from a performance perspective, especially because FSS provides the fundamenal resource allocation mechanism for most VMM hypervisors.

However, I was truly alarmed to find that the blog item took the form of an Oprah Winfrey-style interview. OK, I could stomach that from an informal blog if a slightly more formal, high-level, engineering specification was cited. I don't care about the bits and bytes; I trust the programmer to try and get that correct. I want to understand what is the overall design goal, so I can compare and contrast it with other schedulers. This is "open source development", right? Where is the open design specification? I want to find out about things like:

  • What is the principle of operation?
  • Where is the definition of the CFS algorithm?
  • What is the definition of "fair"?
  • CFS is compared to the "staircase" scheduler. What's that?
  • How can I assess any performance tests if I don't know what is being tested?
  • There are a gazillion links on the topic of schedulers.
  • But they all seem to point back to like so many chanting monkeys!
  • One link even got confused between CPU vs. I/O scheduling (FQS). I can see why.
  • There is no mention of conventional FSS schedulers

Is this how Linux kernel dev is done in general? If this is open development, then I want to be able to read some kind of open design specification. It doesn't have to be terribly formal or bureaucratic in format (in fact, better than it not be), but I cannot understand a software engineer's intent by reading any amount of their code; no matter how good a code-slinger he or she is. Conversely, if I can read the general design specification, then I can think about designing tests to see if it meets those intended goals.

Isn't it slightly ironic that proprietary (closed) O/S development is usually carried out against a written design specification, whereas public-domain (open) O/S development is carried out against ... erm ... blog interviews!?

For those who are not familiar with the FSS concept, UNIX, and Linux, were originally designed using time-share schedulers (TSS), so as to maintain acceptible interactive response times among multiple users on the same physical computer system. The TSS principle of operation can be stated thusly:

The goal of TSS is to give every user the illusion that they are the only active user on the system.

Everything else is details. By comparison, the FSS principle of operation can be stated as:

The goal of FSS is to give every user (or group of users or applications) the illusion that they have their own set of computational resources whose performance is scaled according to the allocated resource entitlement.

The concept of a FSS was discussed in the 1984 issue of the Bell Labs (home of UNIX) Technical Journal. The first UNIX implementation was presented in a 1988 paper from Sydney University by Kay and Lauder. This work was later spun off into an Australian company called Softway, which licensed SHARE-II. This FSS code found its way into Solaris, Irix, and Unicos, to name a few.

As far as I can tell, there is no FSS implementation in Linux and none planned. This may not be helpful when it comes to getting Linux more firmly ensconced in large-scale, commercial, data centers where resource allocation is a very serious issue. And while Linux developers are busy chewing the fat over this and that scheduler mods, be aware that the future is already here. Mainframes have had FSS schedulers in operation for more than 20 years. What they have now is a capability called workload management (WLM) goal mode. That's where you (the sysadmin) tell the O/S (in this case z/OS) what you want it to achieve in the way of performance goals. For example, you can specify how fast you want it to complete batch work (something called "velocity") or what response time targets certain applications should meet. In other words, you provide these target numbers as inputs to WLM, and it reports back to you as outputs how well it is achieving those goals. Linux has an opportunity to take the lead in this area; especially under the auspices of IBM. Or is it just me that's in a fog?


Unknown said...

The spec is:

Of course no scheduler will ever please everyone. Schedulers are not psychic; they don't know you want max thruput for
job A and min latency for job B unless
you tell them.

Neil Gunther said...

Hmm ... I suppose it's better than nothing, but I remain underwhelmed. This reminds me of searching for extraterrestrial life. Even if it IS our there, it's damn difficult to find. And now that I have found it (or it found me, in this case), it looks like their IQ is LOWER than ours!

I don't see how the comment about "psychic" schedulers follows from anything I said.

John Allspaw said...

It appears that you're confusing the blog article with actual open OS development, and expecting a quick news item to have all of the details.

Your comment about the spec posted by Paul above sounds a little bit dramatic, if not inflammatory.

The beauty of open development is transparency. You've posted questions clearly aimed at the CFS developers here on your blog. Instead of asking the blogosphere, why not post those questions about CFS to the lkml, to the developer directly ? And your questions about the content of the blog article, why not post those to the article itself ?

I suspect you'd probably get better answers to both sets of questions. :)