Comments on The Pith of Performance: "Queues, Schedulers and the Multicore Wall" (Neil Gunther)

John Allspaw (2009-05-06):

"If a resource is more than 75% busy for sustained periods, take a closer look to see whether or not that level of utilization is acceptable in terms of response time or queue length."

I couldn't agree more, and I agree with your pointing out that ROTs can be notoriously misleading without the standard caveat of "your mileage may vary".

I might add, further, that response times aren't very valuable out of the context of the entire system. Most growing web architectures have multiple tiers, each with its own load characteristics, latency profiles, and failure modes. So you can't really get the entire picture of your specific response-time-to-resource-usage pain points without that inter-system context. It's coincidental that I wrote about that topic this morning:

http://www.kitchensoap.com/2009/05/06/mechanical-analogies-to-web-stuff-part-2/

P.S. I'm a fan of your book and blog. :)

michele (2012-01-14):

Very interesting post. I have a question: would it be correct to model a dual-core machine as a multiserver queueing system with 2 servers?

And what about a cluster of multi-core servers? E.g., should 10 dual-core nodes be modeled as (a) an M/M/20 system, or rather as (b) an M/M/10 queue whose service rate is twice that of option (a)?

Neil Gunther (2012-01-14):

michele: This is the kind of question everyone who tries to model any computer system asks; even me. :)

The first <b>unhelpful</b> thing to get out of your head is the word "correct." There is no <i>correct</i> model. <a href="http://www.perfdynamics.com/Manifesto/gcaprules.html#tth_sEc2.26" rel="nofollow">All models are wrong</a>, but some are wronger than others. It's all about finding the best approximation.

The real test is how well any choice of model matches the data, or whatever other constraints you are trying to meet. Within those boundary conditions, the simplest model that does the job is usually the best choice. So, depending on your data or other goals, M/M/2 might be perfectly fine for a dual-core model.

For the cluster, you need to think about where requests <b>wait</b> if they can't get serviced (all cores busy), assuming they're not just dropped on the floor. In other words: are there one or more waiting lines? M/M/20, for example, can only have a <i>single</i> waiting line, by definition. Now ask yourself: what does that single waiting line or buffer correspond to in the real cluster?

It might be the run-queue of the O/S, or a load balancer, or ... it might not be there at all, in which case it can't be M/M/20. The important point here is that the queueing models are already forcing you to understand more clearly how the cluster operates from a performance standpoint.

For a broader view and more examples along these lines, take a look at Chap. 7 of my <a href="http://www.perfdynamics.com/iBook/gcap.html" rel="nofollow">Guerrilla CaP book</a>. There, I model a dual-core HP ML530 server system not as M/M/4 but as M/M/4/16. Reason: the system only has a <b>finite number</b> of active threads (16) in the test rig, and that has more significant ramifications for performance than the number of servers.

All of this (and more) is discussed in great detail in my <a href="http://www.perfdynamics.com/Classes/schedule.html" rel="nofollow">Guerrilla training classes</a>.

michele (2012-01-14):

Hello Neil, thanks for your quick answer. OK, I understand perfectly what you mean, so I'll go into further detail.

There is actually no waiting (no queue), so if all servers are busy, further jobs are lost (i.e., Erlang-B). Now, suppose that each server has a maximum of 10 connections, e.g., we can serve at most 200 concurrent connections with our 10 dual-core servers. Would Erlang-B be the best choice (either M/M/10/10 or M/M/20/20)? A truncated Erlang-C (M/M/n/K/FIFO) seems inaccurate to me, as the servers work in processor sharing.

Neil Gunther (2012-01-22):

It's very hard to make any further progress when you haven't stated what performance question you are trying to address.

Quoting: "at most 200 concurrent connections" suggests to me a finite N=200, or M/M/m/N/N queue, which isn't Erlang-anything.

A lossy Erlang-B queue suggests you might have a batch system in mind (viz., as if additional batch requests would be dropped). However, in steady-state equilibrium you expect to see some waiting time, even in a batch system, because the observation period T >> S, the mean service period. To avoid any waiting time at all in steady state would require infinite servers (a delay node).
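michele's question about choosing between M/M/10/10 and M/M/20/20 can be explored numerically with the standard Erlang-B recursion, B(k) = a·B(k−1)/(k + a·B(k−1)) with B(0) = 1, where a = λ/μ is the offered load in Erlangs. This is a minimal sketch; the 12-Erlang offered load is an illustrative assumption, not a figure from the thread.

```python
def erlang_b(m, a):
    """Erlang-B blocking probability for m servers and offered load a (Erlangs),
    using the numerically stable recursion B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    return b

# Illustrative offered load (assumed): 12 Erlangs total across the cluster.
a = 12.0

# (a) 20 loss servers, each at the per-core service rate.
p20 = erlang_b(20, a)
# (b) 10 loss servers, each twice as fast: same total capacity, but
#     a = lambda/mu is halved because mu doubles.
p10 = erlang_b(10, a / 2)

print(f"M/M/20/20 blocking probability: {p20:.4f}")
print(f"M/M/10/10 blocking probability: {p10:.4f}")
```

Running this shows the usual pooling effect: at equal per-server utilization, the larger group of 20 servers blocks fewer arrivals than the group of 10, which is one concrete way the two model choices differ.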
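michele's original (a)-vs-(b) question — M/M/20 at rate μ versus M/M/10 at rate 2μ — can also be compared directly on mean response time, using the Erlang-C waiting probability and W = C(m, a)/(m·μ − λ). This is a sketch under assumed parameters (λ = 15 req/s, μ = 1 req/s per core, neither taken from the thread), and it ignores Neil's caveat that the single waiting line must actually exist somewhere in the real cluster.

```python
def erlang_c(m, a):
    """Probability an arrival must wait in M/M/m (Erlang-C), offered load a = lam/mu.
    Computed from Erlang-B via the standard identity C = B / (1 - rho*(1 - B))."""
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    rho = a / m
    return b / (1.0 - rho * (1.0 - b))

def mmm_response_time(m, lam, mu):
    """Mean response time R = W + 1/mu in an M/M/m queue (requires lam < m*mu)."""
    a = lam / mu
    w = erlang_c(m, a) / (m * mu - lam)  # mean waiting time in the single queue
    return w + 1.0 / mu                  # plus the mean service time

# Assumed traffic: 15 req/s against total capacity 20 req/s (rho = 0.75).
lam = 15.0
r_a = mmm_response_time(20, lam, 1.0)  # (a) 20 cores, each at rate mu = 1
r_b = mmm_response_time(10, lam, 2.0)  # (b) 10 servers, each at rate 2*mu

print(f"(a) M/M/20, mu=1: R = {r_a:.3f} s")
print(f"(b) M/M/10, mu=2: R = {r_b:.3f} s")
```

With these numbers option (b) comes out ahead: the halved service time outweighs its slightly higher probability of waiting, which is the classic "fewer, faster servers" result at equal total capacity.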
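Neil's M/M/4/16 example — a finite buffer capping the jobs in the system — can be solved for any M/M/m/K queue directly from its birth-death balance equations. This is a generic sketch, not the model from the book; the arrival and service rates below are illustrative assumptions.

```python
def mmmk_metrics(m, K, lam, mu):
    """Steady-state metrics of an M/M/m/K queue (m servers, at most K jobs
    in the system), solved directly from the birth-death probabilities."""
    # Unnormalized probabilities p_n relative to p_0 = 1.
    p = [1.0]
    for n in range(1, K + 1):
        rate = min(n, m) * mu          # total service rate with n jobs present
        p.append(p[-1] * lam / rate)
    z = sum(p)
    p = [x / z for x in p]             # normalize
    n_bar = sum(n * pn for n, pn in enumerate(p))  # mean jobs in system
    p_block = p[K]                     # an arrival finds the system full
    x = lam * (1.0 - p_block)          # effective (admitted) throughput
    r = n_bar / x                      # mean response time, by Little's law
    return p_block, x, r

# Illustrative rates (assumed): lam = 3.5 req/s, per-server mu = 1 req/s.
p_block, x, r = mmmk_metrics(4, 16, 3.5, 1.0)
print(f"M/M/4/16: P(block) = {p_block:.4f}, X = {x:.3f}/s, R = {r:.3f} s")
```

Note that setting K = m collapses this to the pure-loss Erlang-B case, and large K approximates plain M/M/m, so the same few lines cover the whole family being debated in the thread.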
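The 75%-busy rule of thumb quoted in John Allspaw's comment, and its "your mileage may vary" caveat, can be made concrete with the simplest possible case: in an M/M/1 queue the stretch factor R/S = 1/(1 − ρ) shows how response time inflates relative to service time as utilization climbs. A single-server sketch only; multiserver queues tolerate higher utilization, which is exactly why the ROT needs its caveat.

```python
# Stretch factor R/S = 1/(1 - rho) for M/M/1: mean response time
# as a multiple of the mean service time S.
for rho in (0.50, 0.75, 0.90, 0.95):
    stretch = 1.0 / (1.0 - rho)
    print(f"rho = {rho:.2f}: R = {stretch:4.1f} x S")
```

At 75% busy a request already takes 4x its service time on average, and by 90% it takes 10x: whether that is "acceptable" depends entirely on the response-time target, which is Neil's point about looking past the utilization number itself.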