Comments on The Pith of Performance: This is Your Measurements on Models

Hi again MariuszW, I actually started putting tog...

2010-04-29T13:29:31.732-07:00

Hi again MariuszW,

I actually started putting together a reply to your question on Fri, 16 Apr 2010, but ran into 2 problems:

1. I had to travel out of town that weekend/week, but I did manage to get a response from one of the authors of the BRLCAD benchmark that you asked about.

2. To answer your question(s) properly requires more space than these little Q&A boxes provide.

Therefore, I'm putting together a separate blog post, so please bear with me.

--njg

Hi, So question (or two) again :) I understand tha...

2010-04-15T02:40:28.670-07:00

Hi,
So question (or two) again :)
I understand that the system characteristics are in alpha and beta, and that the interpretation is the key. And that there is aggregation.

In GCaP (in Table 5.1) there are ray-tracing benchmark results. What workload (users, tasks) is described in that case? Numbers? Is it Xmax on a given processor in this table? So processor p1 is loaded to measure Imax, then p4 is loaded to measure Imax? Isn't the task or user number important for a given p-Imax?

I want to use the USL, the C(p) version, but in my case p is the number of web servers (Tomcat) hosting a web application (Java) running on three physical machines. I want to estimate scalability: how many new Tomcats (or physical machines) are required with a growing workload (new users using web services on the Tomcats). There are three Tomcats on each physical machine and three physical machines now, so 3x3=9 Tomcats.
I want to construct the C(p) chart by gathering benchmark data using JMeter as the load generator.
Is treating p as the number of web servers a good idea? Is X for a given p the Xmax for that p (so a different number of users in the load generator has to be used to find that Imax)?

Hi MW, I understand your confusion because the st...

2010-04-02T07:56:28.344-07:00

Hi MW,

I understand your confusion because the statements in my GCaP book are a bit more ambiguous than I intended. What I was trying to convey is this.

The power of the USL comes from its simplicity, e.g., you don't need any queueing models or simulations. Moreover, there is nothing in the formulation of the USL that restricts its utility to any specific systems, e.g., Linux, Intel, multi-cores or multi-tiers. It can model them all. It is truly universal.

On the other hand, everything involves trade-offs. One such trade-off in the USL is that the workload is treated as an aggregation of whatever is actually executing on the system. As you describe, there will be many components of hardware and software in the real system. Nonetheless, both the X(N) used as INPUT into the USL and the X(N) predicted as an OUTPUT of the USL are the aggregate throughput.

There can be no resolution into individual workload components, as one might be able to do in PDQ. It's in that sense that the USL views the system as homogeneous. There is no way to explicitly express any heterogeneity, even though it exists in the real system.

How can the USL be USefuL if workload heterogeneity is not captured? The answer lies in my use of the word "explicit." I said it is not represented explicitly; I didn't say it's not represented at all. So, you may well ask, where is it? It's encoded in the 2 coefficients: α and β. This encoding is one-way and that's what makes the USL universally applicable but also limited in what information it can reveal predictively.

Another way to think of this limitation is the following. Suppose you apply the USL and it reveals a larger-than-desirable value of the β coefficient, and therefore the throughput is predicted to roll off at some load value. You cannot then turn around and ask the USL model to tell you which of the multifarious workload components is responsible for that roll-off. However, as described in my 2008 CMG "Zones" paper, the USL can certainly direct your attention to the right tuning efforts to improve scalability. See http://arxiv.org/abs/0809.2541
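
To make the roll-off concrete, here is a minimal Python sketch, assuming the usual USL form C(N) = N / (1 + α(N-1) + βN(N-1)). The function names and the coefficient values are hypothetical, chosen only for illustration, and do not come from any measurement. Setting dC/dN = 0 gives the load at which the relative capacity peaks, N* = sqrt((1-α)/β).

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load N* where C(N) is maximal; there is no roll-off when beta is zero."""
    if beta <= 0:
        return math.inf
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients (assumption, not fitted to any real data).
alpha, beta = 0.05, 0.001
n_star = usl_peak_load(alpha, beta)

print("predicted peak load N* =", n_star)
for n in (1, 10, 20, 31, 45, 60):
    print(n, usl_capacity(n, alpha, beta))
```

With these made-up coefficients the peak falls near N = 31, and the capacity declines beyond it. The two numbers α and β are all the model carries, which is why it can say where the roll-off occurs but not which workload component causes it.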

Thanks for your question.

Hi, Can USL be applied in all situations? In your ...

2010-04-02T00:32:18.690-07:00

Hi,
Can the USL be applied in all situations? In your GCaP (6.7.4 Why It Works) you write about "homogeneity". I am currently writing JMeter tests to measure the scalability of a multitier application (load balancer + WebLogic servers + database), and homogeneity is still in my head :). The application has web services that will be loaded. Those web services handle different functionality. So, can I use the USL, or should I rather use PDQ (PPDQ) and a queueing model?

Thanks for your books and articles.

Regards,
MW

Andy sent me some algebra via email, and now that ...

2009-09-18T11:14:11.328-07:00

Andy sent me some algebra via email, and now that I understand his question better, I will abbreviate it here.

We know from my previous post of AUGUST 22, 2009 http://perfdynamics.blogspot.com/2009/08/bandwidth-and-latency-are-related-like.html that, as a general matter, response time R(N) and throughput X(N) are inversely related. The relative throughput C(N) is given by the USL model on the RHS of eqn.(1) above. By definition, the denominator of C(N) is a polynomial of degree 2 in N. Inverting C(N), to get the corresponding R, will produce a quadratic form in N. A quadratic function is a curve, a parabola, and not the linear "hockey stick" handle that we expect from standard queueing theory (for a closed system). Andy's question is, "What gives!?"

First off, I want to acknowledge that this is an excellent question which, as far as I can recall, no one has asked before. Curiously perhaps, although I do discuss this in my GCaP classes http://www.perfdynamics.com/Classes/schedule.html (which is why you should attend them), I haven't written it down anywhere, that I know of.

Andy is quite correct ... and so is the USL. Here's what it means. Let's assume for the moment that the β coefficient is zero (i.e., no coherency delays). Then, the corresponding response time would be linear, as expected. Above saturation, the throughput X(N) would plateau at a constant value and R(N) would climb linearly with the increasing user-load N.

When β > 0, however, the throughput becomes retrograde (not seen in standard queueing models). Since it takes longer to complete each request, the response time R(N) must increase faster than linear! The quadratic form of R accounts for this "super linear" behavior by bending the expected hockey stick upward.
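
The upward bend can be checked numerically. The sketch below is only an illustration under stated assumptions: zero think time, so that Little's law gives R(N) = N / X(N), and X(N) = X(1) C(N) with the usual USL form for C(N). Under those assumptions R(N) = R(1) [1 + α(N-1) + βN(N-1)], where R(1) = 1/X(1). The function names and coefficient values are hypothetical.

```python
def usl_response_time(n, alpha, beta, r1=1.0):
    """R(N) = R(1) * (1 + alpha*(N-1) + beta*N*(N-1)), assuming zero think time."""
    return r1 * (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def increments(alpha, beta, loads):
    """Growth in R when one more user is added, R(N+1) - R(N), at each load."""
    return [
        usl_response_time(n + 1, alpha, beta) - usl_response_time(n, alpha, beta)
        for n in loads
    ]


loads = [10, 20, 40, 80]

# beta = 0: every added user costs the same, so R(N) is a straight line.
linear = increments(0.05, 0.0, loads)

# beta > 0: each added user costs more than the last, so R(N) bends upward.
superlinear = increments(0.05, 0.001, loads)

print("beta = 0    :", linear)
print("beta = 0.001:", superlinear)
```

Algebraically the increment is R(1)(α + 2βN), which is constant when β = 0 and grows with N otherwise.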

So, all is well with the USL. :)

Very difficult for me to comment without a specifi...

2009-09-17T11:19:34.235-07:00

Very difficult for me to comment without a specific example. Can you send me some sample data?
http://www.perfdynamics.com/contact.html

Hi Neil, I've just finished your GCaP book; v...

2009-09-17T09:16:17.973-07:00

Hi Neil,

I've just finished your GCaP book; very thought-provoking. Thanks!

I've applied the procedure you lay out in chap 5 (using R) to some of my own application test measurements, and have a question.

My test measurements include response-time as well as throughput. The measured response-time vs N conforms to the expected "hockey stick". This makes me happy and gives me some confidence that my measurements are not completely wrong.

I perform the USL computations described in chap 5.

Using USL-modeled values to compute response-time (as in GCaP Sec 6.7.3), the resulting modeled-response-time vs N does not have a hockey-stick shape, and in some cases doesn't closely match the measured response-time values.

In general, the modeled-response-time vs N either looks like a straight line (if coherency coef = 0) or parabolic (if coherency coef > 0).

Can you shed some light on how/why measured response time curves are (or should be) like a hockey-stick, while the USL-modeled curves are not?

Could this be a problem with my calculations?

Thank you very much!