*not* to display performance data. Remember: collecting and analyzing your performance data is only half the battle. The other, equally difficult, half is presenting your performance data and conclusions.

### Example 1

This first example is an oldie but a baddie. It will also provide some context for the second example below.

See the problem?

Choosing a logarithmic scale causes the otherwise linear intervals on the axis to be transformed in a *nonlinear* way. That nonlinearity in the scale causes a distortion in the apparent "shape" of the data. It tends to **expand** the horizontal separation between data points near the origin and **contract** the horizontal separation between data points that are far away from the origin. In the case of Figure 1, logarithmically rescaling the x-axis results in curves that take on an *artificial* sigmoid or 'S' shape. The sigmoid distortion presents the wrong visual cue, which in turn can cause the reader (including yourself) to jump to the wrong conclusion.
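To see how a log-scaled x-axis can manufacture an 'S' shape, here is a minimal sketch in Python. The curve `saturating(x) = x / (1 + x)` is a hypothetical stand-in (it is not the data behind Figure 1). It is concave everywhere for x > 0, yet when it is sampled at points that are evenly spaced on a log10 axis, it bends upward at the left and downward at the right, i.e., it acquires an inflection that the data itself does not have.

```python
def saturating(x):
    """Hypothetical concave curve that levels off at 1 (no inflection for x > 0)."""
    return x / (1.0 + x)

def second_differences(ys):
    """Discrete curvature of equally spaced samples: negative = concave, positive = convex."""
    return [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]

# Samples that are evenly spaced on a linear x-axis: 0.5, 1.0, ..., 10.0
linear_x = [0.5 * k for k in range(1, 21)]
linear_curvature = second_differences([saturating(x) for x in linear_x])

# Samples that are evenly spaced on a log10 x-axis: x = 10**u for u = -2.0, -1.75, ..., 2.0
log_u = [-2.0 + 0.25 * k for k in range(17)]
log_curvature = second_differences([saturating(10 ** u) for u in log_u])

# On the linear axis the curve bends the same way everywhere
concave_on_linear_axis = all(d < 0 for d in linear_curvature)
# On the log axis it is convex at the left end and concave at the right end
s_shaped_on_log_axis = log_curvature[0] > 0 and log_curvature[-1] < 0

print("concave on linear axis:", concave_on_linear_axis)
print("S-shaped on log axis:  ", s_shaped_on_log_axis)
```

For this particular stand-in the effect is exact: substituting x = e^u turns x / (1 + x) into 1 / (1 + e^(-u)), which is the logistic function of u.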

That's not to say you should never use a log scale, but there are provisos to consider before doing so.

- Only use it when it serves to illuminate something hidden in the data, e.g., exponentially distributed data will look *linear* on a plot with a log-scaled x-axis. I used this to great effect for analyzing Oracle query performance.
- Never use it merely for the convenience of compressing data with a large x-range into the available width of a plot window. Use multiple views.
- Never use it without alerting the reader that the axis is not a conventional linear scale, e.g., label it *log(x)* instead of just *x*.
- Try to indicate the base of the logarithm, e.g., *log2(x)* or *log10(x)*. Don't worry about using illegible subscripts. The idea is to make the label as visible as possible, not as correct as possible. The reader will figure out the base and other details once their attention has been drawn to it.
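
As a rough illustration of the first, third, and fourth points, the sketch below uses made-up data in which x doubles each time y rises by a fixed step. The helper names `log_axis_label` and `slopes_on_log_axis` are hypothetical, introduced only for this example. When x is positioned at log2(x), every segment has the same slope, which is the straight line the eye would see, and the label announces both the transform and its base.

```python
import math

def log_axis_label(name, base):
    """Plain-text axis label that announces the log scale and its base, e.g. 'log2(x)'."""
    return f"log{base}({name})"

def slopes_on_log_axis(xs, ys, base):
    """Slope of each segment when every x is positioned at log_base(x)."""
    us = [math.log(x, base) for x in xs]
    return [(ys[i + 1] - ys[i]) / (us[i + 1] - us[i]) for i in range(len(xs) - 1)]

# Made-up data: x doubles at every step, y rises by 5 at every step
xs = [1, 2, 4, 8, 16, 32, 64]
ys = [10, 15, 20, 25, 30, 35, 40]

slopes = slopes_on_log_axis(xs, ys, base=2)

# Equal slopes on every segment means the points fall on one straight line
is_straight = all(math.isclose(s, slopes[0], rel_tol=1e-9) for s in slopes)

print(log_axis_label("x", 2), "-> straight line:", is_straight)
```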

More detailed discussions can be found in my previous blog posts on this topic:

### Example 2

This example is something I don't recall seeing before and it completely threw me, initially.

See the problem?

At first glance, your visual cortex interprets the x-axis as a linear scale because you see the values range between 1% and 100%. The 0% is not shown because the tick-marks occur at the midpoint of each interval.

On closer inspection, however, you can see that there are also 1%, 2%, 5%, 10% marks that are **evenly** spaced when the gaps between them should be roughly **doubling**. Moreover, the 100% mark is not 10 times farther along the axis than the 10% mark. That suggests some kind of log-scaling, which could indeed lead to a sigmoid shape in the data, just like Figure 1. But the right-hand end of the scale clearly has **linear** intervals. So, what kind of scaling is this?

Surprise! It's not any kind of mathematical scale transformation. It's the result of mindlessly plotting *categorical data* in Excel. Although the x-labels look like numerical values, they are not. They are *names*. There's no warning about this effect because it was probably quite unintentional. It would have been easier to spot this false distortion if Excel had labelled the intervals as "1%", "2%", etc.
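
The effect is easy to reproduce without Excel. In this minimal sketch the tick labels are hypothetical (chosen to resemble the description above, not read off the original chart). A category axis places each label at its position in the list, whereas a numerical axis places it at the value it appears to represent.

```python
# Hypothetical tick labels: a few small values followed by evenly stepped larger ones
labels = ["1%", "2%", "5%", "10%", "20%", "40%", "60%", "80%", "100%"]

# Category axis: each label is just a name, placed at its index in the list
categorical_pos = list(range(len(labels)))

# Numerical axis: each label is parsed into the value it appears to represent
numeric_pos = [float(label.rstrip("%")) for label in labels]

def gaps(positions):
    """Distance between neighbouring tick positions."""
    return [b - a for a, b in zip(positions, positions[1:])]

print("categorical gaps:", gaps(categorical_pos))
print("numerical gaps:  ", gaps(numeric_pos))
```

On the category axis every gap is 1, so the small values are stretched apart and the large ones squeezed together. On the numerical axis the gaps grow from 1 to 20.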

Exercise for the reader: If the x-axis is straightened out by using numerical values, what shape do the curves take?

I'll have more to say about techniques for presenting performance data in the upcoming Guerrilla Capacity Planning classes.
