Here are two good examples of bad plots that I came across recently.
The first plot shows the results of benchmarking three HTTP servers (Tornado, Twisted Web, and Tornado on Twisted) on a "Hello World app"(?) workload.
Irrespective of the merits of the benchmark, notice first that this is a plot of throughput because the y-axis has the units of a rate, viz., requests per second. Moreover, the y-axis scale uses linear intervals. As I discussed in a previous blog post, all throughput curves should have a concave shape. These curves do not. Why not? You need to stop and ask questions like this every time your expectations are not met. Without knowing any details, I can speculate that the test rig was already driven into saturation, starting with the first concurrent request! Therefore, the first data points provide all the comparison information. The other measurements are redundant (log axis or not). So, what's the point of the plot? You'd have to ask the author. Anyway, this result is perhaps hardly surprising given a light-weight CPU-intensive workload like "hello world" as the benchmark.
More importantly for this discussion, notice the x-axis is not a linear scale. It uses logarithmic intervals! Why? Once again, you'd have to ask the author. In this particular case, no data were harmed because they are all constant (horizontal lines). No distortions are introduced, but it's still bad practice. More on why, in a minute. Meanwhile, the plot thickens ...
This plot purports to compare the throughput scalability of an OLTP benchmark on the proposed BFS ("Brain Fuck Scheduler"; great marketing choice!?) and the current CFS (Completely Fair Scheduler) in Linux.
Without getting into the Linux-scheduler wars, notice that like the previous plot, the y-axis uses linear intervals, while the x-axis uses logarithmic intervals. Since these data contain more information than the HTTP benchmark plots, the log scale now introduces some real distortions. For example, there is a maximum near 20 client tasks. The roll-off in throughput appears dramatic beyond that point, but in fact, it occurs gradually as another 130 clients are added. BTW, notice how hard it is to add on a log scale. There's also an artificial kink introduced between 1 and 2 clients. With linear intervals on both axes, the plot should look something like this USL plot.
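To put rough numbers on that distortion, here is a minimal sketch that compares how much horizontal axis width each region of the plot receives on a linear scale versus a log scale. The axis range of 1 to 150 clients is an assumption taken from the figures quoted above (a peak near 20 clients plus another 130), and the helper names `linear_pos` and `log_pos` are made up for illustration.

```python
import math

def linear_pos(n, lo, hi):
    """Fractional horizontal position of n on a linear axis running lo..hi."""
    return (n - lo) / (hi - lo)

def log_pos(n, lo, hi):
    """Fractional horizontal position of n on a logarithmic axis running lo..hi."""
    return math.log(n / lo) / math.log(hi / lo)

# Assumed axis range: 1 to 150 clients, with the throughput peak near 20.
LO, HI, PEAK = 1, 150, 20

for label, pos in (("linear", linear_pos), ("log", log_pos)):
    rise = pos(PEAK, LO, HI)        # axis width given to the rise, 1..20 clients
    first_step = pos(2, LO, HI)     # axis width given to the step from 1 to 2 clients
    print(f"{label:>6}: rise 1-20 = {rise:.1%} of axis, "
          f"roll-off 20-150 = {1 - rise:.1%}, step 1-2 = {first_step:.1%}")
```

Under these assumptions, the roll-off region gets about 87% of a linear axis but only about 40% of a log axis, which is why the decline looks so abrupt. The single step from 1 to 2 clients, less than 1% of a linear axis, is stretched to roughly 14% of the log axis, which is consistent with the artificial kink.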
Using log intervals is usually bad visualization practice, but especially so for performance analysis. I don't mean that from a Tufte-type aesthetic standpoint; I mean it from a cognitive-amplifier standpoint. Log scales give the wrong visual cues. Plotting nonlinear data on nonlinear axes tends to create misinformation in the sense of distorting perception. Indeed, the odious motivation of distorting perception explains why marketing glossies often use logarithmic intervals. Logarithms damp out the variance of magnitudes, thereby visually diminishing actual competitive differences. In fact, the SPEC CPU benchmarks are guilty of this sin by virtue of using the geometric mean. But don't get me started ...
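The damping effect is easy to see with a small sketch. The geometric mean is the exponential of the arithmetic mean of the logarithms, so it compresses large magnitudes in the same way a log axis does. The scores below are entirely hypothetical (they are not SPEC results) and are chosen only to make the effect obvious.

```python
from statistics import fmean, geometric_mean

# Hypothetical benchmark scores (higher is better); NOT real SPEC data.
system_a = [10, 10, 10, 1000]   # one test is 100x faster than on system B
system_b = [10, 10, 10, 10]

# Ratio of the two systems' summary scores under each kind of mean.
arith_ratio = fmean(system_a) / fmean(system_b)
geo_ratio = geometric_mean(system_a) / geometric_mean(system_b)

print(f"arithmetic-mean ratio A/B: {arith_ratio:.2f}")
print(f"geometric-mean ratio A/B:  {geo_ratio:.2f}")
```

With these made-up numbers the arithmetic means differ by a factor of about 26, while the geometric means differ by a factor of only about 3.2.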
In general, performance data are already nonlinear. You don't need to further confound that nonlinearity with nonlinear axes. Most people are too busy looking at the "message" on the plot to properly absorb any nonlinear distortion of the axes, especially when they have nothing to compare it with. That's not to say they can't, but it requires more active mental effort. I know, because I can do it in my head, but I don't like doing it. It's more cognitive effort and I could also get the re-scaling wrong in my head. In the meantime, you have completely lost me as an audience member, because I'm now distracted by your log plot and lost the plot of your presentation. Audiences are generally passive. Don't make them do unnecessary work.
So, when is it appropriate to use a log scale?
- Almost never
- Data ranges that cover more than 4 decades (the above plots don't qualify)
- Data that involves power-law behavior (the above plots don't qualify)
- Vertical detail (linear intervals) is far more important than horizontal detail (log intervals)
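The decade criterion in the list above can be checked mechanically before reaching for a log axis. This is only a sketch: the function names and the sample client counts are hypothetical, and the threshold of 4 decades is simply the rule of thumb stated above.

```python
import math

def decades_spanned(values):
    """Powers of ten between the smallest and largest positive value."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return math.log10(max(positive) / min(positive))

def log_axis_justified(values, min_decades=4):
    """True only if the data range covers more than min_decades decades."""
    return decades_spanned(values) > min_decades

# Hypothetical x-axis values resembling the client counts discussed above.
clients = [1, 2, 5, 10, 20, 50, 100, 150]
print(f"decades spanned: {decades_spanned(clients):.2f}")
print(f"log axis justified: {log_axis_justified(clients)}")
```

A range of 1 to 150 clients spans only a little over 2 decades, so by this criterion it does not qualify.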
If you still feel compelled to use log intervals, then create both log and linear plots for visual comparison. At the very least, think twice! (i.e., 2^1)
Dear Neil,
I'd like to thank you for your feedback, however harsh it may be.
Having a decent mathematical background I confess that I regret being the protagonist of such a blog post. But I'd like to explain how that came to be.
Simply stated, the chart you reference was the result of "laziness" rather than ignorance. The chart was generated by Google Docs, which automatically rescaled the x-axis and, as far as I know, doesn't provide an option not to.
As soon as I saw the logarithmic scale I cringed a little, but given the unusual shape of the curve, I figured that very little distortion would be introduced.
Publishing a chart on the blog was easier than publishing the corresponding tabular data, so I opted for an admittedly uninteresting chart (most of the data points weren't interesting to start with, as you pointed out).
I also admit that a "hello world" program is not representative of any real-world workload. But I was mostly interested in determining if there were performance differences at the I/O loop level.
At any rate, thanks again for your feedback.
Antonio