The Pith of Performance: Bandwidth and Latency are Related Like Overclocked Chocolates

Prior to the appearance of special relativity theory (SRT) in 1905, physicists were under the impression that space and time are completely independent aspects of reality described by Newton's equations of motion. Einstein's great insight, that led to SRT, was that space and time are intimately related through the properties of light.

Space and time are related

Instead of objects simply being located at some arbitrary position x at some arbitrary time t, everything moves on a world-line given by the space-time pair (x, ct), where c is the universal speed of light. Notice that x has the engineering dimensions of length and so does the new variable ct: a speed multiplied by time. In Einstein's picture, everything is a length; there is no separate time metric. Time is now part of what has become known as space-time—because nobody came up with a better word.

In SRT, all motion is related in a nonlinear way to the speed of light. As you move at speeds closer and closer to the speed of light, the world appears to curve inward more and more.

Take this video trip down the relativistic highway to see what I mean (Try to ignore the awful computer-generated female voice):

Weird isn't it? Yet another example of nonlinear hyperbolae in action. Weird though it be, that's the way it is, and Einstein's unified 4-dimensional space-time picture solved a number of outstanding experimental problems in physics; in particular the elimination of the aether hypothesis.

As the video animation points out, we don't notice any of these bizarre nonlinear visual effects on a daily basis. Space and time appear to us to be Newtonian and separate, but that's only an approximate truth due the special circumstance that we are only capable of moving at sub-luminal speeds. The correct general truth is that space and time are intimately related. So it is with bandwidth and latency.

Bandwidth vs. latency

If you google the web, attend performance conferences, read various blogs or textbooks, you'll soon come across performance pundits trying to tell you that latency and throughput are completely unrelated. Here's a perfect example taken from an Internet newsgroup:

From xxxx@cisco.com Mon, 16 Sep 2002 08:04:04 +1000 At 06:21 PM 14/09/2002 +0530, yyyy wrote: >which one is true ?? > >1) Latency increase ---> throughput(bandwidth usage) increase >2) Latency decrease ---> throughput(bandwidth usage) increase it depends on the context. in reality, Latency & Throughput are two completely different metrics associated with "performance". Latency isn't related to Throughput and vice-versa. ...

Interestingly, this quote comes from a network engineer at Cisco Systems and, as you'll see in the last section, the operation of packet-network systems has biased his view of throughput and latency metrics. But before I explain why his statement "latency isn't related to throughput," is wrong—in the same way it's wrong to say space and time are unrelated—I want to give an example that supports his statement.

Let's start with latency (the fancy word for delay). Think of transmitting commands to one of the Voyager spacecraft; now roughly half a light-day from earth. Both spacecraft were launched in 1977 and are still operating (a breathtaking feat of American technology). So, it currently takes roughly 12 hours to send a command to either spacecraft and an equal amount of time for it to return an ACK to the NASA deep-space network (DSN). A round-trip time of one light-day, for a confirmed transmission, is not conducive to sending a lot of commands and data (and don't forget retries). As Einstein told us, and all measurements have since confirmed, the speed of light c is a universal constant, so there is nothing we can do to shorten that transmission latency.

A DSN channel has a fixed number of bits/channel, which sets the bandwidth of the information pipe to a Voyager. If we could make that pipe wider, we could send more commands and data per round-trip. Indeed, it would be feasible to fatten the pipe at this end but unfortunately, the width of the pipe on the Voyager spacecraft was fixed more than 30 years ago. In any event, even if we could fatten the pipe and increase the data throughput (bandwidth utilization), it wouldn't shorten the transmission time—the latency—because bandwidth and latency are independent performance metrics. Right?

How X and R are related

In the following, I'll use the term throughput in place of the word bandwidth and denote it by the letter X. Bandwidth is really maximum possible throughput, X_max (see the next section). Similarly, I'll denote latency by R because in performance analysis, that metric is more commonly called response time or residence time (time spent in the system).

Load test server

Many of you do load testing as part of performance engineering, so let's consider throughput and latency as measured on a server under test (SUT). Each load level N consists of a finite number of requests in the system. The latency is given by:
\begin{equation} R(N) = \dfrac{N}{X(N)} \end{equation} For simplicity, I'm assuming that the think time, Z, is set to zero (as it often is in stress tests), and therefore we are considering the latency at the server, not the complete round-trip time or response time. The relationship between the latency R(N) and the throughput X(N) in eqn.(1) is shown visually in the following plot.

For the load test SUT, both the throughput data (black dotted curve) and latency data (blue dotted curve) are nonlinear functions of the load N—the independent variable representing the number of virtual users or client-side processes. All load-test throughput curves have this general concave shape and all latency curves have the complementary convex shape.

Guerrilla mantra 1.11: Capacity planning is about setting expectations. Even wrong expectations are better than no expectations!

These curves are what you should always expect to see. If certain data do not conform to these shapes, it doesn't necessarily mean those data are wrong, but someone has a lot of explaining to do.

The vertical line in the plot denotes a kind of optimal load point N_opt. To the left of it, resources are generally being under-utilized. To the right, resources are generally being over-utilized because the server has become saturated and acts a bottleneck. This optimal point defines two load regimes:

Light-load regime: When N < N_opt, resources are generally being under-utilized and throughput scales linearly: X(N) ∼ N (where "∼" stands for goes like). This simply says that the throughput scales approximately linearly with load. It closely follows the dashed line labeled X_lin. Substituting this approximation into eqn.(1), we see that the corresponding latency will behave as:
\begin{equation} R(N) \sim \dfrac{N}{N} = const. \end{equation} In other words, the latency data will remain approximately constant, following the line R_min, as the load increases in the light-load regime. This is one way the throughput and latency can appear to be independent, even though they're not. If follows from the fact that at light load, almost all N requests get serviced without needing to wait. Therefore, no queue forms and the latency remains close to the constant average time it takes to service each request.
Heavy-load regime: When N > N_opt, resources are generally being over-utilized and the system becomes bottlenecked. Driving the system beyond N_opt with more client load cannot increase the throughput, hence X(N) ∼ const. The throughput levels off at X_max under increasing load (the bandwidth limit). Throughput can also degrade, but that's another story. Substituting this approximation into eqn.(1), we see that the corresponding latency will behave as:
\begin{equation} R(N) \sim \dfrac{N}{const.} = N \end{equation} In the heavy-load regime, the latency data will grow approximately linearly with N, and follow the line R_lin in the plot, as the load increases.

This example shows quite generally that throughput and latency are not only related, they are inversely related by eqn.(1). Since the throughput X(N) is a nonlinear function of N, so is the latency R(N). Only at the extreme south-west and north-east corners of the plot, do X and R appear to be approximately independent metrics.

Internet server

What happens if the number of requests in the system is allowed to become unbounded? This is the kind of thing one might see on a server connected to the Internet. Curiously, this situation is a bit less intuitive to understand but, throughput and latency are inversely related in this case also. The equivalent of eqn.(1) is:
\begin{equation} R(X) = \dfrac{N(X)}{X} \end{equation}

Now, X is the independent variable representing a particular load in terms of throughput or arrival rate (e.g., HTTP Gets/second). The corresponding throughput-latency diagram look like this:

As the traffic rate approaches the rate at which the server can complete work, the server saturates and serious queueing of requests ensues. As a consequence, the number of requests in the system N(X) also climbs very dramatically on the right-hand side of the plot. The latency function R(X) is nonlinear by virtue of the behavior of N(X), not X. Whereas in the Load Test Server case, the number of requests N in the system was independent of X, here it is dependent on X and dependent in a nonlinear way. Once again, only at the extreme south-west and north-east corners of this plot do X and R appear to be approximately independent metrics. It's this nonlinear behavior that makes performance analysis unintuitive; even for performance experts. And that nonlinearity arises from the way requests become enqueued in the system. BTW, both eqns.(1) and (3) are essentially restatements of Little's law—an iron law of performance analysis.

How X and R can seem unrelated

Finally, let's return to the statement by the Cisco engineer and the example of transmitting commands to the Voyager deep-spacecraft. Throughput and latency certainly did seem to be independent in that case. However, that apparent independence comes from a very special set of circumstances. The request load, i.e., packets on a bus or network, are clocked onto the transmission medium. In queueing theory parlance, we say those systems behave like a queue with deterministic inputs (arrivals) and outputs (completions).

A simple example of a deterministic queue is provided by a converyor belt. Think of manufacturing assembly line with objects equally spaced on a conveyor belt, e.g., a shrink-wrap machine putting plastic on boxes of software. The time to apply the shrink-wrap plastic is exactly the same for each box (deterministic). Moreover, since the spacing between unwrapped boxes approaching the shrink-wrapper is the same distance, there is no possibility of boxes piling up and creating a queue. Since no time is spent queueing, the latency profile looks "flat" like this.

Latency and throughput behave as though they are always independent, as long as the objects or packets remain clocked correctly. Notice that under these special circumstances, both the throughput X and the latency R look like straight-line segments. In fact, they look exactly like the dashed lines in the south-west corner of the throughput-latency plot in the earlier section on Load Test Server characteristics. It's only in this special limit that X and R appear to be unrelated.

Claiming they are always independent is wrong. Internet packet transmissions are not always deterministic, so it is possible to have packet queueing at routers and that's why routers have buffers. To the degree that is true, however, merely takes us back to the Internet Server example already discussed in the previous section.

Overclocked chocolates

The following video shows the intrepid performance analysts (and occasional TV actors), Lucy Ricardo and Ethel Mertz, as they discover what happens when the conveyor belt at the chocolate factory becomes overclocked. Take careful note of what gets improvised as (hidden) buffers when Lucy and Ethel begin to saturate and chocolates start to queue up. The delivery latency on most of those chocolates suddenly becomes rather longer than usual (the vertical blue line segment in the previous plot).

Like 4-dimensional space-time, the most general relationship between throughput and latency is an inverse relationship and therefore, a nonlinear relationship. Under certain approximations, like low loads or constraints, like clocked media, throughput and latency can appear to be unrelated. Of course, if you only ever work with systems that operate in these specialized regimes, you can be forgiven for being "Newtonian" and thinking that throughput and latency are completely independent performance metrics.

2 comments:

Mr Icyhot said...: hi Neil I read this topic im lil bit confused .it would be really helpful if you could illustrate this correlation with a simple example; Tuesday, September 1, 2009 at 11:35:00 PM PDT
Neil Gunther said...: Perhaps some numbers will help.

Let's consider the 2nd and 3rd graphs/plots. In both cases, the throughput (X) is increasing linearly, i.e., it's a straight line. If the relationship b/w X and R were linear, then as I step up the throughput, e.g., X=2,4,6,8, ..., I would expect to see R increase linearly also, e.g., R=2,4,6,8, ... or some multiple of X.

If you look at the blue curve for the latency R in the 2nd plot, it should be clear that it does NOT increase linearly. It's a curve, not a line. Therefore, the relationship b/w X and R is nonlinear. This nonlinear relationship is the most general case (Little's law).

In the 3rd plot, as I increase the throughput: X=2,4,6,8, ..., in the same way as the 2nd plot, the R value remains unchanged, e.g., R=0.1 or whatever. Since R is independent of the X value that I choose, X and R do not appear related in this special case (deterministic queue).; Wednesday, September 2, 2009 at 6:51:00 AM PDT