Sunday, May 9, 2010

Mapping Virtual Users to Real Users

In performance engineering scenarios that use commercial load-testing tools, e.g., LoadRunner, the question often arises: How many virtual users (vusers) should be exercised in order to simulate some expected number of real users? This is important because, more often than not, the requirement is to simulate thousands or even tens of thousands of real users, but the stiff licensing fees associated with each vuser (above some small default number) make that cost-prohibitive. As I intend to demonstrate here, we can apply Little's law to map vusers to real users.

A common practical approach to ameliorate this situation is to run the load-test scenarios with zero think time (i.e., Z = 0) in the client scripts on the driver (DVR) side of the test rig. This choice effectively increases the number of active transactions running on the system under test (SUT), which might include app servers and database servers. These two subsystems are usually connected by a local area network, as shown in the following diagram.

As a numerical example, I'm going to use the following load-test measurements taken from Table 10.6 in my Perl::PDQ book. There, Z = 0 also.

The first three columns show the number of vusers (N) in each test, the throughput (X) induced by that vuser load, and the corresponding response time (R). The last column (labeled orig) shows R in milliseconds (ms), as reported by the load-test tool. The point being made in the book is that R appears to level off above about N = 120 vusers, and this was a consequence of exhausting the thread pool on the DVR side. We shall assume that effect isn't present in the following discussion, and simply work with the R values as they appear in column C. The more significant point for our purpose is to make sure everything is in the same time base, viz., R in seconds and X in TPS.
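As a sketch of that bookkeeping, here is the unit conversion for one hypothetical table row (the actual measurements are in Table 10.6 of the book; the numbers below are stand-ins):

```python
# Normalize load-test measurements to a common time base.
# Hypothetical stand-ins for one row of the load-test table:
R_orig_ms = 884.0        # R as reported by the tool (milliseconds)
R = R_orig_ms / 1000.0   # R in seconds
X = 450.8                # throughput in transactions per second (TPS)

print(R)  # response time now in seconds, same time base as X
```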

Little's law states that the number of requests active in the SUT is given by the product X × R. This number appears in column N of the next table. We can check that this works because X × R = 398.5 active requests in the SUT when N = 400 vusers, which is close enough for science.
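A minimal sketch of that sanity check; the individual X and R values here are hypothetical, chosen only so that their product reproduces the 398.5 quoted above:

```python
def active_in_sut(X, R):
    """Little's law: mean number of requests resident in the SUT."""
    return X * R

# Hypothetical measurements at N = 400 vusers:
X = 450.8   # throughput (TPS)
R = 0.884   # response time (s)

N_sut = active_in_sut(X, R)
print(round(N_sut, 1))  # ~398.5, close to the 400 vusers driving the test
```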

The next question is: How many users are outside the SUT? The steps to estimate that number are shown in the columns of the following table.

Column S shows an averaged time based on typical Gomez measurements taken at several different geographic locations. This is the time it takes to issue a web request, for example, and get the response data back to render the web page at the client. In other words, the Gomez time (G) is the sum of the Internet latency (I) and the residence time (R) on the SUT, or G = I + R in the test-rig diagram above. But we know R from the load-test measurements in the first table. Therefore, the Internet latency is I = G − R. That time appears in column T.
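Sketching that subtraction, again with hypothetical G and R values in place of the table entries:

```python
# The Gomez time G includes both the Internet latency I and the SUT
# residence time R, so I is recovered by subtraction.
# Hypothetical values:
G = 1.684   # end-to-end Gomez time (s)
R = 0.884   # SUT residence time (s) from the load test
I = G - R   # Internet latency (s), the time spent outside the SUT in flight

print(round(I, 3))
```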

The next question is: What to choose as Z for real users? Determining that value for your application could require a lot of work, which may or may not already have been undertaken. Here, I'll just use the mean Z value specified in the now-defunct TPC-W benchmark, which is Z = 7 seconds (with a maximum of 70 seconds). You can insert your own value, if you know it. Since the time spent outside the SUT is I + Z, the average number of real users in that state must be X × (I + Z), which is shown in column V. The total number of real users that can be supported is X × R + X × (I + Z) = X × (R + I + Z). In other words, whereas 100 vusers might be active in the SUT, the total number of real users that can be supported by this application is more like 4000.
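Putting the pieces together as a sketch; the X, R, and I values here are hypothetical, chosen so the arithmetic lines up with the 100-in-SUT versus roughly-4000-total example above, while Z = 7 s is the TPC-W mean think time:

```python
def real_users(X, R, I, Z):
    """Total real users supported, by Little's law applied to the whole
    loop: those in the SUT (X*R) plus those outside it (X*(I+Z))."""
    return X * (R + I + Z)

# Hypothetical values consistent with the example in the text:
X = 500.0   # throughput (TPS)
R = 0.2     # SUT residence time (s) -> X*R = 100 active vusers
I = 0.8     # Internet latency (s)
Z = 7.0     # TPC-W mean think time (s)

print(real_users(X, R, I, Z))  # ~4000 real users from 100 active vusers
```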

I'd like to thank Shanti Subramanyam for discussions about how to use Little's law in these calculations. You might find her blog post on this topic useful as well.


Chris Merrill said...

While this approach may test approximately the same number of transactions/sec arriving at the server with a smaller number of VUs, it is worth pointing out that it does NOT approximate the same number of active sessions OR concurrent network connections. Either of these can easily be limiting factors in a system - as can a number of other concurrency-related factors. Using the artificially short think times in this manner will not give many organizations the level of assurance they need for mission-critical systems.

disclaimer: I work for a vendor of load testing software (which is much less expensive than Load Runner).

Chris Merrill

Stefan Parvu said...

Mapping real users to the number of TCP sessions and application sessions is very important. I found it rather difficult to quantify, and it varies from case to case. But indeed, these are important factors too.

Another point: we all talk about test rigs implemented as closed circuits: new requests arrive *after* they have been processed and a sleep time, our Z, has passed. But what if we keep this model, the closed circuit, and add a second line, a constant flow of new requests into the test rig? Call this a noise line.

What will happen to my R and X? Will R fluctuate more severely than in a closed circuit? Will R be more accurate?

Found a paper on ACM some time ago, never had the chance to read it:
Open vs Closed

Shanti said...

To Chris: This approach is a meaningful first step in performance testing. Once you identify the throughput-related scalability bottlenecks in the system and fix them, you can then load test with the large number of sessions/connections to check for bottlenecks associated with #connections.