In performance engineering scenarios that use commercial load testing tools, e.g., LoadRunner, the question often arises: How many virtual users (vusers) should be exercised in order to simulate some expected number of real users? This is important, more often than not, because the requirement might be to simulate thousands or even tens of thousands of real users, but the stiff licensing fees associated with each vuser (above some small default number) make that cost-prohibitive. As I intend to demonstrate here, we can apply Little's law to map vusers to real users.
A commonly used practical approach to ameliorate this circumstance is to run the load test scenarios with zero think time (i.e., Z = 0) in the client scripts on the driver (DVR) side of the test rig. This choice effectively increases the number of active transactions running on the system under test (SUT), which might include application servers and database servers. These two subsystems are usually connected by a local area network, as shown in the following diagram.
As a numerical example, I'm going to use the following load-test measurements taken from Table 10.6 in my Perl::PDQ book. There, Z = 0 also.
The first three columns show the number of vusers (N) in each test, the throughput (X) induced by that vuser load, and the corresponding response time (R). The last column (labeled orig) shows R in milliseconds (ms), as reported by the load-test tool. The point being made in the book is that R appears to level off above about N = 120 vusers, and this was a consequence of exhausting the thread pool on the DVR side. We shall assume that effect isn't present in the following discussion, and simply work with the R values as they appear in column C. The more significant point for our purpose is to make sure everything is in the same time base, viz., R in seconds and X in TPS.
Little's law states that the number of requests active in the SUT is given by the product X × R. This number appears in column N of the next table. We can check that this works because X × R = 398.5 active requests in the SUT when N = 400 vusers, which is close enough for science.
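This consistency check is easy to script. The X and R values below are hypothetical stand-ins for one row of the load-test table (not the actual Table 10.6 numbers), chosen so the product comes out near the 398.5 figure quoted above; note the milliseconds-to-seconds conversion so the time bases agree.

```python
# Little's law: mean number of requests resident in the SUT is N = X * R.

def active_in_sut(X_tps, R_sec):
    """Mean number of concurrent requests in the SUT (Little's law)."""
    return X_tps * R_sec

X = 400.0          # throughput in transactions per second (hypothetical)
R_ms = 996.25      # response time in milliseconds (hypothetical)
R = R_ms / 1000.0  # convert to seconds so X and R share a time base

N = active_in_sut(X, R)
print(round(N, 1))  # 398.5 -> close to the 400 vusers driving the test
```

If the computed N differs wildly from the vuser count at Z = 0, that's a hint your measurements are inconsistent or a time base is off.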
The next question is: How many users are outside the SUT? The steps to estimate that number are shown in the columns of the following table. You can click on it to make it larger.
Column S shows an averaged time based on typical Gomez measurements taken at several different geographical locations. This is the time it takes to issue a web request, for example, and get the response data back to render the web page at the client. In other words, the Gomez time (G) is the sum of the Internet latency (I) and the residence time (R) on the SUT, or G = I + R in the test-rig diagram above. But we know R from the load-test measurements in the first table. Therefore, the Internet latency is I = G − R. That time appears in column T.
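The subtraction is trivial, but worth showing with units attached. Both numbers here are hypothetical illustrations, not actual Gomez or Table 10.6 measurements.

```python
# End-to-end (Gomez-style) time decomposes as G = I + R,
# so the Internet latency is I = G - R.

def internet_latency(G_sec, R_sec):
    """Time attributable to the Internet path, outside the SUT."""
    return G_sec - R_sec

G = 3.75   # end-to-end time seen at the client, in seconds (hypothetical)
R = 1.00   # SUT residence time from the load test, in seconds (hypothetical)

I = internet_latency(G, R)
print(I)   # 2.75 seconds spent on the wire, outside the SUT
```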
The next question is: What to choose as Z for real users? Determining that value for your application could require a lot of work, which may or may not have already been undertaken. Here, I'll just use the mean Z value specified in the now defunct TPC-W benchmark, which is Z = 7 seconds (and a maximum of 70 seconds). You can insert your own value, if you know it. Since the time spent outside the SUT is I + Z, the average number of real users in that state must be X × (I + Z), which is shown in column V. The total number of real users that can be supported is X × R + X × (I + Z) = X × (R + I + Z). In other words, whereas 100 vusers might be active in the SUT, the total number of real users that can be supported by this application is more like 4000.
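Putting the whole calculation together, here is a sketch of the vuser-to-real-user mapping. All inputs are hypothetical, chosen only so that roughly 100 active requests in the SUT map to about 4000 real users, mirroring the ratio described above; Z = 7 s is the TPC-W mean think time.

```python
# Total real users supportable by the application: N_users = X * (R + I + Z),
# i.e., users inside the SUT (X * R) plus users outside it (X * (I + Z)).

def real_users(X_tps, R_sec, I_sec, Z_sec):
    """Little's law applied to the whole loop: SUT + Internet + think time."""
    return X_tps * (R_sec + I_sec + Z_sec)

X = 400.0   # throughput in TPS (hypothetical)
R = 0.25    # SUT residence time in seconds (hypothetical: X * R = 100 in SUT)
I = 2.75    # Internet latency in seconds (hypothetical)
Z = 7.0     # TPC-W mean think time in seconds

print(real_users(X, R, I, Z))  # 4000.0 real users from ~100 active vusers
```

The leverage comes entirely from the (I + Z) term: every second a real user spends thinking or waiting on the wire is a second they are not occupying the SUT.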
I'd like to thank Shanti Subramanyam for discussions about how to use Little's law in these calculations. You might find her blog post on this topic useful as well.