In a nutshell, whereas I typically present the USL model as having a natural fit with data collected from a controlled environment (e.g., a load-testing rig), Raja has been pioneering the application of USL directly to production data. Production data present potential problems for any kind of modeling because they typically do not represent an environment in steady state, i.e., they are likely to reflect significant transients. With that caveat in mind, it is sometimes just a matter of finding periods where steady-state conditions are well approximated. One such period could be a peak time, when traffic is maximized. As long as nothing pathological is happening (i.e., no huge swings in throughput or response times), we may take the peak period as being near steady state. It should also be possible to find other, off-peak periods that approximate steady state. In other words, viewed across an entire day, a production environment probably cannot be regarded as conforming to the definition of steady state, but during the day there will often be windows of time where steady-state conditions are well approximated.
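One way to make "nothing pathological is happening" operational is to screen candidate windows mechanically. The heuristic below is my own illustration, not something prescribed by the USL: treat a window as near steady state if the coefficient of variation (CV) of its throughput samples stays below a small threshold.

```python
# A rough heuristic (illustrative only) for flagging near-steady-state
# windows in production data: a window qualifies if the coefficient of
# variation (stdev / mean) of its throughput samples is small.
import statistics

def is_near_steady_state(samples, cv_threshold=0.1):
    """samples: throughput measurements within one time window."""
    mean = statistics.mean(samples)
    if mean == 0:
        return False
    cv = statistics.stdev(samples) / mean
    return cv <= cv_threshold

# A flat peak-hour window vs. one dominated by transients
print(is_near_steady_state([980, 1010, 1005, 995]))   # low variation
print(is_near_steady_state([400, 1200, 300, 1500]))   # big swings
```

The 10% CV threshold is an arbitrary starting point; what counts as "small" depends on the sampling interval and the application.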
Even if we can declare the peak period to be a steady-state period on a large-scale production system, the USL is not defined for different types of users doing different kinds of work, i.e., mixed workloads. The USL, being a relatively simple model, carries no notion of heterogeneous workloads. For that, you would normally resort to a more sophisticated modeling tool, such as PDQ. Another challenge is how to determine the throughput X(1) for a single user (N = 1), which is needed for the normalization of the relative capacity function in USL calculations. In particular, what does a "single user" even mean when you have different types of users doing different work? To make this more concrete, Raja provided the following example:
"An enterprise web application is seen to process 20,000 client searches, and 2,000 purchases (buys) during the peak hour. Each client search takes 10 minutes to complete end to end (i.e., login, browse home page, then search and go through results, search again). Buying takes 20 minutes end to end. To simulate this production mix in a performance test (20000 search and 2000 buys in an hour) with Loadrunner (LR), it is common practice to have 1 script per business function (BF). Assuming 1 script per BF, we would need (approx.): 20000/(60/10) = 3334 client search vusers (60/10 since 1 vuser would be able to simulate 6 search BF in an hour) and 2000/(60/20) = 667 buying vusers."
There are many points to keep in mind regarding this scenario. Let me highlight them:
- Peak load (Nmax) on the prod system involves several thousand users.
- Simulating this scale with LR is prohibitive given HP-MI license fees.
- USL can extrapolate up to Nmax using low-N LR data.
- Could avoid LR altogether and just apply USL directly to prod data for both off-peak and peak windows.
- Prod involves different user types executing different BFs.
- What does the load (N) mean for mixed-BF users?
- What is the value of X(1) at N = 1 and what does it mean?
- How to define throughput X(N) for an application with mixed user loads?
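To make the extrapolation point concrete, here is a minimal sketch of fitting the USL to a handful of low-N measurements and projecting toward a production-scale Nmax. The (N, X) data points are invented purely for illustration, and I am writing the USL in its throughput form, X(N) = X(1) N / (1 + sigma (N - 1) + kappa N (N - 1)), with sigma and kappa the contention and coherency coefficients.

```python
# Illustrative sketch: fit the USL to made-up low-N load-test data,
# then extrapolate the fitted curve to a production-scale N.
import numpy as np
from scipy.optimize import curve_fit

def usl_throughput(N, X1, sigma, kappa):
    # X(N) = X(1) * N / (1 + sigma*(N-1) + kappa*N*(N-1))
    return X1 * N / (1.0 + sigma * (N - 1) + kappa * N * (N - 1))

# Hypothetical low-N measurements: (vusers, completions per hour)
N = np.array([1, 2, 4, 8, 16, 32], dtype=float)
X = np.array([100, 195, 370, 670, 1100, 1500], dtype=float)

popt, _ = curve_fit(usl_throughput, N, X,
                    p0=[100.0, 0.01, 0.0001],
                    bounds=([0, 0, 0], [np.inf, 1, 1]))
X1, sigma, kappa = popt

# Project throughput at a production-scale load
print(usl_throughput(4000.0, X1, sigma, kappa))
```

None of this resolves the mixed-workload questions above: the fit still presumes a single homogeneous notion of "user" behind N and a single X(1), which is exactly what a production mix of search and buy users does not give you.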