Saturday, October 8, 2016

Crib Sheet for Emulating Web Traffic

Our paper, entitled "How to Emulate Web Traffic Using Standard Load Testing Tools," is now available online and will be presented at the upcoming CMG conference in November.

The motivation for this work harks back to a Guerrilla forum in 2014 that essentially centered on the same topic as the title of our paper. It was clear from that discussion that commenters were talking at cross purposes because of misunderstandings on many levels. I had already written a much earlier blog post on the key queue-theoretic concept, viz., holding the $N/Z$ ratio constant as the load $N$ is increased, but I was incapable of describing how that concept should be implemented in a real load-testing environment.

On the other hand, I knew that Jim Brady had presented a similar implementation in his 2012 CMG paper, based on a statistical analysis of the load-generation traffic. There were a few details in Jim's paper that I couldn't quite reconcile but, at the CMG 2015 conference in San Antonio, I suggested that we should combine our separate approaches and aim at a definitive work on the subject. After nine months' gestation (ugh!), this 30-page paper is the result.

Although our paper doesn't contain any new invention, per se, the novelty lies in how we needed to bring together so many disparate and subtle concepts in precisely the correct way to reach a complete and consistent methodology. The complexity of this task was far greater than either of us had imagined at the outset. The hyperlinked Glossary should help with the terminology, but because there are so many interrelated parts, I've put together the following crib notes in an effort to help performance engineers get through it (since they're the ones who stand to benefit most).

  1. Standard load testing tools have a finite number of virtual users
  2. Web traffic is characterized by an indeterminate number of users
  3. Attention is usually focused on the performance of the SUT (system under test)
  4. We focus on the DVR (driver) side performance for web traffic
  5. Examine distribution of arriving requests and their mean rate
  6. Web traffic should be a Poisson process (just like A.K. Erlang used in 1909)
  7. That requires statistically independent arrivals (i.e., no correlations)
  8. We also refer to these as asynchronous requests
  9. Standard virtual users become correlated in the queues of the SUT
  10. We refer to these as synchronous requests
  11. We decouple them by reducing the length of queues in the SUT
  12. This is achieved by increasing think delay $Z$ as the load $N$ is increased (Principle A in the paper)
  13. Traffic then approaches a constant mean rate $\lambda = N/Z$ as the SUT queues decrease
  14. Check that the traffic is indeed Poisson by measuring the coefficient of variation ($CoV$) of the interarrival times
  15. Must have $CoV = 1$ for a Poisson process (Principle B in the paper); see the sketch after this list
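
To make Principles A and B concrete, here is a minimal Python sketch (it is not code from the paper): it grows the exponential think time $Z$ in step with $N$ so that the ratio $N/Z$ stays fixed, merges the resulting request streams, and then checks that the measured mean rate and the $CoV$ of the interarrival times come out as expected. The target rate, the user counts, and the assumption of negligible SUT response time are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_arrivals(N, Z, requests_per_user=2000):
    """Merge the request streams of N virtual users, each pausing for an
    exponentially distributed think time with mean Z between requests.
    SUT response time is taken as negligible (queues kept short), so the
    merged stream should approximate a Poisson process with rate N/Z."""
    arrivals = np.concatenate([
        np.cumsum(rng.exponential(scale=Z, size=requests_per_user))
        for _ in range(N)
    ])
    arrivals.sort()
    return arrivals

def traffic_stats(arrivals):
    """Mean arrival rate and CoV of the interarrival gaps."""
    gaps = np.diff(arrivals)
    return 1.0 / gaps.mean(), gaps.std(ddof=1) / gaps.mean()

# Principle A: increase Z in step with N so that lambda = N/Z stays constant.
target_rate = 10.0  # requests/sec -- an arbitrary choice for this sketch
for N in (50, 100, 200):
    Z = N / target_rate  # think time grows with the load
    rate, cov = traffic_stats(simulate_arrivals(N, Z))
    print(f"N={N:4d}  Z={Z:5.1f}s  rate={rate:6.2f}/s  CoV={cov:4.2f}")

# Principle B: every row should report CoV close to 1, the signature of a
# Poisson process, i.e., statistically independent (asynchronous) arrivals.
```

In a real load test, the same two statistics would be computed from request timestamps logged on the DVR (driver) side, per crib notes 4 and 5.
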

Originally, I assumed the paper would be no more than a third of its current length but, try as we might, that was not to be. My only defense is: it's all there, you just need to read it. Apologies in advance, but hopefully the crib notes will help.
