Tuesday, December 27, 2011

A List of CaP Skills

This question popped up recently on LinkedIn:
"Can someone tell me what skill set should a Performance and Capacity Analyst have and develop throughout his career?"
and I realized that, although I have a kind of list in my head and I talk about such skills in my classes, I have been too lazy to write them down anywhere, which is pretty dumb. I must try to do something about that (New Year's resolution? What are the odds?). In some ways, my fallback is the online Guerrilla Manual. Anyway, here is my (slightly edited) response to the LI question; let it constitute my first attempt at writing down such a list.
Unlike the usual set of computer-based skills needed for programming, app dev, kernel dev, or being a network engineer or a sysadmin, CaP (Capacity planning and Performance analysis) is more like a physical science, in my view.
It all starts and ends with the data.

Physical sciences involve such skills as:

  1. Collection and observation of raw data
  2. Controlled experiments in the lab to find clearer relationships in the data
  3. Use of tools to create and analyze data
  4. Development of models to both explain data and predict effects
  5. Publishing/marketing results and budgeting for experimental equipment
Scientists usually don't like to acknowledge (5). "Marketing" is a dirty word, but they do it anyway. :)
It's all marketing!
I'm also including the biological sciences here; the main difference being that bio systems can up and die on you when testing. Rebooting is not an option, and that can really ruin your data.

Translating the above skills list for CaP:

  1. Collection of raw data using monitoring tools
  2. Load testing simulations to determine stimulus-response relationships
  3. Use of Excel, R, visualization, etc., to analyze data
  4. Development of models to both explain and predict
  5. Marketing/publishing your analysis and knowing the financial ramifications
To make each of these skills manifest requires:
  • understanding the relevant technologies
  • understanding a good deal of math and statistics (without being a mathematician)
  • being able to communicate technical concepts to mere mortals (e.g., the CEO)
  • ability to perform diagnostic experiments
  • being a generalist vs. a specialist (Get your face out of the bit bucket)
  • ability to spot patterns and analogies (e.g., UNIX load average acts like an RC electric circuit)
  • paying attention to the right details (Put that kitchen sink down!)
  • cross-checking your conclusions (e.g., Therefore: lightweight heavy processes sleep furiously)
  • constantly watching market trends (to avoid being run over by the technology train)
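The RC-circuit analogy in the list above can be made concrete. What follows is only a minimal sketch, assuming the load average is an exponentially damped moving average of the run-queue length, sampled every 5 seconds with a 1-minute time constant. The function name, the sampling interval, the time constant, and the step-shaped input are all assumptions chosen for illustration; none of them come from the LI thread.

```python
import math

def damped_load(run_queue, dt=5.0, tau=60.0, start=0.0):
    """Exponentially damped average of run-queue samples.

    dt  : assumed sampling interval in seconds
    tau : assumed time constant in seconds (the "RC" of the analogy)
    """
    decay = math.exp(-dt / tau)
    load = start
    history = []
    for n in run_queue:
        # Same form as a capacitor charging toward (or discharging from) n
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history

# Hypothetical step input: 2 runnable processes for 5 minutes, then none
samples_per_min = 12  # 60 s / 5 s
step = [2] * (5 * samples_per_min) + [0] * (5 * samples_per_min)
trace = damped_load(step)

print(f"after 1 minute of load:  {trace[samples_per_min - 1]:.3f}")
print(f"after 5 minutes of load: {trace[5 * samples_per_min - 1]:.3f}")
print(f"1 minute after it stops: {trace[6 * samples_per_min - 1]:.3f}")
```

After one time constant the reported value has covered only about 63% of the distance to the true run-queue length, just as a charging capacitor would. Spotting that kind of pattern is what tells you the metric lags the thing it measures.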
It's also important to have a clear and unambiguous understanding of performance metrics. But even more important is understanding the relationships between those metrics. Those relationships are typically nonlinear, and that's what makes CaP both hard and interesting. And that's what makes all the above skills not only important but vital.
We are data scientists too!
The nonlinear relationships are often best expressed using queue-theoretic paradigms because all complex computer systems can be represented as a network (or circuit) of buffers (queues). QED
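To show what "nonlinear" means here, consider the simplest queueing case I can think of: a single M/M/1 queue, where the residence time is R = S / (1 - ρ) for service time S and utilization ρ. This is a sketch under that one assumption, not a model of any particular system; the function name and the service time of 1.0 are hypothetical.

```python
def mm1_residence_time(service_time, utilization):
    """Residence time R = S / (1 - rho) for a single M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilization)

# Residence time in units of the (hypothetical) service time S = 1.0
for rho in (0.10, 0.50, 0.75, 0.90, 0.95):
    r = mm1_residence_time(1.0, rho)
    print(f"utilization {rho:.2f} -> residence time {r:.1f} service times")
```

Going from 50% to 75% busy doubles the residence time (2 to 4 service times), and the last few percent of utilization cost far more than the first fifty. A straight-line extrapolation from lightly loaded measurements would miss that entirely.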

The good news here is, you will not be replaced by software or a computer any time soon.

Watson won Jeopardy. Pfft! Big deal.

Models, e.g., queueing models, are not only useful for prediction (the usage most people think of), but they are often more useful for explaining data.
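As a small illustration of a model explaining data rather than predicting it, Little's law (N = X × R) can be used to cross-check three independently collected metrics. The sketch below is only that: the function name is made up, and the measurements are hypothetical numbers, not data from any real system.

```python
def littles_law_gap(throughput, residence_time, measured_in_system):
    """Relative gap between the measured N and the N implied by X * R."""
    predicted = throughput * residence_time
    return (measured_in_system - predicted) / predicted

# Hypothetical measurements: 200 requests/s, 0.25 s mean residence time,
# and a monitoring tool reporting 80 requests in the system on average.
gap = littles_law_gap(throughput=200.0, residence_time=0.25,
                      measured_in_system=80.0)
print(f"measured N differs from X*R by {gap:+.0%}")
```

Here X × R accounts for only 50 of the 80 requests. The model cannot tell you which metric is off, but it does tell you that the three numbers cannot all describe the same system boundary, and that is a question worth chasing before any forecasting starts.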

A good model is better than a crystal ball (which is just a piece of glass).
Beyond the technical skills, CaP also requires connecting it all up with financial constraints/budgets in purchasing/procurement cycles, etc. That aspect varies enormously depending on the type of business and its operational requirements.
What did I miss? Comments, corrections, additions, refinements: all welcome.

1 comment:

Andrew Sliwkowski said...

What a great question... As someone who has been both a performance analyst and a capacity planner, who has been thinking about the question for years, and who has searched across hard and soft sciences (methods) and concerns, I think I have come up with a "Unified Field Theory" that can explain how the fields of Performance Engineering and Capacity Planning are aligned with the atomic Quality of Service Vectors and Chains.

I'll start with the 0+3 questions that unify all fields (a kind of fusion of the scientific method and the Socratic method):

#0. Why change ?
#1. What to change ?
#2. What to change to ?
#3. How to cause the change ?

Let me explain via a User Story that illustrates a performance engineering aspect:
#1. "We have a batch job that processes our Trades and it takes 120 mins (on average, 90% of the time.......)"

#2. "We need the job to complete in 100 minutes ... so that we can give our Traders more time to make money$$.... I need you to figure out what it would take in terms of money, investments and likelihood of success..."

#2. "Well, in order to figure out how to cause the change, I need to create a Resource Profile that allows me to identify the Critical Bottleneck and the critical resources that are constraining the Batch Job.... Once I figure that out ... then I make a recommendation.... "

#2. "What do you mean I can't touch production ... how do you expect me to understand the system without putting the needed probes in?"

#2. "Let me introduce you to my capacity planner who promised me that it would take 100 minutes... and my performance tester who assured me that his test proved that it would take 100 minutes... and the developer who promised that it would be 99.9999% available"...

to be continued...