Thursday, July 3, 2014

How to Remember the Poisson Distribution

The Poisson cumulative distribution function (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a probability function, it cannot have a value greater than 1. In R, the CDF is given by the function ppois(). For example, with α = 4 the first 16 values are

> ppois(0:15,4)
 [1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
 [9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511
As the number of events increases from 0 to 15 the CDF approaches 1. See Figure.

Thursday, June 26, 2014

Load Average in FreeBSD

In my Perl PDQ book there is a chapter, entitled Linux Load Average, where I dissect how the load average metric (or metrics, since there are three reported numbers) is computed in shell commands like uptime, viz.,
[njg]~/Desktop% uptime
16:18  up 9 days, 15 mins, 4 users, load averages: 2.11 1.99 1.99

For the book, I used Linux 2.6 source because it was accessible on the web with convenient hyperlinks to navigate the code. Somewhere in the kernel scheduler, the following C code appeared:


#define FSHIFT    11          /* nr of bits of precision */ 
#define FIXED_1   (1<<FSHIFT) /* 1.0 as fixed-point */ 
#define LOAD_FREQ (5*HZ)      /* 5 sec intervals */ 
#define EXP_1     1884        /* 1/exp(5sec/1min) fixed-pt */ 
#define EXP_5     2014        /* 1/exp(5sec/5min) */ 
#define EXP_15    2037        /* 1/exp(5sec/15min) */ 

#define CALC_LOAD(load,exp,n) \
 load *= exp; \
 load += n*(FIXED_1-exp); \
 load >>= FSHIFT;

where the C macro CALC_LOAD computes the following equation

Friday, June 6, 2014

The Visual Connection Between Capacity And Performance

Whether or not computer system performance and capacity are related is a question that comes up from time to time, especially from those with little experience in either discipline. Most recently, it appeared on a Linked-in discussion group:
"...the topic was raised about the notion that we are Capacity Management not Performance Management. It made me think about whether performance is indeed a facet of Capacity, or if it belongs completely separate."

As a matter of course, I address this question in my Guerrilla training classes. There, I like to appeal to a simple example—a multiserver queue—to exhibit how the performance characteristics are intimately related to system capacity. Not only are they related but, as the multiserver queue illustrates, the relationship is nonlinear. In terms of daily operations, however, you may choose to focus on one aspect more than the other, but they are still related nonetheless.

Thursday, June 5, 2014

Importing an Excel Workbook into R

The usual route for importing data from spreadsheet applications like Excel or OpenOffice into R involves first exporting the data in CSV format. A newer and more efficient CRAN package, called XLConnect (c. 2011), facilitates reading an entire Excel workbook and manipulating worksheets and cells programmatically from within R.

XLConnect doesn't require a running installation of Microsoft Excel or any other special drivers to be able to read and write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE). Moreover, XLConnect can handle older .xls (BIFF) as well as the newer .xlsx (Office XML) file formats. Internally, XLConnect uses Apache POI (Poor Obfuscation Implementation) to manipulate Microsoft Office documents.

As a simple demonstration, the following worksheet, from a Guerrilla Capacity Planning workbook, will be displayed in R.

First, the Excel workbook is loaded as an R object:

Monday, June 2, 2014

Guerrilla Training in July 2014

Time to register online for the upcoming GBoot (July 17) and GCaP (July 21) training classes. This is your fast track for learning how to do enterprise performance and capacity management.

Entrance Larkspur Landing hotel Pleasanton California

Classic topics include:

  • How performance metrics are related to each another
  • Queueing theory for those who can't wait
  • How to quantify scalability with the Universal Scalability Law (USL)
  • IT Infrastructure Library (ITIL) for Guerrillas
  • The Virtualization Spectrum from hyperthreads to hyperservices

All classes are held at the Larkspur Landing hotel in Pleasanton, California. Directions are available on the registration page. Larkspur Landing also provides free wi-fi Internet in their residence-style rooms as well as the training room.

Saturday, April 26, 2014

Guerrilla Data Analysis Class, May 19th

This is your fast track to enterprise performance and capacity management with an emphasis on applying R statistical tools to your performance data.

Discounts are available for 3 or more people from the same company. Enquire for details.

Entrance Larkspur Landing hotel Pleasanton California

All classes are held at our lovely Larkspur Landing location in Pleasanton. Larkspur Landing also provides free Wi-Fi Internet access in their residence-style rooms, as well as our classroom.

Some of the topics covered will include:

  • Introduction to using R
  • How to Detect Bad Data
  • Measurement Errors and Analysis
  • Multivariate Linear and Nonlinear Regression
  • Quantitative Scalability Analysis
  • Statistical forecasting techniques for CaP
  • Machine learning algorithms applied to CaP Data
  • Wild Not Mild Data: SQL access, web traffic, data recovery
  • Taming the Data Torrent with PCA and PerfViz techniques

Attendees should bring their laptops to the class as course materials are provided on a flash drive and calculational tools like OpenOffice, Excel and R will be useful.

Before registering online, take a look at what former students have said about their Guerrilla training experience.

Tuesday, April 1, 2014

Melbourne's Weather and Cross Correlations

During a lunchtime discussion among recent GCaP class attendees, the topic of weather came up and I casually mentioned that the weather in Melbourne, Australia, can be very changeable because the continent is so old that there is very little geographical relief to moderate the prevailing winds coming from the west.

In general, Melbourne is said to have a mediterranean climate, but it can also be subject to cold blasts of air coming up from Antarctic regions at any time, but especially during the winter. Fortunately, the island state of Tasmania acts as something of a geographical barrier against those winds. Understanding possible relationships between these effects presents an interesting exercise in correlation analysis.