Friday, March 29, 2013

Monitorama 2013 Conference

Here is my Keynote presentation that opened the first Monitorama conference and hackathon in Cambridge MA yesterday:

Comments from the #monitorama Twitter stream:

Thursday, February 7, 2013

Extracting the Epidemic Model: Going Beyond Florence Nightingale Part II

This is the second of a two-part reexamination of Florence Nightingale's data visualization based on her innovative cam diagrams (my term) shown in Figure 1.

Figure 1. Nightingale's original cam diagrams (click to enlarge)

Recap

In Part I, I showed that FN applied sectoral areas, rather than a pie chart or conventional histogram, to reduce the visual impact of highly variable zymotic disease data from the Crimean War. She wanted to demonstrate that diminishing disease was due mostly to her sanitation methodologies. The square-root attenuation of magnitudes, arising from the use of sectoral areas, helped her accomplish that objective. In addition, I showed that a plausibly simpler visualization could have been had with a single 24-month cam diagram. See Fig. 2.

Figure 2. Combined 24-month cam diagram
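
To make the square-root attenuation concrete, here is a minimal Python sketch. It assumes each monthly value is encoded as the area of a sector with a fixed angle (one-twelfth of a circle), so the plotted radius grows only as the square root of the value. The function name sector_radius and the two sample values are hypothetical, chosen purely for illustration; they are not Nightingale's data.

```python
import math

def sector_radius(value, n_sectors=12):
    """Radius of a circular sector whose AREA equals `value`.

    Assumes equal-angle sectors, theta = 2*pi / n_sectors, and uses
    area = (1/2) * r**2 * theta, solved for r.
    """
    theta = 2 * math.pi / n_sectors
    return math.sqrt(2 * value / theta)

# Hypothetical monthly counts (illustrative only, not historical data)
low, high = 10.0, 1000.0

ratio_values = high / low                               # 100-fold in the data
ratio_radii = sector_radius(high) / sector_radius(low)  # about 10-fold on the page

print(f"value ratio:  {ratio_values:.1f}")
print(f"radius ratio: {ratio_radii:.1f}")
```

Under these assumptions a 100-fold swing in the data appears as only a 10-fold swing in the length of the wedge, which is the attenuation referred to above.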

Thursday, January 31, 2013

Modem Memories Decoded

If you've ever used a phone line to connect to the Internet or sent a fax, you're familiar with the racket that precedes the actual data transmission. Not exactly a Beethoven symphony. Even if you are very familiar with the sounds and know it's "handshaking" with the other modem in order to create a comms channel, you probably don't know precisely what is going on with all that warbling and hissing.

Oona Räisänen (windytan) has put together a very nice annotated sonogram and explanation on her blog. That the whole thing sounds more like bursting artillery shells than dueling banjos is partly a result of the protocol trying to defeat sophisticated circuitry for noise cancellation and echo suppression on the telephone network.

Wednesday, January 23, 2013

Going Beyond Florence Nightingale's Data Diagram: Did Flo Blow It with Wedges?

In 2010, I wrote a short blog item about Florence Nightingale the statistician, solely because of its novelty value. I didn't even bother to look closely at the associated graphic she designed, but that's what I intend to do here. In this first installment, I reflect on her famous data visualization by reconstructing it with the modern tools available in R. In part two, I will use the insight gained from that exercise to go beyond data presentation to potentially more revealing data modeling. Interestingly, I suspect that much of what I will present could also have been accomplished in Florence Nightingale's day, more than 150 years ago, albeit not as easily and not by her alone.

Figure 1. Nightingale and her data visualization (click to enlarge)

Although Florence Nightingale was not formally trained as a statistician, she apparently had a natural aptitude for mathematical concepts and evidently put a lot of thought into presenting the import of her medical findings in a visual way. As a consequence, she was elected the first female member of the Royal Statistical Society in 1859 and later became an honorary member of the American Statistical Association. Click on Figure 1 to enlarge it and view the details in her original graphic.

Why Wedges?

Why did FN bother to construct the data visualization in Figure 1? If you read her accompanying text, you see that she refers to the sectors as wedges. In a nutshell, her point in devising Figure 1 was to try and convince a male-dominated, British bureaucracy that better sanitary methods could seriously diminish the adverse impact of preventable disease amongst military troops on the battlefield. The relative size of the wedges is intended to convey that effect. Later on, she promoted the application of the same sanitation methodologies to public hospitals. She was using the established term of the day, zymotic disease, to refer to epidemic, endemic, and contagious diseases.

Friday, January 18, 2013

Linux Per-Entity Load Tracking: Plus ça change

Canadian capacity planner David Collier-Brown pointed me to this post about some more proposed changes to how load is measured in the Linux kernel. He's not sure they're on the right track. David has written about such things as cgroups in Linux and I'm sure he understands these things better than I do, so he might be right. I never understood the so-called CFS: Completely Fair Scheduler. Is it a fair-share scheduler or something else? There was a certain amount of political fallout over CFS but, that aside, do we care about such things anymore? That was back in 2007. These days we are just as likely to run Linux in a VM under VMware or XenServer or in the cloud. Others have proposed that the Linux load average metric be made "more accurate" by including IO load. Would that be local IO, remote IO or both? Disk IO, network IO, etc.?

Monday, January 14, 2013

The Social Network Ranking is Wrong

Call me old-fashioned, but I never saw the 2010 movie The Social Network until last year (at a private screening). In case you also missed it, it's the Hollywood version of how Facebook.com came into being.

Quite apart from any artistic criticisms, I have a genuine psychological problem with movies like TSN. I keep getting caught up in technical inaccuracies and tend to lose the plot. So, it's very hard for me to watch such movies as the director intended. It's the same reason I can't stand SciFi movies or books: I can't get past the impossible and the just plain wrong. It turns out that TSN is generally fairly accurate regarding things like Linux, MySQL, PHP, and so forth, but there is a real clanger: the ranking algorithm used by Facemash—the Facebook precursor.

There's a scene where the Mark Zuckerberg character wants to rank Harvard women based on crowd-sourced scores. He recalls that his best friend (at the time), Eduardo Saverin, had previously mentioned a ranking formula, but Zuck can't remember how it goes, so he can't code it. When Saverin shows up again, Zuck urgently asks him to reveal it. In typical Hollywood style—possibly to keep a generally math-phobic audience visually engaged—Saverin writes the ranking equations on the dorm window (see above image) for the desperate Zuckerberg. Where else would you write equations?

Here they are, reproduced with slightly better formatting:

\begin{align}
E_a &= \dfrac{1}{1 + 10 (R_b - R_a)/400}, & E_b &= \dfrac{1}{1 + 10 (R_a - R_b)/400} \label{eqn:movie}
\end{align}

There's just one slight problem: they're wrong!
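
A quick numerical check makes the problem visible. The Python sketch below evaluates the window equations exactly as transcribed above and compares them with the standard Elo expected-score formula from chess ratings, in which the scaled rating difference appears as an exponent of 10. That the Elo formula is the one the movie intended is an assumption here; the function names and the sample ratings 1600 and 1400 are hypothetical.

```python
def movie_expected(r_self, r_other):
    """Window version as transcribed: 10 MULTIPLIES the rating difference."""
    return 1.0 / (1.0 + 10.0 * (r_other - r_self) / 400.0)

def elo_expected(r_self, r_other):
    """Standard Elo expected score: 10 is RAISED to the scaled difference."""
    return 1.0 / (1.0 + 10.0 ** ((r_other - r_self) / 400.0))

# Hypothetical ratings for two contestants a and b
ra, rb = 1600.0, 1400.0

movie_a, movie_b = movie_expected(ra, rb), movie_expected(rb, ra)
elo_a, elo_b = elo_expected(ra, rb), elo_expected(rb, ra)

print(f"window: Ea = {movie_a:.4f}, Eb = {movie_b:.4f}, sum = {movie_a + movie_b:.4f}")
print(f"Elo:    Ea = {elo_a:.4f}, Eb = {elo_b:.4f}, sum = {elo_a + elo_b:.4f}")
```

With these ratings the window version gives the higher-rated contestant a negative value, which cannot be an expected score, and the two values do not sum to 1. The exponent form gives roughly 0.76 and 0.24, which do.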

Friday, January 11, 2013

Oracle Java 7 Security Vulnerability

National Cyber Awareness System

US-CERT Alert TA13-010A
Oracle Java 7 Security Manager Bypass Vulnerability

Original release date: January 10, 2013

Systems Affected

Any system using Oracle Java 7 (1.7, 1.7.0) including
  • Java Platform Standard Edition 7 (Java SE 7)
  • Java SE Development Kit (JDK 7)
  • Java SE Runtime Environment (JRE 7)
All versions of Java 7 through update 10 are affected. Web browsers using the Java 7 plug-in are at high risk.