The Alamo is a reference to an episode in Texan history about
defeat and revenge. But there's nothing defeatist or mythical about the sessions I'll be giving at
CMG in San Antonio this year.
Workshop: How to Do Performance Analytics with R, Mon Nov 2, 8am-12pm
You've collected cubic light-years of performance monitoring data. Now
whaddya gonna do? Raw performance data is not the same thing as
information, and the typical time-series representation is almost the
worst way to glean information from it. Neither your brain nor that of your
audience is built for that (blame it on Darwin). To extract pertinent
information, you need to transform your data, and the R
statistical computing environment can help you do that, even
automatically.
Topics covered will include:
- Introduction to R using RStudio
- Descriptive statistics
- Performance visualization
- Data reduction techniques
- Multivariate analysis
- Machine learning techniques
- Forecasting with R
- Scalability analysis
Invited talk: Hadoop Super Scaling, Wed Nov 4, 5-6pm
The Hadoop framework is designed to facilitate parallel processing of massive
amounts of unstructured data. Originally intended to be the basis of Yahoo's
search engine, it is now an open-source Apache project. Since Hadoop has a broad range
of corporate users, a number of companies offer commercial implementations or
support for Hadoop.
However, certain aspects of Hadoop performance---especially scalability---are
not well understood. One such anomaly is the claimed flat scalability benefit
for developing Hadoop applications. Another is that it's possible to achieve
faster-than-parallel processing. In this talk I will explain the source of these
anomalies by presenting a consistent method for analyzing Hadoop application
scalability.
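
To make "faster than parallel" concrete: one common reading, assumed here rather than taken from the abstract, is superlinear speedup, where p nodes finish a job more than p times faster than a single node does. Here is a minimal Python sketch of that bookkeeping. The function names and the timings are made up for illustration; they are not measurements from any Hadoop cluster, and this is not the analysis method the talk presents.

```python
def speedup(t1, tp):
    """Speedup of a p-node run relative to the single-node run."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Speedup per node; 1.0 corresponds to ideal linear scaling."""
    return speedup(t1, tp) / p


# Hypothetical elapsed times in seconds (illustrative only, not real data).
t1 = 1200.0                              # single-node elapsed time
runs = {2: 560.0, 4: 270.0, 8: 160.0}    # nodes -> elapsed time

for p, tp in runs.items():
    s = speedup(t1, tp)
    e = efficiency(t1, tp, p)
    label = "superlinear" if e > 1.0 else "linear or sublinear"
    print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f} ({label})")
```

An efficiency above 1.0 is the signal that calls for an explanation, since ideal parallelism caps it at 1.0.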
CMG-T: Capacity and Performance for Newbs and Nerds, Thu Nov 5, 9-11am
In this tutorial I will bust some entrenched myths and develop basic
capacity and performance concepts from the ground up. In fact, any
performance metric can be boiled down to one of just three metrics. Even
if you already know metrics like throughput and utilization, that's not
the most important thing: it's the relationship *between* those metrics
that's vital! For example, there are at least three different definitions
of utilization. Can you state them? This level of understanding can make a
big difference when it comes to solving performance problems or presenting
capacity planning results.
Other myth-busting claims I'll make along the way include:
- There is no response-time knee.
- Throughput is not the same as execution rate.
- Throughput and latency are not independent metrics.
- There is no parallel computing.
- All performance measurements are wrong by definition.
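
As a taste of why the relationships between metrics matter, here is a small Python sketch of two textbook steady-state relations: Little's law (N = X * R) and the utilization law (U = X * S). These are standard results, not necessarily the ones the tutorial will lean on, and the function names and numbers below are purely illustrative.

```python
def response_time(n_in_system, throughput):
    """Little's law rearranged: R = N / X (steady-state averages)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return n_in_system / throughput


def utilization(throughput, service_time):
    """Utilization law: U = X * S for a single resource."""
    return throughput * service_time


# Hypothetical workload (illustrative only): 50 requests in the system
# on average, completing at 200 requests per second, 4 ms service time.
r = response_time(50, 200.0)
u = utilization(200.0, 0.004)
print(f"mean response time = {r:.3f} s, utilization = {u:.0%}")
```

With the number of requests in the system held fixed, throughput and response time cannot move independently: raise one and the other must fall.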
No particular knowledge about capacity and performance management is assumed.
See you in San Antonio!