Workshop: How to Do Performance Analytics with R, Mon Nov 2, 8am-12pm
You've collected cubic light-years of performance monitoring data. Now whaddya gonna do? Raw performance data is not the same thing as information, and the typical time-series representation is almost the worst way to glean information. Neither your brain nor that of your audience is built for that (blame it on Darwin). To extract pertinent information, you need to transform your data, and that's what the R statistical computing environment can help you do, including automatically.
Topics covered will include:
- Introduction to R using RStudio
- Descriptive statistics
- Performance visualization
- Data reduction techniques
- Multivariate analysis
- Machine learning techniques
- Forecasting with R
- Scalability analysis
Invited talk: Hadoop Super Scaling, Wed Nov 4, 5-6pm
The Hadoop framework is designed to facilitate parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search engine, it is now open sourced at Apache. Since Hadoop has a broad range of corporate users, a number of companies offer commercial implementations or support for Hadoop.
However, certain aspects of Hadoop performance, especially scalability, are not well understood. One such anomaly is the claimed flat scalability benefit for developing Hadoop applications. Another is that it's possible to achieve faster-than-parallel processing. In this talk I will explain the source of these anomalies by presenting a consistent method for analyzing Hadoop application scalability.
CMG-T: Capacity and Performance for Newbs and Nerds, Thur Nov 5, 9-11am
In this tutorial I will bust some entrenched myths and develop basic capacity and performance concepts from the ground up. In fact, any performance metric can be boiled down to one of just three metrics. Even if you already know metrics like throughput and utilization, that's not the most important thing: it's the relationship *between* those metrics that's vital! For example, there are at least three different definitions of utilization. Can you state them? This level of understanding can make a big difference when it comes to solving performance problems or presenting capacity planning results.
Other myths that will get busted along the way include:
- There is no response-time knee.
- Throughput is not the same as execution rate.
- Throughput and latency are not independent metrics.
- There is no parallel computing.
- All performance measurements are wrong by definition.
No particular knowledge about capacity and performance management is assumed.
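To see why throughput and latency are not independent metrics, here is a minimal sketch using two standard operational laws: Little's law (N = X × R) and the utilization law (U = X × S). The function names and sample numbers are invented for illustration and are not taken from the tutorial material.

```python
def residence_time(n_in_system: float, throughput: float) -> float:
    """Little's law rearranged: R = N / X.

    n_in_system -- average number of requests in the system (N)
    throughput  -- average completion rate in requests per second (X)
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return n_in_system / throughput


def utilization(throughput: float, service_time: float) -> float:
    """Utilization law: U = X * S, with S the mean service time in seconds."""
    return throughput * service_time


if __name__ == "__main__":
    # Hypothetical measurements: 10 requests in flight on average,
    # 50 completions per second, 10 ms of service per request.
    print(residence_time(10, 50))   # average time in system, in seconds
    print(utilization(50, 0.010))   # fraction of time the resource is busy
```

With the number of requests in the system held fixed, any change in throughput forces a change in residence time, which is the sense in which the two metrics are tied together.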
See you in San Antonio!