The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search engine, it is now open sourced at Apache. Since Hadoop is used by a broad range of corporations, a number of companies offer commercial implementations of it.
However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:
- So-called flat development scalability
- Superlinear scaling performance
- The new TPC big data benchmark
See "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion.
Therefore, I've added a new module on Hadoop performance and capacity management to the Guerrilla Capacity Planning course material, which also covers topics such as:
- There are only 3 performance metrics you need to know
- How performance metrics are related to one another (see the note after this list)
- How to quantify scalability with the Universal Scalability Law (see the fitting sketch after this list)
- IT Infrastructure Library (ITIL) for Guerrillas
- The Virtualization Spectrum from hyperthreads to hyperservices
- Hadoop performance and capacity management
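Two of these topics lend themselves to a quick illustration here. On how metrics are related: the canonical example is Little's law, $N = X R$, which ties the number of requests in the system to the product of throughput and residence time, for any stable system. On quantifying scalability with the USL: below is a minimal sketch of how a fit might be done in Python, assuming numpy and scipy are available. The measurements are made-up numbers purely for illustration; substitute your own cluster sizes and throughputs.

```python
import numpy as np
from scipy.optimize import curve_fit

# Universal Scalability Law: throughput on N nodes.
# lam:   throughput of a single node
# alpha: contention (serialization) coefficient
# beta:  coherency (data-exchange) coefficient
def usl(N, lam, alpha, beta):
    return lam * N / (1 + alpha * (N - 1) + beta * N * (N - 1))

# Hypothetical measurements: cluster size vs. observed throughput (jobs/hr)
N = np.array([1, 2, 4, 8, 16, 32], dtype=float)
X = np.array([20, 39, 75, 138, 230, 320], dtype=float)

(lam, alpha, beta), _ = curve_fit(usl, N, X, p0=[20.0, 0.01, 0.0001])
print(f"lambda = {lam:.2f} jobs/hr, alpha = {alpha:.4f}, beta = {beta:.6f}")

# Setting dX/dN = 0 gives the node count where throughput peaks:
if beta > 0:
    print(f"throughput peaks near N* = {np.sqrt((1 - alpha) / beta):.0f} nodes")
```

Fitting the coefficients from data, rather than asserting a model, is the whole point: the signs and magnitudes of $\alpha$ and $\beta$ tell you whether you are looking at contention, coherency delay, or (with $\alpha < 0$) the superlinear regime discussed in the paper.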
Early-bird registration ends in 5 days.
I'm also interested in hearing from anyone who plans to adopt Hadoop or has experience using it from a performance and capacity perspective.