The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search-engine, it is now open sourced at Apache. Since Hadoop now has a broad range of corporate users, a number of companies offer commercial implementations of Hadoop.
However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:
- So called flat development scalability
- Super scaling performance
- New TPC big data benchmark
See "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion.