As I discuss in Chap. 9 of my Perl::PDQ book, the qualitative design rules for incrementally increasing the scalability of distributed apps go something like this:
- Move code, e,g., business logic, from the App Servers to the RDBMS backend and repartition the DB tables.
- Use load balancers between every tier. This step can accommodate multi-millions of pageviews per day.
- Partition users into groups of 100,000 across replicated clusters with partitioned DB tables.
All the big web sites (e.g., eBay.com and EA.com) do this kind of thing. But these rules-of-thumb beg the question, How can I quantify the expected performance improvement for each of these steps? Here, I only hear silence. But there is an answer: the Universal Scalability Law. However, it needs to be generalized to accommodate the concept of homogeneous clustering, and I do just that in Section 4.6 of my GCaP book.
The following slides (from 2001) give the basic idea from the standpoint of hardware scalability.
Think of each box as an SMP containing p-processors or a CMP with p-cores. These processors are connected by a local bus, e.g., a shared-memory bus; the intra-node bus. Therefore, we can apply the Universal Scalability model as usual, keeping in mind that the 2 model parameters refer to local effects only. The data for determining those parameters via regression could come from workload simulation tools like LoadRunner. To quantify what happens in a cluster with k-nodes, an equivalent set of measurements have to be made using the global interconnect between cluster nodes; the inter-node bus. Applying the same statistical regression technique to those data gives a corresponding pair of parameters for global scalability.
The generalized scalability law for clusters is shown in the 5th slide. If (in some perfect world) all the overheads were exactly zero, then the clusters would scale linearly (slide 6). However, things get more interesting in the real world because the scaling curves can cross each other in unintuitive ways. For example, slide 7 "CASE 4" shows the case where the level of contention is less in the global bus than it is the local bus, but the (in)coherency is greater in the global bus than the local bus. This is the kind of effect one might see with poor DB table partitioning causing more data than anticipated to be shared across the global bus. And it's precisely because it is so unintuitive that we need to do the quantitative analysis.
I intend to modify these slides to show how things scale with a variable number of users (i.e., software scalability) on a fixed hardware configuration per cluster node and present it in the upcoming GCaP class. If you have performance data for apps running on clusters of this type, I would be interested in throwing my scalability model at it.