N+1 = 4 Redundancy
We begin by considering a small-N configuration of four hosts where the load is distributed equally to each of the hosts. For simplicity, the load distribution is assumed to be performed by some kind of load balancer with a buffer. The idea of N+1 redundancy is that the load balancer ensures all four hosts are equally utilized prior to any failover.
The idea is that none of the hosts should use more than 75% of their available capacity: the blue areas on the left side of Fig. 1. The total consumed capacity is then $4 \times 3/4 = 3$ hosts' worth, or 300% out of the 400% that all four hosts could supply. Then, when any single host fails, its lost capacity is compensated by redistributing that same load across the remaining three available hosts (each running 100% busy after failover). As we shall show in the next section, this is a misconception.
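The capacity arithmetic behind this assumption can be checked in a few lines. The following is a minimal sketch; the function name `utilization_after_failover` is made up for illustration and does not come from any tool mentioned in this post.

```python
def utilization_after_failover(n_hosts: int, rho_before: float) -> float:
    """Per-host utilization after one of n_hosts fails and its load is respread.

    Hypothetical helper: assumes the total load is unchanged and is spread
    evenly over the surviving hosts.
    """
    total_load = n_hosts * rho_before      # load in units of one host's capacity
    return total_load / (n_hosts - 1)      # same load, one fewer host


total_load = 4 * 0.75
print(total_load)                            # 3.0 hosts' worth, i.e. 300%
print(utilization_after_failover(4, 0.75))   # 1.0, i.e. 100% busy
```

Note that this only accounts for capacity. It says nothing about what happens to response time as each surviving host approaches 100% busy.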
The circles in Fig. 1 represent hosts and rectangles represent incoming requests buffered at the load balancer. The blue area in the circles signifies the available capacity of a host, whereas white signifies unavailable capacity. When one of the hosts fails, its load must be redistributed across the remaining three hosts. What Fig. 1 doesn't show is the performance impact of this capacity redistribution.
N+1 = 4 Performance
The performance metric of interest is response time as it pertains to service targets expressed in SLAs. To assess the performance impact of failover, we model the N+1 configuration as an M/M/4 queue with per-server utilization constrained to be no more than 75% busy.
When a failover event occurs, the configuration becomes an M/M/3 queue. The corresponding response time curves are shown in Fig. 2. The y-axis is the response time, R, expressed as multiples of the service period, S. A typical scenario is where the SLA (horizontal line) corresponds to maximum or near maximum utilization. The SLA in this case is a mean response time no greater than 1.45 service periods.
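For readers who want to reproduce curves of this kind, here is a minimal sketch of the mean response time of an M/M/m queue, computed with the standard Erlang C formula. The function names are hypothetical, and the figures in this post may have been generated with a different formula or approximation, so the numbers need not match them exactly.

```python
def erlang_c(m: int, rho: float) -> float:
    """Probability that an arriving request must wait in an M/M/m queue.

    m   : number of servers (hosts)
    rho : per-server utilization, 0 <= rho < 1
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("per-server utilization must satisfy 0 <= rho < 1")
    a = m * rho                      # offered load in units of one server
    b = 1.0                          # Erlang B with zero servers
    for k in range(1, m + 1):        # numerically stable Erlang B recursion
        b = a * b / (k + a * b)
    return b / (1.0 - rho * (1.0 - b))   # convert Erlang B to Erlang C


def response_time(m: int, rho: float) -> float:
    """Mean response time R as a multiple of the service period S."""
    return 1.0 + erlang_c(m, rho) / (m * (1.0 - rho))


# Tabulate R/S for the pre-failover (M/M/4) and post-failover (M/M/3) queues.
for pct in (50, 60, 70, 75, 80, 90):
    rho = pct / 100
    print(f"{pct:3d}%  M/M/4: {response_time(4, rho):6.3f}   "
          f"M/M/3: {response_time(3, rho):6.3f}")
```

At any given utilization the M/M/3 value is larger than the M/M/4 value, which is why losing a host hurts twice: utilization goes up and the curve itself is steeper.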
On failover, only three hosts remain available, and the SLA will be exceeded because the utilization of each host will be heading for 100% due to the additional load. (See Figs. 1 and 2.) Correspondingly, this has the effect of pushing the response time very high up the M/M/3 curve. In order to maintain the SLA, the load would have to be reduced so that it corresponds to an even lower utilization than originally anticipated, viz., 68.25% instead of 75%. Fig. 3 shows this effect in more detail.
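The utilization ceiling implied by an SLA can be found numerically. The sketch below assumes the exact Erlang C formula for M/M/m and uses simple bisection; `max_utilization` is a hypothetical helper. Because the curves in the figures may rest on a different approximation or rounding, the result is not guaranteed to reproduce the 68.25% quoted above.

```python
def response_time(m: int, rho: float) -> float:
    """Mean M/M/m response time as a multiple of the service period S."""
    a = m * rho                      # offered load in units of one server
    b = 1.0
    for k in range(1, m + 1):        # Erlang B recursion
        b = a * b / (k + a * b)
    c = b / (1.0 - rho * (1.0 - b))  # Erlang C: probability of waiting
    return 1.0 + c / (m * (1.0 - rho))


def max_utilization(m: int, sla: float, iterations: int = 60) -> float:
    """Largest per-server utilization for which R/S stays within the SLA.

    Relies on R/S increasing monotonically with utilization.
    """
    if sla <= 1.0:
        raise ValueError("SLA must exceed one service period")
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if response_time(m, mid) <= sla:
            lo = mid                 # SLA still met: try a higher utilization
        else:
            hi = mid                 # SLA violated: back off
    return lo


sla = 1.45                           # SLA from the text, in service periods
rho_after = max_utilization(3, sla)  # ceiling for the three surviving hosts
print(f"M/M/3 utilization ceiling for R <= {sla} S: {rho_after:.2%}")
```

The same function can be called with `m=4` to find the pre-failover ceiling, which is always higher than the three-host ceiling for the same SLA.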
In practice, proper capacity planning, using M/M/m queueing models like those employed in this discussion, would have revealed that the maximum host utilization should not have exceeded 68.25% busy in the N+1 configuration.
Large-N Performance
With a large number of hosts, the difference in response time after failover becomes less significant. This follows from the fact that the response-time curve for an M/M/m queue stays flat up to higher utilizations as the number of servers, m, grows. The effect is illustrated in Fig. 4 for N+1 = 16 hosts (cf. Fig. 2).
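The size effect can be illustrated by comparing the response time just before and just after losing one host. This is a rough sketch using the exact Erlang C formula; `failover_penalty` is a hypothetical name, and the 60% starting utilization is an assumption chosen only so that the four-host case stays below saturation after failover.

```python
def response_time(m: int, rho: float) -> float:
    """Mean M/M/m response time as a multiple of the service period S."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("per-server utilization must satisfy 0 <= rho < 1")
    a = m * rho                      # offered load in units of one server
    b = 1.0
    for k in range(1, m + 1):        # Erlang B recursion
        b = a * b / (k + a * b)
    c = b / (1.0 - rho * (1.0 - b))  # Erlang C: probability of waiting
    return 1.0 + c / (m * (1.0 - rho))


def failover_penalty(n_hosts: int, rho_before: float) -> float:
    """Ratio of response time after losing one host to response time before.

    Assumes the same total load is respread evenly over n_hosts - 1 hosts.
    """
    rho_after = rho_before * n_hosts / (n_hosts - 1)
    return response_time(n_hosts - 1, rho_after) / response_time(n_hosts, rho_before)


rho = 0.60                           # assumed pre-failover utilization
for n in (4, 16):
    print(f"N+1 = {n:2d}: response time grows by a factor of "
          f"{failover_penalty(n, rho):.3f} after failover")
```

With 16 hosts the surviving 15 each absorb only a small share of the failed host's load, so the penalty is much closer to 1 than in the four-host case.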
However, the most common installations are small-N configurations of the type discussed in the previous section. Therefore, preserving your SLA requires capacity planning based on host utilizations that match your SLA targets.
Thanks to the GCaP class participants for doing a group-edit on this post in real time.