Thursday, January 27, 2011

Idleness Is Not Waste

A common fallacy is to view all idle CPU cycles as wasted server capacity. It's not unusual for management and various bean-counters to display a reluctance to procure new hardware if unused cycles are clearly observable on existing hardware. This puts the pressure on sys admins to reduce idleness. Such is often the case during consolidation efforts: cram as many apps as possible onto a server to soak up every remaining CPU cycle.

All performance analysis and capacity planning is essentially about optimizing resource usage under a particular set of constraints. The fallacy is treating maximization as optimization. This mistake is further exacerbated if only one performance metric, typically CPU utilization, is taken into account: a common situation promoted by the superficiality of performance dashboards. Nor does maximization necessarily mean 100% utilization. Even when some amount of CPU capacity is retained as headroom for workload growth, the tendency to "redline" the server can still prevail.

You can't optimize a single number. Server utilization has to be optimized with respect to other measures, e.g., application response-time targets. We know from simple queueing theory that response time increases nonlinearly (the proverbial "hockey stick") with increasing server utilization. If the response-time goals are being met at 10% CPU busy, pre-consolidation, then response times will almost certainly be longer at the higher CPU utilization, post-consolidation, and the goals may no longer be met. The response-time metric is an example of a cost that has to be taken into account to satisfy all the constraints of the optimized capacity plan.
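
To make the hockey stick concrete, here is a minimal sketch in Python. It assumes the simplest textbook case, a single-server M/M/1 queue, where mean response time is R = S / (1 - rho) for mean service time S and utilization rho. The post only says "simple queueing theory", so the choice of M/M/1, the function names, and the 0.1-second service time are all assumptions made for illustration; a real multi-core server would need a different model.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean response time R = S / (1 - rho) for an M/M/1 queue (assumed model)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def max_utilization(service_time: float, target: float) -> float:
    """Highest utilization at which R stays within target: rho = 1 - S / target."""
    if target < service_time:
        raise ValueError("target cannot be shorter than the service time itself")
    return 1.0 - service_time / target


if __name__ == "__main__":
    S = 0.1  # hypothetical mean service time in seconds
    for rho in (0.10, 0.50, 0.75, 0.90, 0.95):
        r = mm1_response_time(S, rho)
        print(f"utilization {rho:4.0%}: mean response time {r:.3f} s")
    # Utilization ceiling implied by a hypothetical 0.5 s response-time goal
    print(f"ceiling for a 0.5 s goal: {max_utilization(S, 0.5):.0%}")
```

Under these assumptions, the second function turns the argument around: the response-time goal, not the idle percentage, sets the utilization ceiling, and whatever sits above that ceiling is idleness the application needs.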

Maximizing server utilization is as foolhardy as maximizing revenue. Both goals look attractive on their face, but if you don't keep track of outgoing CapEx and OpEx costs incurred to generate revenue, you could lose the company!


Fred said...

I would agree with the title of this post if it said "Not all idleness is waste", but would argue that there is also idleness that is waste. To differentiate between the two kinds of idleness, one needs tools to monitor the use of all (not just CPU) resources, and to reduce wasteful idleness, one needs tools to manage the use of these resources. This is what Librato Silverline aims to provide.

Noons said...

Great post.

This reminded me of the ex-IBM financial genius who stuffed up Prime Computers in the mid-80s.

He proceeded to look on a "dashboard"-like Mac spreadsheet at the costs of the company.

There was this huge column called "R&D".

He chopped it to 0 and the bottom line went ballistic, to great rejoicing from the stock market and lots of bonuses spread around the board.

Then 4 years later, Prime disappeared...