My reasoning goes like this:
To make statistically meaningful estimates with models like linear least squares, you need at least half a dozen data points. This is the kind of rule of thumb (RoT) I developed to make USL scalability modeling meaningful.
Now consider that same RoT from another standpoint. If you are Amazon.com, for example, and you want to estimate your Christmas growth trend, then you need about half a dozen data points. Since Christmas comes only once a year, that means 5 or 6 years of data.
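To see why a handful of points matters, here is a minimal sketch of a least-squares trend fit that also reports the standard error of the slope. The function name `fit_trend` and the yearly peak figures are hypothetical, invented purely for illustration. A straight line uses up two degrees of freedom, so six yearly points leave only four for estimating how uncertain the trend is.

```python
import math

def fit_trend(xs, ys):
    """Ordinary least-squares line y = a + b*x, plus the standard error of b."""
    n = len(xs)
    if n < 3:
        # With 2 points the line fits exactly and no error estimate is possible.
        raise ValueError("need at least 3 points to estimate the slope error")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    dof = n - 2  # two parameters (slope, intercept) are estimated
    slope_se = math.sqrt(sse / dof / sxx)
    return slope, intercept, slope_se, dof

# Hypothetical yearly peak-load figures (arbitrary units), for illustration only.
years = [1, 2, 3, 4, 5, 6]
peaks = [100.0, 118.0, 131.0, 155.0, 166.0, 190.0]

slope, intercept, slope_se, dof = fit_trend(years, peaks)
print(f"slope = {slope:.2f} +/- {slope_se:.2f} per year ({dof} degrees of freedom)")
```

With only 3 or 4 points the same code still runs, but the error estimate rests on 1 or 2 degrees of freedom, which is why the trend is hard to trust.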
That time scale for a historical data repository certainly distinguishes CaP from simple performance monitoring (no storage) and performance analysis (short-term history) where you are looking for diagnostic patterns rather than trends.
How much data is kept during those 5 years is a secondary, sampling question. As surely as you throw away arbitrary periods of data, those will become the periods you need at some later time for CaP purposes. In fact, it's probably more work to selectively remove certain data periods than it is to automatically keep the lot. But keeping everything might be overkill, and that's typically where data aggregation comes in: the further back in time you go, the less likely you are to need events with fine time granularity.
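As a rough sketch of that kind of age-based aggregation, the following keeps every time period but coarsens the resolution as samples get older. The tier thresholds (30 days, 1 year) and the names `TIERS`, `bucket_width` and `aggregate` are assumptions made up for this example, not recommendations.

```python
from collections import defaultdict

DAY = 86400  # seconds

# Hypothetical retention tiers: (maximum age in seconds, bucket width in seconds).
TIERS = [
    (30 * DAY, 300),       # last 30 days: 5-minute buckets
    (365 * DAY, 3600),     # up to 1 year old: hourly averages
    (float("inf"), DAY),   # anything older: daily averages
]

def bucket_width(age_seconds):
    """Return the bucket width that applies to a sample of the given age."""
    for max_age, width in TIERS:
        if age_seconds <= max_age:
            return width
    return TIERS[-1][1]

def aggregate(samples, now):
    """Average (timestamp, value) samples into buckets that coarsen with age.

    Returns a sorted list of (bucket_start, bucket_width, mean_value).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        width = bucket_width(now - ts)
        start = ts - (ts % width)  # align to the start of the bucket
        acc = sums[(start, width)]
        acc[0] += value
        acc[1] += 1
    return sorted((start, width, total / count)
                  for (start, width), (total, count) in sums.items())
```

Averaging smooths out peaks, so in practice you might also want to keep a per-bucket maximum alongside the mean.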
With this RoT in mind, I decided to google the topic, but discovered that very little of what's out there is relevant to CaP, since most commentators are considering data storage for applications (e.g., an RDBMS), not historical data for CaP analysis. FWIW, here's what I found during a relatively quick review:
- SMB planning mentions 5 years without any justification
"knowing what your business is doing...in an 18-month time frame, to a three-year time frame, to a five-year time frame. You really need to plan out that far, but if you do, it's fantastic. If you have a five-year plan, it will trickle down more easily into getting your hands around capacity and growth."
- Oracle example of 6 weeks
"Suppose, for example, you always want to be able to view hour data for the previous six weeks. ..."
- IBM up to 2 months before aggregating ("pruning")
"Capacity planning and predictive alerting: For capacity planning and predictive analytics, you typically perform long term trend analysis. The Performance Analyzer, for example, uses Daily summarization data for the predefined analytic functions. So, in most cases, configure daily summarization. You can define your own analytic functions and use Hourly or Weekly summarization data."
"For the analytic functions to perform well, ensure that you have an appropriate number of data points in the summarized table. If there are too few, the statistical analysis will not be very accurate. You will probably want at least 25 to 50 data points. To achieve 50 data points using Daily summarization, you must keep the data for 50 days before pruning. More data points will make the statistical predictions more accurate, but will affect the performance of your reporting and statistical analysis. Consider having no more than a few hundred data points per resource being evaluated. If you use Hourly summarization, you get 336 data points every 2 weeks."
"Adaptive monitoring (dynamic thresholding): Keep 7 to 30 days of detailed data when comparing all work days. If you compare Monday to Monday, then you need to keep the Detailed data much longer to be able to establish a trend. When comparing a specific day of the week, you will probably need to have at least 60 days of data."
- RRDtool example 2 years for consolidation decisions
"if now is March 1st, 2009, do you want to look at 2007-03-01 until 2009-03-01 or do you want to be able to look at 2007-03-01 midnight to next midnight."
"What you need to understand here is consolidation. Say that you will be looking at two years worth of information, and that the available data is in a resolution of 300 seconds per bucket. This means you have more than 200,000 buckets."
"Example: Say you want to be able to display the last 2 years, the last 2 months, the last 2 weeks and the last 2 days. The database uses the default step size of 300 seconds per interval."
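The data-point counts in the IBM and RRDtool excerpts above are easy to check with a little arithmetic. This is only a sketch: the helper name `data_points` is my own, and a year is taken as 365 days.

```python
def data_points(retention_days, step_seconds):
    """Number of samples held for retention_days at one sample per step_seconds."""
    return retention_days * 86400 // step_seconds

print(data_points(14, 3600))       # hourly summaries kept for 2 weeks -> 336
print(data_points(50, 86400))      # daily summaries kept for 50 days -> 50
print(data_points(2 * 365, 300))   # 300-second buckets kept for 2 years -> 210240
print(data_points(5 * 365, 300))   # 300-second buckets kept for 5 years -> 525600
print(data_points(5 * 365, 86400)) # daily summaries kept for 5 years -> 1825
```

The last two lines apply the same sum to a 5-year window: over half a million buckets at 300-second resolution, but fewer than two thousand daily summaries.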
Ultimately, I'd like to turn my 5-year RoT into a Guerrilla Mantra. Send me your thoughts and comments to help me get there.