Recently, Guerrilla alumnus Scott J. pointed me at this Chart of the Day showing how Google revenue growth was outpacing that of both Facebook and Yahoo, when compared 7 years after each company launched.
Clearly, this chart is intended to be an attention getter for the Silicon Alley Insider website, but it looks about right and normally I might have just accepted the claim without giving it any more thought. The notion that Google growth is dominating is also consistent with a lot of other things one sees. No surprises there.
Exponential doubling period
In this particular case, however, I was struck by the shape of the data and curious to find out if the growth of GOOG and FB revenue follows an exponential trend or not. Exponential growth is not unexpected because it's the continuous analog of compound interest. If they are growing exponentially, I can compare their doubling periods numerically and determine how their growth will look in the future.
The doubling period is an analysis technique that I use in Chapter 8 of my Guerrilla Capacity Planning book to determine the traffic growth of major websites. In section 8.7.5 the doubling time t2 is defined as:
t2 = Ln(2) / A
where A is the growth parameter of the fitted exponential curve (the rate at which it bends upward) and Ln(2) is the natural logarithm of 2 (2 for doubling). The only fly in the ointment is that I don't have the actual numeric values used in the histogram chart, but that need not be a showstopper. There are only a half dozen data points for each company, so I can estimate them visually. Then, I can use R to fit the exponential models and calculate the respective doubling times.
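For readers who want to see where the doubling-time formula comes from, here is a short derivation. It assumes the revenue follows R(t) = R_0 e^{At}; the symbols R(t) and R_0 are my own labels for illustration and correspond to the fitted curve and its scale factor, not to notation from the book.

```latex
\begin{align*}
R(t) &= R_0\, e^{A t} \\
R(t + t_2) &= 2\, R(t)
  \;\Longrightarrow\; R_0\, e^{A (t + t_2)} = 2\, R_0\, e^{A t}
  \;\Longrightarrow\; e^{A t_2} = 2 \\
t_2 &= \frac{\ln 2}{A}
\end{align*}
```

Note that t2 does not depend on the starting time t or on R_0, and it comes out in the same time units as 1/A. With A measured per year, multiplying by 12 gives the doubling time in months.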
Analysis in R
First, we read the data (as eyeballed from the online chart) into R. Since the amount of data is small, I simply use the textConnection trick to write the data in situ, rather than using an external file.
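# Revenue in $ billions, eyeballed from the chart; Year = years after launch
gd <- read.table(textConnection("Year GOOG FB YAH
1 0.001 0.002 0.001
2 0.01 0.02 0.01
3 0.1 0.2 0.1
4 0.5 0.45 0.3
5 1.5 0.75 0.6
6 3.2 2.0 1.1
7 6.1 4.0 0.75"),
header=TRUE)  # default sep splits on any whitespace
closeAllConnections()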
I can now plot those estimated data points and compare them with the original chart.
plot(gd$Year,gd$GOOG,type="b",col="green",lwd=2,lty="dashed",
main="Annual revenues for GOOG (green), FB (blue), YAH (red)",
xlab="Years after launch", ylab="$ billions")
points(gd$Year,gd$FB,type="b",col="blue",lwd=2,lty="dashed")
points(gd$Year,gd$YAH,type="b",col="red",lwd=2,lty="dashed")
The result looks like this:
The dashed lines simply connect related points together. The two solid lines are produced by performing the corresponding exponential fits to the GOOG and FB data.
# x-values for continuous exp curves
x <- seq(from=1, to=7, by=0.1)
# fit revenue = g0*exp(g1*Year); g1 plays the role of the growth parameter A
ggfit <- nls(GOOG ~ g0*exp(g1*Year), data=gd, start=list(g0=1,g1=1))
gc <- coef(ggfit)
lines(x, y=gc[1]*exp(gc[2]*x))
# same model for Facebook
fbfit <- nls(FB ~ f0*exp(f1*Year), data=gd, start=list(f0=1,f1=1))
fc <- coef(fbfit)
lines(x, y=fc[1]*exp(fc[2]*x))
# report the doubling periods, converted from years to months
text(1,5.0,sprintf("%2s doubling time: %4.2f months", names(gd)[2],12*log(2)/gc[2]),adj=c(0,0))
text(1,4.5,sprintf("%2s doubling time: %4.2f months", names(gd)[3],12*log(2)/fc[2]),adj=c(0,0))
From the R analysis we see that the doubling period for Google (t2 = 11.39 months) is slightly longer than that for Facebook (t2 = 10.94 months). Despite the banner claim made by Silicon Alley Insider, based on these estimated data, Google is growing revenue at a slightly slower rate than Facebook. How can that be?
Conclusion
In the original histogram chart, it looks like Google is growing faster than Facebook. Well, looks can be deceiving. Your brain can be fooled (easily) by optical illusions. That's why we need to do analysis in the first place. Viewed uncritically, a chart can easily lead your brain astray.
To resolve this paradox, let's do two things:
- Project the growth models out further than the 7 years associated with the data
- Plot the projected curves on log-linear axes (for reasons that will become clear shortly)
The left-hand plot shows that the two curves cross somewhere between 7 years out and 40 years out. Whereas green (Google) is currently on top, according to the data, blue (Facebook) eventually ends up on top according to the exponential models, assuming nothing else changes in the future. The right-hand plot uses a log-scaled y-axis to reveal more clearly that the crossover occurs at t = 23.9 years. Once again, if you rely purely on visuals, you might think the crossover doesn't occur until after 30 years (what looks like a "knee" in the left-hand plot), but you'd be misled. It occurs almost 10 years earlier.
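The crossover can also be computed directly rather than read off a plot. On a log-scaled y-axis each exponential becomes a straight line whose slope is its growth parameter, so the crossing is just the intersection of two lines. The following is a minimal sketch: the function name crossover_time and the example coefficients are hypothetical, chosen only to illustrate the arithmetic, and are not the fitted values from this analysis.

```r
# Time at which the curves a0*exp(a1*t) and b0*exp(b1*t) meet.
# Setting them equal and taking logs gives
#   log(a0) + a1*t = log(b0) + b1*t   =>   t = log(a0/b0) / (b1 - a1)
crossover_time <- function(a0, a1, b0, b1) {
  stopifnot(a0 > 0, b0 > 0, a1 != b1)
  unname(log(a0 / b0) / (b1 - a1))
}

# HYPOTHETICAL coefficients (not the fitted ones): curve "a" starts
# twice as high, curve "b" grows slightly faster.
tc <- crossover_time(a0 = 2, a1 = 0.5, b0 = 1, b1 = 0.6)
print(tc)

# With the fitted coefficients from the earlier nls() calls, the call
# would be: crossover_time(gc[1], gc[2], fc[1], fc[2])
```

A positive result means the curve with the smaller starting value but larger growth parameter overtakes the other at that time; a negative result means the crossing lies before t = 0.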
If, for example, you were only interested in short-term gains (as Wall St is wont to be), the original visual (histogram) is correct. If, on the other hand, you are in your 20s and investing longer term, e.g., for your retirement, you might get a surprise.
By now, you might be thinking that these projections are not very accurate, and I wouldn't completely disagree with you. But what is accurate here? The original data in the histogram (even the really real actual data) probably aren't very accurate either; we really can't know without deeper investigation. And that's my point: independent of the accuracy of the data, the numerical analysis can cause you to pay attention to, and possibly ask questions about, something you might otherwise have taken for granted on purely visual grounds.
I'm a big fan of data visualization, but not to the exclusion of numerical analysis. We need both and we need both to be easily accessible.
While I agree with your point and the message you are trying to get across, I think you need to look more closely at your model fits.
Assuming you buy into the model, the value of A for the Google fit is 0.731 with a standard error of 0.043, and for the Facebook fit the value of A is 0.760 with a standard error of 0.033. With such a small difference in the fitted parameters and the relatively large standard errors, the numerical analysis would not conclude that the two model fits are different. Plus, there is a fairly wide range for the doubling interval. In short, with seven data points it is really hard to trust the results of the model you have decided to use.
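To make the point about the range of doubling times concrete, here is a rough sketch using only the values of A and the standard errors quoted in the comment above. Treating A plus or minus 2 standard errors as an interval is an assumption made here for illustration; it is not something the commenter computed, and the helper name doubling_months is hypothetical.

```r
# Doubling time in months for a growth parameter A given per year
doubling_months <- function(A) 12 * log(2) / A

# A and standard errors as quoted in the comment above
fits <- data.frame(
  company = c("GOOG", "FB"),
  A       = c(0.731, 0.760),
  se      = c(0.043, 0.033)
)

# ASSUMPTION: A +/- 2 standard errors as a rough interval
fits$t2       <- doubling_months(fits$A)
fits$t2_short <- doubling_months(fits$A + 2 * fits$se)  # faster growth
fits$t2_long  <- doubling_months(fits$A - 2 * fits$se)  # slower growth

print(fits, digits = 4)

# TRUE when the two rough intervals overlap
overlap <- max(fits$t2_short) <= min(fits$t2_long)
print(overlap)
```

Under that assumption the ranges come out at roughly 10.2 to 12.9 months for GOOG and 10.1 to 12.0 months for FB. The two intervals overlap almost entirely, which is consistent with the commenter's view that the fits cannot be told apart.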
Excellent piece.
Might be one of your best.
- clear, concise & well written, good logical progression etc.
- starts with "Something Really Obvious" we can all see for ourselves and agree with
- pose a problem (no datapoints), get over it
- then proceed with straightforward analyses
- and end up with a surprising, even counter-intuitive, result
By demonstration you've told us:
- things aren't always what they seem, don't just take things on first appearances
- "digging deeper" can be quick and easy
- the tools/techniques to do this are quick and easy, and simple to master.
You might go as far as saying, "always check your results"... Which is Just Good Science.
But what I love is the way you've quietly demonstrated that Xerox PARC observation:
"Point of View is worth 40 to 60 IQ points".
Good one. Works at many levels.
Belated response to Larry C's comment.
Point taken and, indeed, I might express it a little differently. If we imagine that the modeled curves were drawn using fatter lines, then the whole notion of "crossing" comes into question. cf. Confidence bands for USL scaling curves.
However, I deliberately didn't show the summary stats on the log fit because I view the whole procedure a bit differently---perversely, perhaps.
Although I've never actually written this down before, the idea is to barrel through to an end point and then review. Roughly put, the steps are:
- Look at the plot
- Question: Is it log growth?
- Problem: No numeric values
- Solution: Guesstimate them
- Do the fit
- Question: What's the doubling period?
- Do the calculation
- Problem: Opposite from claim
- Question: Why?
- Solution: Curves cross at ~20 years (modulo above qualification).
In other words, I don't want to get distracted by numerical details in this phase. The goal is simply to reach a self-consistent explanation to the question about log growth. We may have to go back and revise the whole thing but, hopefully, we'll know what needs revision and why.