Monday, January 14, 2013

The Social Network Ranking is Wrong

Call me old-fashioned, but I never saw the 2010 movie The Social Network until last year (at a private screening). In case you also missed it, it's the Hollywood version of how Facebook.com came into being.

Quite apart from any artistic criticisms, I have a genuine psychological problem with movies like TSN. I keep getting caught up in technical inaccuracies and tend to lose the plot. So, it's very hard for me to watch such movies as the director intended. It's the same reason I can't stand SciFi movies or books: I can't get past the impossible and the just plain wrong. It turns out that TSN is generally fairly accurate regarding things like Linux, MySQL, PHP, and so forth, but there is a real clanger: the ranking algorithm used by Facemash—the Facebook precursor.

There's a scene where the Mark Zuckerberg character wants to rank Harvard women based on crowd-sourced scores. He recalls that his best friend (at the time), Eduardo Saverin, had previously mentioned a ranking formula, but Zuck can't remember how it goes, so he can't code it. When Saverin shows up again, Zuck urgently asks him to reveal it. In typical Hollywood style—possibly to keep a generally math-phobic audience visually engaged—Saverin writes the ranking equations on the dorm window (see above image) for the desperate Zuckerberg. Where else would you write equations?

Here they are, reproduced with a little better formatted: \begin{align} Ea &= \dfrac{1}{1 + 10 (Rb-Ra)/400}, & Eb &= \dfrac{1}{1 + 10 (Ra-Rb)/400} \label{eqn:movie} \end{align} There's just one slight problem: they're wrong! I wasn't sure at the time, but as Saverin was writing them, I was thinking to myself, "Why wouldn't $10/400$ be simplified to $1/40$?" Even a Harvard student could figure that out. And, on continued mulling: "If woman 'a' and woman 'b' had a rank difference of $-40$, the fractions would blow up! How can you rank infinity?"

You see the kind of mental problems I have. As I continued to try and watch TSN, I decided that perhaps I had misread the equations. The scene goes by quite quickly when you're not expecting it, and I didn't want to ask the host to replay that section of the movie just for me. By the time TSN ended, there were some other suspect scenes* and it was getting too late in the evening to review the entire movie.

The next day, however, I checked the web and it wasn't hard to find references to TSN with the ranking algorithm scene, some of which thankfully included a video clip. To my surprise, nobody else seems to have spotted the error. Sure enough, the equations that Saverin writes are indeed those in \eqref{eqn:movie}. Moreover, I also found the original movie script, and on pages 15 and 16 the equations are incorrect there too—written exactly as they appear in \eqref{eqn:movie}. The guy who played Eduardo Saverin must be a good actor (and not mathematically inclined) because he followed the script exactly.

So, what are the correct equations? It turns out the equations come from ranking chess players, which is probably how the real Eduardo Saverin knew about them. That's how it's portrayed in TSN. Anyway, Saverin didn't invent them. The correct equations are: \begin{align} Ea &= \dfrac{1}{1 + 10^{ \big( \frac{Rb-Ra}{400} \big) }}, & Eb &= \dfrac{1}{1 + 10^{ \big( \frac{Ra-Rb}{400} \big) }} \label{eqn:chess} \end{align} Note the exponential form. This has the effect of giving the ranking equations a complementary sigmoidal shape such that when the score of woman 'a' is up, the score of woman 'b' is correspondingly down. This fits with the requirement (not mentioned in TSN) that $Ea + Eb = 1$: the relative scores have to add to 100%.

Furthermore, this is not just a simple typographical error due to a missing ^ (caret) exponentiation character in Excel or Word (the most likely applications used for the movie script). The 400 has to go inside the exponent of \eqref{eqn:chess}, which means that more brackets are needed than shown in \eqref{eqn:movie}. In other words, the movie equations are a bit of a mess, transcribed by someone (presumably the scriptwriter) who had no idea what he was writing. Nothing new, I suppose, but these days Hollywood is known to hire experts (including mathematicians) to verify movie concepts. But not in this case, apparently.

Comparison of chess ranking function and Fermi-Dirac function in $\eqref{eqn:elofd}$

Finally, it also occurred to me that if I replace the $10$ in the chess equations by the number 2.71828 (otherwise denoted by the letter $e$), I get the Fermi-Dirac distribution over energy range $E$ at a given temperature $T$ (see above plots): \begin{align} Ea &= \dfrac{1}{1 + 10^{ \big( \frac{Rb-Ra}{400} \big) }}, & f(E) &= \dfrac{1}{1 + e^{ \big( \frac{E-\mu}{kT} \big) }} \label{eqn:elofd} \end{align} The Fermi-Dirac function, $f(E)$, is key to understanding how electrons "rank" themselves in a semiconductor circuit; a different kind of network, also made famous in the Silicon Valley.


* For example, the scene where the computer science lecturer asks the class an impromptu question. Everyone is stumped. Not too surprising since the question is so hopelessly ill-posed that nobody would be able to provide an answer. Undeterred by such details, the Zuckerberg character superciliously throws off the ("correct") answer as he storms out of the room in a huff. Not only is the technical question ludicrous but, according to the real-life lecturer of that Harvard class, neither that nor any similar event ever occurred!
In a twist of irony while writing this blog post, I recalled that in another but not totally unrelated context, the Bose-Einstein distribution has also been applied to the Hollywood movie industry (PDF).
Here is the R script for producing the above plots.

# Compariosn b/w chess ranking equations from the movie  
# "The Social Network" and the Fermi-Dirac distribution.
# Created by NJG on Monday, January 14, 2013

op <- par(mfrow=c(1,2), pty="s")
# Normalized Chess Ranking profile
Ra <- 1
Rb <- seq(0,2,0.01)
plot(Rb, 1/(1+10^(((Rb/Ra)-1) * (Ra*1000/800))), type="l", ylim=c(0,1), xlab="Rb/Ra", ylab="Ea",col="red")
lines(Rb, 1/(1+10^(((Rb/Ra)-1) * (Ra*1000/400))),col="green")
lines(Rb, 1/(1+10^(((Rb/Ra)-1) * (Ra*1000/1))),col="blue")
title(main="Chess Ranking Dsn")

# Fermi-Dirac distribution
E <- seq(0,2,0.01)
plot(E, 1/(1 + exp(1)^((E-1) * 2)), type="l", ylim=c(0,1), xlab="E", ylab="f(E)",col="red")
lines(E, 1/(1 + exp(1)^((E-1) * 10)),col="green")
lines(E, 1/(1 + exp(1)^((E-1) * 400)),col="blue")
title(main="Fermi-Dirac Dsn")
par(op)

1 comment:

  1. And if it involved the Brownian motion which would be the most likely hypothesis?

    ReplyDelete