Perhaps the defining factor in a competitive neural network is its transfer function. In the case of networks used to rank sports teams, at the very least, the transfer function will be driven by the results between the pairs of teams. However, the use of the result in the function can take many forms, which I will explore below.
The paper that got me started with all this ranking madness (no pun intended) used this function:
VG = A + B * (if W>0, 1, 0) + [a + b * (if W>0, 1, 0)] * VO
Where:
VG = value transferred by the connection (the value the team gets from the game)
VO = value of the connected node (the opponent team's value; since this is an iterative process, VO is the opponent's VG from the previous iteration)
W = weight of the connection (score difference of the game)
A + B = intercept for the “win” curve
A = intercept for the “loss” curve
a + b = slope for the “win” curve
a = slope for the “loss” curve
This original function tries to balance many objectives and needs the parameters (A, B, a, and b) to be tuned to meet them. In a sense, this function, and the resulting ranking, tries to accomplish too much. It is better to have fewer objectives, ideally just one, and to meet that single objective well.
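As an illustration, here is a minimal Python sketch of this function and of the iterative update. The aggregation step (a simple average over a team's games) and the parameter names are my own assumptions for the example, not anything prescribed by the paper:

```python
def transfer_value(W, VO, A, B, a, b):
    """Value a team gets from one game.

    W  : score difference of the game (positive = win, negative = loss)
    VO : opponent's value from the previous iteration
    """
    win = 1 if W > 0 else 0
    return A + B * win + (a + b * win) * VO

def iterate(values, games, A, B, a, b, n_iter=50):
    """Iteratively refine team values.

    values : dict mapping team -> current value
    games  : list of (team, opponent, score_difference) tuples
    """
    for _ in range(n_iter):
        per_team = {team: [] for team in values}
        for team, opp, W in games:
            per_team[team].append(transfer_value(W, values[opp], A, B, a, b))
        # Aggregate per-game values; averaging is my own choice here.
        values = {t: (sum(v) / len(v) if v else values[t])
                  for t, v in per_team.items()}
    return values
```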
The game result, the opponent’s strength, and the transfer function
The transfer function is nothing but the expression of how much value a team gets from a game. That value is drawn from two main sources: how good the final score is and how strong the opponent is.
Using the opponent's strength is straightforward and uncontroversial; the strength itself is derived from the transfer function, iteration after iteration.
The transfer function can follow many approaches, but it will always reflect the win/loss result (and ties, if applicable) and will sometimes also reflect the spread (the score difference).
Predictive and retrodictive rankings
There are two main types of rankings. Very simply: predictive rankings, which are better at predicting future results, and retrodictive rankings, which are better at explaining what happened in the past.
A predictive ranking has the objective of being a good predictor of future results; historically (and statistically), rankings that use the score difference do better at this.
A retrodictive ranking has the objective of explaining which were the best teams in a competition; historically (and statistically, again), rankings that do not use the score difference do better at this.
Here is a good article on predictive versus retrodictive, but focused on baseball: Link.
The “Superlist” by David Wilson at the University of Wisconsin also classifies rankings between predictive and retrodictive.
Transfer functions I use or have used
The one function I don't use anymore is one I created a long time ago that used medians instead of means to calculate the overall team strength. It wasn't really a transfer function but an aggregator function. The goal was to get rid of outliers, but it was computationally expensive and didn't really help produce meaningful results.
Currently, in the rankings that I publish for college football, college basketball, international soccer, and Super Rugby, I use these three:
WTA, or Winner Takes All. This is neural network terminology and means I only use the game result (not the score spread). The value transferred is either 1 (for a win), 0 (for a tie), or -1 (for a loss). It is more retrodictive. Following the format of the formula above, it would look like this:
VG = VO + (if W>0, 1; if W=0, 0; otherwise, -1)
MOV, or Margin of Victory. It uses the whole margin of victory (which I have been calling either score difference or score spread, helplessly trying to be less confusing). It is more predictive. It would look like this:
VG = VO + W
Finally, in an attempt to get a ranking that tries to achieve multiple contradictory objectives (I know…), I came up with a function that scales back the margin of victory on a logarithmic scale. Of course, I just achieved a ranking that is neither predictive nor retrodictive. The function:
VG = VO + log(W)
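For illustration, here is a rough Python sketch of the three functions. How the logarithmic version handles ties and losses is not spelled out by the formula above; using sign(W) * log(1 + |W|) is my own assumption so the scaling stays symmetric and defined at W = 0:

```python
import math

# VO is the opponent's value from the previous iteration, W the score
# difference from this team's point of view.

def wta(W, VO):
    # Winner Takes All: only the result matters (+1 win, 0 tie, -1 loss).
    if W > 0:
        return VO + 1
    if W == 0:
        return VO
    return VO - 1

def mov(W, VO):
    # Margin of Victory: the full score difference is transferred.
    return VO + W

def log_mov(W, VO):
    # Log-scaled margin. sign(W) * log(1 + |W|) is an assumption for the
    # sketch, not necessarily the exact scaling used in the published rankings.
    return VO + math.copysign(math.log1p(abs(W)), W)
```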
Controversies
No ranking makes everyone happy and people tend to take sports very seriously so sports rankings are a thorny business.
Specifically, regarding WTA versus MOV, and even more specifically regarding college football, “analysts” tend to criticize teams that “run up the score” (teams that keep piling on points even though the game is already lopsided and the victory secured). So WTA rankings usually get more respect from the pundits, and most rankings in the BCS, which decides the National Championship, are indeed WTA rankings.
On the other hand, the same “analysts” tend to praise teams that achieve an “impressive victory,” which is often a lopsided one… so MOV rankings would be better. For what it’s worth, I believe (and again this is mostly regarding college football, which is how I got started) that MOV lets teams in a weak conference show their prowess more clearly than WTA would. In college football, teams play around 12 games in a season, of which about 8 are conference games. If those are against weak opponents, the only way for a good team to stand out from its conference peers is to do really well in the four or so other games it can book more freely, or to blow out its conference opponents. MOV helps in this particular regard.
Measuring ranking performance
Measuring performance is pretty simple. One can measure how good a ranking is retrodictively (I may have made up this word) or predictively.
To measure retrodictive accuracy you count how many game results the ranking can explain. For each game, the ranking gets a brownie point when the winner is ranked higher than the loser and it gets dinged when the winner is ranked lower than the loser. Add up all the results and a ratio can be calculated. Since it is very common to see A beat B, B beat C, and also C beat A, it is often impossible to get a very high score here. Good rankings can explain some 80% of results.
To measure predictive accuracy you count how many game results the ranking can predict in advance. Usually one can’t use all games here, as some rankings (like mine) are not accurate (or even possible) until each team has played at least a few games, which ensures the teams are all well connected as a network. (On the other hand, all games can be used to measure retrodictive accuracy.) Good rankings can predict about 70% of results.
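As a rough illustration, this is how I would sketch the “wins explained” count in Python. The data layout (a rank per team and a list of winner/loser pairs, ties excluded) is an assumption for the example; the same function measures predictive accuracy if the ranking was computed before the games were played:

```python
def wins_explained(ranking, games):
    """ranking: dict team -> rank (1 = best); games: list of (winner, loser)."""
    explained = 0
    counted = 0
    for winner, loser in games:
        if ranking[winner] == ranking[loser]:
            continue  # skip games between teams tied in the ranking
        counted += 1
        if ranking[winner] < ranking[loser]:  # lower number = better rank
            explained += 1
    return explained / counted if counted else 0.0
```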
For a compilation of ranking accuracies in college football, Wobus Sports maintains a ranking of rankings by “Wins Explained” and by “Predicted Wins”.
Final thoughts
For the longest time, I fell for the trap I described at the beginning: trying to make the ranking do too much, which is why, when I publish my rankings, I tend to prioritize the logarithmic transformation of the MOV. This has caused me many problems. I used to publish rankings for college baseball and ice hockey, and those rankings performed poorly (as measured both by how different they were from every other ranking and by public perception). One thing I’ve learned is that rankings have few friends, if any, and many aggrieved people who are not shy about voicing their bold opinions. Such is life in the internet era.