Ok, I was supposed to take a break, but Frédéric, professor in Tours, came back to me this morning with a tricky question. He asked me what the odds were that the Champions League practice draw and the official one would produce exactly the same pairings (see e.g. dailymail.co.uk/…).

To be honest, I don’t know much about soccer, so here is what happened, with the practice draw (on the left, on December 19th) and the official one (on the right, on December 20th),

Clearly, the pairs are identical, but not the order. Actually, at first, I was surprised that even which team plays at home first was identical. But (it seems that) the teams that play at home first are the ones that finished second in the previous stage of the competition.

And to be more specific about those draws, the pairs were obtained using real urns and real balls, so it is pure randomness (again, as far as I understood), but with very specific rules. For instance, two teams from the same country cannot play each other at this stage. And teams that finished first in the previous round can only play teams that finished second. Actually, Frédéric sent me an xls file, with a possibility matrix.

Let us find all possible sets of pairs, regardless of which team plays at home first (again, we do not care here since the order is defined by the rule mentioned above). Doing the maths might have been a bit complicated, with all those constraints. With a small piece of code, it is possible to list all possible pairings for those eight games. Let us import our possibility matrix,

> n=16
> uefa=read.table(
+ "http://freakonometrics.blog.free.fr/public/data/uefa.csv",
+ sep=",",header=TRUE)
> LISTEIMPOSSIBLE=matrix(
+ (rep(1:n,n))*(uefa[1:n,2:(n+1)]=="NON"),n,n)
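To see what this encoding does, here is a minimal sketch on a made-up 4 x 4 table (the values, and the "OUI" label for allowed pairings, are assumptions for the illustration; only "NON" appears in the code above). Column j of the resulting matrix lists the teams that cannot face team j, padded with zeros.

```r
# Toy 4 x 4 compatibility table (hypothetical data, not the UEFA file):
# "NON" marks a forbidden pairing, "OUI" (assumed label) an allowed one.
n <- 4
toy <- matrix(c("NON", "NON", "OUI", "OUI",
                "NON", "NON", "NON", "OUI",
                "OUI", "NON", "NON", "NON",
                "OUI", "OUI", "NON", "NON"), n, n, byrow = TRUE)

# rep(1:n, n) repeats the row indices column by column, so multiplying by
# the logical mask keeps the row number where the pairing is forbidden
# and puts 0 everywhere else.
impossible <- matrix(rep(1:n, n) * (toy == "NON"), n, n)

impossible[, 1]   # 1 2 0 0: team 1 cannot face teams 1 and 2
impossible[, 3]   # 0 2 3 4: team 3 can only face team 1
```

The zeros are harmless later on, since no team is numbered 0.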

I can fix the first team (in my list, the fourth one is the first team that finished second). Then, I look at all its possible opponents (the teams that can play against the first one),

> a1=4
> "%notin%" <- function(x, table){x[match(x, table, nomatch = 0) == 0]}
> posa2=(1:n)%notin%LISTEIMPOSSIBLE[,a1]

Then, consider the second team that finished second (the sixth one in my list), and look at all its possible opponents (for this second game), i.e. excluding the team a2 that was already drawn for the first game, and those that are not allowed,

> b1=6
> posb2=(1:n)%notin%c(LISTEIMPOSSIBLE[,b1],a2)
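As a small illustration of this filtering step, the sketch below uses a hypothetical column of forbidden opponents (the vector `forbidden_for_b1` and the value of `a2` are made up) and the same `%notin%` helper as above.

```r
# Same helper as in the post: keep the elements of x that are absent from table.
"%notin%" <- function(x, table) x[match(x, table, nomatch = 0) == 0]

n <- 6
forbidden_for_b1 <- c(0, 2, 0, 0, 5, 6)  # hypothetical column of LISTEIMPOSSIBLE
a2 <- 3                                   # opponent already drawn for the first game

# Candidates for the second game: not forbidden, and not already drawn.
posb2 <- (1:n) %notin% c(forbidden_for_b1, a2)
posb2   # 1 4
```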

Etc. So, given the list of home teams,

> a1=4
> b1=6
> c1=8
> d1=9
> e1=12
> f1=14
> g1=15
> h1=16

consider the following loops,

> s=0
> M=NULL
> posa2=(1:n)%notin%c(LISTEIMPOSSIBLE[,a1])
> for(a2 in posa2){
+ posb2=(1:n)%notin%c(LISTEIMPOSSIBLE[,b1],a2)
+ for(b2 in posb2){
+ posc2=(1:n)%notin%c(LISTEIMPOSSIBLE[,c1],a2,b2)
+ for(c2 in posc2){
+ posd2=(1:n)%notin%c(LISTEIMPOSSIBLE[,d1],a2,b2,c2)
+ for(d2 in posd2){
+ pose2=(1:n)%notin%c(LISTEIMPOSSIBLE[,e1],a2,b2,c2,d2)
+ for(e2 in pose2){
+ posf2=(1:n)%notin%c(LISTEIMPOSSIBLE[,f1],a2,b2,c2,d2,e2)
+ for(f2 in posf2){
+ posg2=(1:n)%notin%c(LISTEIMPOSSIBLE[,g1],a2,b2,c2,d2,e2,f2)
+ for(g2 in posg2){
+ posh2=(1:n)%notin%c(LISTEIMPOSSIBLE[,h1],a2,b2,c2,d2,e2,f2,g2)
+ for(h2 in posh2){
+ s=s+1
+ V=c(a1,a2,b1,b2,c1,c2,d1,d2,e1,e2,f1,f2,g1,g2,h1,h2)
+ cat(s,V,"\n")
+ M=rbind(M,V)
+ }}}}}}}}

With the print option, we end up with

5461 4 13 6 11 8 5 9 2 12 10 14 3 15 7 16 1
5462 4 13 6 11 8 5 9 2 12 10 14 7 15 1 16 3
5463 4 13 6 11 8 5 9 2 12 10 14 7 15 3 16 1

i.e.

> nrow(M)
[1] 5463

possible draws (the list can be found here, where numbers are the same as the ones in the csv file). This matches the figure mentioned in a comment on the article cited previously, dailymail.co.uk/…. So, if all those draws are equally likely, the probability of getting exactly the same output in the practice and the official draws was (in %)

> 100/nrow(M)
[1] 0.01830496

Which is not *that* small when we think about it….
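To spell out the arithmetic behind this number: assuming the two draws are independent and each of the $N = 5463$ valid draws has the same probability $p_i = 1/N$ (an assumption, not something the enumeration proves), the probability of getting the same draw twice is

$$
P(\text{same draw twice}) = \sum_{i=1}^{N} p_i^2 = N \cdot \frac{1}{N^2} = \frac{1}{N} = \frac{1}{5463} \approx 1.83 \times 10^{-4}.
$$

If the $p_i$ are not all equal, the Cauchy-Schwarz inequality gives $\sum_i p_i^2 \ge 1/N$, so $1/N$ would then be a lower bound on that probability.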

And if someone has a mathematical expression for this probability, I am interested. The only reliable method I found was to list all possible pairs (the csv file is available if someone wants to check). But I am not satisfied….
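The eight nested loops above can also be written as one recursive function. This is only a sketch: the function name `enumerate_draws` and the 6-team toy data are made up, and the real `LISTEIMPOSSIBLE` matrix is replaced by a small hand-built one with the same encoding (column j lists the teams that cannot face team j, padded with zeros).

```r
"%notin%" <- function(x, table) x[match(x, table, nomatch = 0) == 0]

# Recursively list every valid draw.
# home:       indices of the teams that play at home first
# impossible: matrix encoded like LISTEIMPOSSIBLE
# drawn:      opponents already assigned to the previous home teams
enumerate_draws <- function(home, impossible, drawn = integer(0)) {
  k <- length(drawn) + 1
  # All home teams have an opponent: return one row (home1, opp1, home2, opp2, ...)
  if (k > length(home)) return(matrix(c(rbind(home, drawn)), nrow = 1))
  candidates <- (1:nrow(impossible)) %notin% c(impossible[, home[k]], drawn)
  out <- NULL
  for (opp in candidates) {
    out <- rbind(out, enumerate_draws(home, impossible, c(drawn, opp)))
  }
  out
}

# Toy data: teams 1-3 finished first, teams 4-6 finished second.
n <- 6
forbidden <- matrix(TRUE, n, n)
allowed <- rbind(c(4, 1), c(4, 2), c(5, 1), c(5, 2), c(5, 3), c(6, 2), c(6, 3))
forbidden[allowed] <- FALSE          # allowed pairs (home, away)
forbidden[allowed[, 2:1]] <- FALSE   # and the symmetric entries
impossible <- matrix(rep(1:n, n) * forbidden, n, n)

M <- enumerate_draws(home = 4:6, impossible)
nrow(M)   # 3 valid draws on the toy data
```

With the real matrix and `home = c(4,6,8,9,12,14,15,16)`, this should visit the same draws as the nested loops, in the same order. Growing `M` with `rbind` is slow for long lists, but a few thousand rows is not a problem.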

I did simple direct Monte Carlo simulations on this before the draws.

Here is my post dated 7.12.2012:

http://memosisland.blogspot.de/2012/12/uefa-champions-league-knockout-phase.html

This is a direct simulation that repeatedly creates possible pairs randomly; I ran it 20M times.
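For readers who want to try this kind of simulation, here is a minimal sketch on a toy 3 x 3 compatibility matrix. It is not the code from the linked post: the matrix, the function name `random_draw` and the rejection scheme (shuffle the opponents, retry until every pairing is allowed) are assumptions made for the illustration.

```r
set.seed(1)

# Toy compatibility matrix (hypothetical): rows are runners-up, columns are
# group winners, A[i, j] = 1 when runner-up i may face winner j.
A <- rbind(c(1, 1, 0),
           c(1, 1, 1),
           c(0, 1, 1))

# One random draw by rejection: shuffle the winners and retry until
# every pairing is allowed. Each valid draw is then equally likely.
random_draw <- function(A) {
  n <- nrow(A)
  repeat {
    perm <- sample(n)
    if (all(A[cbind(1:n, perm)] == 1)) return(perm)
  }
}

# Simulate many (practice, official) pairs and count how often they coincide.
n_sim <- 20000
first  <- replicate(n_sim, paste(random_draw(A), collapse = "-"))
second <- replicate(n_sim, paste(random_draw(A), collapse = "-"))
mean(first == second)   # close to 1/3, since the toy matrix has 3 valid draws
```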

From my understanding, this result assumes all pairings are equally likely (?), which is not really obvious to me… I don’t think it’s true actually, and in that case I wonder how far from the uniform distribution we could get (depending on the possibility matrix).
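This concern can be checked exactly on a small case. The sketch below assumes a hypothetical 3 x 3 compatibility matrix and a simplified procedure: runners-up are handled in a fixed order, and each opponent is picked uniformly among those that still allow the draw to be completed. The function names are made up, and this is not claimed to be the actual UEFA procedure.

```r
# Toy compatibility matrix (hypothetical): rows are runners-up, columns are
# group winners, A[i, j] = 1 when runner-up i may face winner j.
A <- rbind(c(1, 1, 0),
           c(1, 1, 1),
           c(0, 1, 1))

# Number of ways to complete the draw for rows k, k+1, ... given the
# columns already used.
completions <- function(A, k = 1, used = integer(0)) {
  if (k > nrow(A)) return(1)
  free <- setdiff(which(A[k, ] == 1), used)
  total <- 0
  for (j in free) total <- total + completions(A, k + 1, c(used, j))
  total
}

# Exact probability of each complete draw under the sequential procedure.
sequential_probs <- function(A, k = 1, used = integer(0), p = 1) {
  if (k > nrow(A)) return(setNames(p, paste(used, collapse = "-")))
  free <- setdiff(which(A[k, ] == 1), used)
  # Keep only the opponents that do not lead to a dead end.
  ok <- free[vapply(free, function(j) completions(A, k + 1, c(used, j)) > 0,
                    logical(1))]
  unlist(lapply(ok, function(j)
    sequential_probs(A, k + 1, c(used, j), p / length(ok))))
}

probs <- sequential_probs(A)
probs          # one probability per valid draw
sum(probs^2)   # probability that two independent draws coincide
```

On this toy matrix the three valid draws get probabilities 1/4, 1/4 and 1/2, so two independent draws coincide with probability 3/8 rather than the 1/3 obtained under the uniform assumption.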

Interesting. I wrote a computational solution to this problem a few days ago. However, the probability I calculated was much lower: 0.00011. I am wondering why our answers are so different. My calculation was performed via MC methods wherein I assumed that ordering was unimportant. The advantage of the MC method (in my head anyway) is that all I needed to do was write a function that performs random draws while adhering to UEFA’s rules, without worrying about complex conditional probabilities. Perhaps there is an error in my code but if there is, it’s not one I can see. My post is here:

http://diffuseprior.wordpress.com/2012/12/24/identical-champions-league-draw-what-were-the-odds

We have almost the same probability, I just use a percentage notation…

I’ve read your UEFA blog post. You ask for a mathematical solution for counting all possible matchings under constraints. Actually there is a simple solution to that.

If you create an 8*8 matrix with the teams who finished first as rows and those who finished second as columns, then Aij=1 means that team i can face team j, and Aij=0 means they are constrained and cannot face each other.

Now, the number of possible legit draws (the number of options you enumerated) is exactly the permanent of that matrix.

More generally, permanents can be used to count perfect matchings in bipartite graphs.
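As a sketch of this idea, the permanent of a small matrix can be computed by expanding along the first row, like a determinant but without the alternating signs. The function name `permanent` and the 3 x 3 matrix are made up for the illustration; the running time is exponential, which is still fine for an 8 x 8 matrix.

```r
# Permanent of a square matrix by expansion along the first row.
permanent <- function(A) {
  n <- nrow(A)
  if (n == 1) return(A[1, 1])
  total <- 0
  for (j in 1:n) {
    # Skip zero entries: they contribute nothing to the sum.
    if (A[1, j] != 0) {
      total <- total + A[1, j] * permanent(A[-1, -j, drop = FALSE])
    }
  }
  total
}

# Toy compatibility matrix (hypothetical values).
A <- rbind(c(1, 1, 0),
           c(1, 1, 1),
           c(0, 1, 1))
permanent(A)   # 3 valid draws
```

Applied to the 8 x 8 matrix described in this comment, the same function should give the number of draws found by enumeration in the post.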

A clarification that might be useful to some:

The %notin% function is not a standard one; it needs to be defined as:

`%notin%` <- function(x,y) !(x %in% y)

(seen here: http://stackoverflow.com/questions/7494848/standard-way-to-remove-multiple-elements-from-a-dataframe)

Thanks for your very nice blog anyway!
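A word of caution on the two definitions of `%notin%` that appear on this page: they are not interchangeable. The one in the post returns the values that remain, while the one in this comment returns a logical mask, so looping over its result would iterate over TRUE/FALSE rather than over team numbers. The sketch below compares them under the made-up names `notin_values` and `notin_mask`.

```r
x <- 1:5
table <- c(2, 4)

# Version used in the post: returns the remaining values.
notin_values <- function(x, table) x[match(x, table, nomatch = 0) == 0]
# Version from the comment: returns a logical mask.
notin_mask <- function(x, table) !(x %in% table)

notin_values(x, table)     # 1 3 5
notin_mask(x, table)       # TRUE FALSE TRUE FALSE TRUE
x[notin_mask(x, table)]    # 1 3 5, same as the first version
```

With the mask version, each candidate list in the loops would have to be written as `(1:n)[(1:n) %notin% ...]`.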