The birthday problem, and a generalization

It’s well known that in a group of 23 people, the odds are good that at least two of them will share the same birthday. This initially counter-intuitive result is easily explained by noting that the probability of n people having different birthdays is

    \[ 1\times \frac{364}{365}\times\frac{363}{365}\times\cdots\times\frac{365-n+1}{365} \]

assuming that there are only 365 days in a year, and that each birthday is equally likely. This expression can be more concisely written as

    \[ \frac{P(365,n)}{365^n}. \]

The probability of at least two birthdays being the same is then

    \[ 1-\frac{P(365,n)}{365^n}. \]

For n=23 this value is slightly greater than 0.5:

sage: def perm(n,k): return binomial(n,k)*factorial(k)
sage: (1-perm(365,23)/365^23).n()

This fact that birthday “collisions” are surprisingly likely for a relatively small group, is the foundation for much of the theory of cryptographic hash functions, where security is related to the difficulty of finding collisions: two different inputs with the same output.

But recently I discovered that at least three people known to me had the same birthday. What were the odds of this, I wondered? In general, how many people do you need to have better than even odds for at least three people to share a birthday? Being generally pretty poor at probability (at school I answered a probability question in a local mathematics magazine, and had my answer printed, only to discover later that it was in fact wrong), I did a search. The best answer, with a nice explanation, is here.

Here’s how this author, Dr Rick, solved it. As for the simpler birthday problem above, first find the probability that any birthday is shared by at most two people. We can build this answer up in stages. To start, suppose there are n people, what is the probability that there are exactly four pairs who share (different) birthdays?

Start with assuming that the first two people have the same birthday, and the second pair has the same birthday (which is different to the birthday of the first pair), and then the third pair share a birthday (again which is different to the first two pair birthdays) and similarly for the fourth pair. This probability can be computed by choosing any four birthdays for the four pairs, and ensuring that all the other n-8 people have different birthdays. Start by noting that there are

    \[ \binom{365}{4} \]

ways of choosing four different birthdays (one for each pair) and

    \[ \binom{n}{2} \]

ways of choosing the first pair. This leaves

    \[ \binom{n-2}{2} \]

ways of choosing the second pair. And the number of ways of choosing the third and fourth pairs will be

    \[ \binom{n-4}{2} \]


    \[ \binom{n-6}{2} \]

respectively. Then we require that the other n-8 people all have different birthdays chosen from the remaining 361 possible days. Putting all this together gives us the total probability of exactly fours pairs of people, each pair of which share a birthday:

    \[ \binom{365}{4}\left(\frac{1}{365}\right)^8\binom{n}{2}\binom{n-2}{2}\binom{n-4}{2}\binom{n-6}{2}\frac{P(361,n-8)}{365^{n-8}} \]

where the last expression is of course the probability of n-8 people having different birthdays from a total of 361 possible days.

This expression can be simplified. First note that

    \begin{align*} \binom{n}{2}\binom{n-2}{2}\binom{n-4}{2}\binom{n-6}{2}\\ &=\frac{n!}{2!(n-2)!}\frac{(n-2)!}{2(n-4)!}\frac{(n-4)!}{2(n-6)!}\frac{(n-6)!}{2(n-8)!}\\ &=\frac{n!}{2^4(n-8)!}. \end{align*}

Now the above expression can be written as

    \[ \frac{\binom{365}{4}P(365-4,n-8)}{365^n}\frac{n!}{2^4(n-8)!}. \]

Now we can also express the permutation and binomial coefficient as factorials:

    \[ \frac{365!\,361!\,n!}{4!\,361!\,(365-n+4)!2^4(n-8)!}=\frac{365!\,n!}{4!(365-n+4)!2^4(n-8)!} \]

This means that the probability of exactly i pairs of equal birthdays is

    \[ \frac{365!\,n!}{i!(365-n+i)!2^i(n-2i)!}, \]

and so the total probability of pairs of birthdays (but no triples) is

    \[ \sum_{i=0}^{\lfloor n/2\rfloor}\frac{365!\,n!}{i!(365-n+i)!2^i(n-2i)!}. \]

Thus the probability of at least one trio of birthdays is

    \[ 1-\sum_{i=0}^{\lfloor n/2\rfloor}\frac{365!\,n!}{i!(365-n+i)!2^i(n-2i)!}. \]

This is the same formula which is given (without derivation) in Anirban DasGupta, The matching, birthday and the strong birthday problem: a contemporary review, Journal of Statistical Planning and Inference 130 (2005), 377-389. (For this reference, I am indebted to Michael Lugo on Mathematics StackExchange).

Now for some numerical experiments:

sage: def trio(n): return (1-sum((factorial(365)*factorial(n))/
....: (factorial(i)*factorial(n-2*i)*factorial(365-n+i)*2^i*365^n)
....: for i in range(n//2+1))).n()
sage: trio(87)
sage: trio(88)

and we see we need a group of 88 people to have better than even odds that three of them share a birthday.

5 thoughts on “The birthday problem, and a generalization

  1. In the stage that you say, “Now we can also express the permutation and binomial coefficient as factorials” what happened to the 365^n of the previous line?

    1. They didn’t display properly at all! When I migrated from to this self-hosted server, all the backslashes in my LaTeX formulas were removed (I don’t know why), so that, for example: \sum\frac{\pi}{\pi+n} came across as sumfrac{pi}{pi+n}. I’m slowly working my way through old posts and tidying them up. So thanks for letting me know!

Leave a Reply

Your email address will not be published. Required fields are marked *