It’s well known that in a group of 23 people, the odds are good that at least two of them will share the same birthday. This initially counter-intuitive result is easily explained by noting that the probability of people having *different* birthdays is

assuming that there are only 365 days in a year, and that each birthday is equally likely. This expression can be more concisely written as

The probability of at least two birthdays being the same is then

For this value is slightly greater than 0.5:

sage: def perm(n,k): return binomial(n,k)*factorial(k) ....: sage: (1-perm(365,23)/365^23).n() 0.507297234323985

This fact that birthday “collisions” are surprisingly likely for a relatively small group, is the foundation for much of the theory of cryptographic hash functions, where security is related to the difficulty of finding collisions: two different inputs with the same output.

But recently I discovered that at least three people known to me had the same birthday. What were the odds of this, I wondered? In general, how many people do you need to have better than even odds for at least *three* people to share a birthday? Being generally pretty poor at probability (at school I answered a probability question in a local mathematics magazine, and had my answer printed, only to discover later that it was in fact wrong), I did a search. The best answer, with a nice explanation, is here.

Here’s how this author, Dr Rick, solved it. As for the simpler birthday problem above, first find the probability that any birthday is shared by at most two people. We can build this answer up in stages. To start, suppose there are people, what is the probability that there are exactly four pairs who share (different) birthdays?

Start with assuming that the first two people have the same birthday, and the second pair has the same birthday (which is different to the birthday of the first pair), and then the third pair share a birthday (again which is different to the first two pair birthdays) and similarly for the fourth pair. This probability can be computed by choosing any four birthdays for the four pairs, and ensuring that all the other people have different birthdays. Start by noting that there are

ways of choosing four different birthdays (one for each pair) and

ways of choosing the first pair. This leaves

ways of choosing the second pair. And the number of ways of choosing the third and fourth pairs will be

and

respectively. Then we require that the other people all have different birthdays chosen from the remaining 361 possible days. Putting all this together gives us the total probability of exactly fours pairs of people, each pair of which share a birthday:

where the last expression is of course the probability of people having different birthdays from a total of possible days.

This expression can be simplified. First note that

Now the above expression can be written as

Now we can also express the permutation and binomial coefficient as factorials:

This means that the probability of exactly pairs of equal birthdays is

and so the total probability of pairs of birthdays (but no triples) is

Thus the probability of at least one trio of birthdays is

This is the same formula which is given (without derivation) in Anirban DasGupta, The matching, birthday and the strong birthday problem: a contemporary review, Journal of Statistical Planning and Inference 130 (2005), 377-389. (For this reference, I am indebted to Michael Lugo on Mathematics StackExchange).

Now for some numerical experiments:

sage: def trio(n): return (1-sum((factorial(365)*factorial(n))/ ....: (factorial(i)*factorial(n-2*i)*factorial(365-n+i)*2^i*365^n) ....: for i in range(n//2+1))).n() ....: sage: trio(87) 0.499454850631401 sage: trio(88) 0.511065110624730

and we see we need a group of 88 people to have better than even odds that three of them share a birthday.

You have a typo in the denominator of your final equation.

In the stage that you say, “Now we can also express the permutation and binomial coefficient as factorials” what happened to the 365^n of the previous line?

Is it just me, or the equations here don’t show up as expected?

They didn’t display properly at all! When I migrated from wordpress.com to this self-hosted wordpress.org server, all the backslashes in my LaTeX formulas were removed (I don’t know why), so that, for example: \sum\frac{\pi}{\pi+n} came across as sumfrac{pi}{pi+n}. I’m slowly working my way through old posts and tidying them up. So thanks for letting me know!