9
$\begingroup$

We know that, under the simplifying assumptions that a year has 365 days (i.e., ignoring leap years) and that each day is equally likely to be a person's birthday, the probability that at least two people in a group of $n$ share the same birthday is

$$ 1-\frac{365}{365}\cdot\frac{364}{365}\cdot\frac{363}{365}\cdots\frac{365-(n-1)}{365}. $$

This probability increases nonlinearly with $n$. For example, it exceeds $50$% when $n=23$, is about $81$% when $n=35$, and about $99.4$% when $n=60$.
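The figures above can be checked numerically. The following is a minimal Python sketch under the same assumptions (365 equally likely days, independent birthdays); the function name `match_probability` is chosen here for illustration and does not come from any library.

```python
def match_probability(n, days=365):
    """P(at least two of n people share a birthday),
    assuming independent birthdays uniform over `days` days."""
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


for n in (23, 35, 60):
    print(n, round(match_probability(n), 4))
```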

I have conducted this experiment on at least four different occasions, each involving around 30 participants. Every time, there was at least one pair of people sharing the same birthday.

This makes me wonder whether the assumption that birthdays are uniformly distributed over the 365 days is realistic. More generally, how can one formally test/assess the validity of the equal-likelihood assumption based on repeated observations of birthday matches?

PS: I do not have a formal background in statistics (my background is in mathematics), so any rigorous explanation of how to approach this question would be greatly appreciated.

$\endgroup$
9
  • 7
    $\begingroup$ Just as a general comment: Observed birthdays are not uniformly distributed across the year. Interestingly, the uniform case is the worst-case scenario for the birthday problem: Non-uniform birthdays make it more likely that two persons share a birthday. This is a consequence of the Cauchy–Schwarz inequality. $\endgroup$ Commented 2 days ago
  • 1
    $\begingroup$ We know that birthdays are not uniform. Major holidays have far fewer births than other days, for one thing. And there is considerable seasonal variation, as well. See, e.g., "how common is your birthday" for the pattern in the US. But that's not exactly what you asked. $\endgroup$ Commented 2 days ago
  • 2
    $\begingroup$ The LR test is natural and appears relatively easy to compute and evaluate. $\endgroup$ Commented 2 days ago
  • 1
    $\begingroup$ Anecdote: I decided to go rogue at the start of a math class I was giving, and described the birthday paradox to my class of 25ish, who agreed to play along. The very first person I called on stated their birthday—and someone else in the class already shared that birthday! I think those two students ended up dating each other :) $\endgroup$ Commented yesterday
  • 1
    $\begingroup$ Has been discussed (with data) at stats.stackexchange.com/a/336676/11887 $\endgroup$ Commented yesterday

1 Answer 1

13
$\begingroup$

As people mentioned in comments, birthdates are not quite uniform.

However, four matches out of four observations, each with a group size near $30$, would not in itself make me suspect that the null of uniformity is false: if birthdays were uniform, the probability that all 4 of 4 groups of exactly 30 contain at least one matching birthday would be $0.7063^4\approx 0.25$, which is not a very surprising thing to see.

> More generally, how can one formally test/assess the validity of the equal-likelihood assumption based on repeated observations of birthday matches?

Sure, this is doable, though of course we have some birthdate information for at least some places and so in practice the question is moot, even if we were to regard those data as a sample. However, let's assume we only know the information from a set of observations more or less like you mention.

In particular, let us assume that all we have recorded for each such occasion of $N$ total occasions ($i=1,2,...,N$) is how many people were present ($n_i$) and whether there were any "birthday matches" ($b_i=1$) or none ($b_i=0$). [If more information was retained, such as how many matches there were or which birthdates were observed, you could use that information; I assume here that it's not available.]

Further, I will follow the usual assumption for the birthday problem that you relied on in your opening paragraph: that people's dates of birth are mutually independent.

I will condition on the group sizes.

If we are broadly operating in a Neyman-Pearson paradigm, constructing a test begins with choosing some test statistic that behaves differently under the null (birthdays uniform over the year) and the alternative (some birthdates more probable than others, increasing the probability of a match).

For example, a naive choice of statistic might simply count the total number of successes, i.e. occasions with at least one match (which, if $H_1$ is true, will tend to be higher than if $H_0$ were true). If there was little variation in group size (e.g. always close to 30), this naive idea would work fairly well.

Now if $H_0$ is true, for each $n_i$ we know (can compute, per the calculation you gave) the probability of at least one match ($P(B_i=1|n_i)$). Call that $p_i$, noting that it is a function of $n_i$. [Under $H_1$, the probability of a match would be some value $q_i>p_i$.]

The naive statistic I mentioned before would then be distributed as Poisson-binomial, with known parameters under the null; we would then select the appropriate fraction of its upper tail to be the critical region.
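As a rough illustration of the naive count test, the sketch below builds the Poisson-binomial null distribution by convolving one Bernoulli term at a time and sums its upper tail. The helper names and the data (four groups of exactly 30, all with a match) are assumptions made for illustration only.

```python
def match_probability(n, days=365):
    """P(at least one shared birthday among n people) under uniformity."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


def poisson_binomial_pmf(probs):
    """pmf[k] = P(exactly k successes) for independent Bernoulli(p_i) trials."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)   # this trial fails
            new[k + 1] += mass * p       # this trial succeeds
        pmf = new
    return pmf


# Hypothetical data: four occasions, each with exactly 30 people, all matches.
sizes = [30, 30, 30, 30]
matches = [1, 1, 1, 1]

probs = [match_probability(n) for n in sizes]
pmf = poisson_binomial_pmf(probs)
observed = sum(matches)
p_value = sum(pmf[observed:])  # upper tail: P(count >= observed) under H0
print(observed, p_value)
```

With equal group sizes this reduces to an ordinary binomial tail, so the result should agree with the $0.7063^4$ figure quoted earlier.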

Fisher might instead just have based a statistic on the likelihood under the null $\mathcal{L}_0 =\prod_{i=1}^N p_i^{b_i} (1-p_i)^{1-b_i}$ (or some transformation of it), and we place the rejection region on small values of the likelihood. This should work quite well against general alternatives.

We might instead pursue a likelihood ratio test. If we stick to our assumption that we don't have any other information, it's a little hard to sensibly estimate $N$ separate probabilities under the alternative just from the data (the $b_i$ and $n_i$). Nevertheless, we can make progress; unless I have made an error, the likelihood under the alternative is maximized by setting $q_i=1$ when $b_i=1$ and $q_i=p_i$ otherwise, and the likelihood ratio $\mathcal{L}_0/\mathcal{L}_1$ would then just be $\prod_{i=1}^N p_i^{b_i}$ (the product of the null probabilities of a match over the instances where a match occurred). The rejection region would again be on small values of this statistic. An equivalent statistic would be $\sum_{i:b_i=1} -\log(p_i)$ (here the rejection region would encompass the largest rather than the smallest values). Note that this is equivalent to a weighted version of the naive statistic, where a success in a group with a lower probability of at least one match is awarded relatively more weight.
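To make the two likelihood-based statistics concrete, here is a small sketch computing the null log-likelihood $\log\mathcal{L}_0$ and the statistic $\sum_{i:b_i=1} -\log(p_i)$. The function names and the example group sizes (20 and 40) are hypothetical, and the code assumes every group has between 2 and 365 people so that $0<p_i<1$.

```python
import math


def match_probability(n, days=365):
    """P(at least one shared birthday among n people) under uniformity."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


def null_log_likelihood(sizes, matches):
    """log L0 = sum of b_i*log(p_i) + (1 - b_i)*log(1 - p_i)."""
    total = 0.0
    for n, b in zip(sizes, matches):
        p = match_probability(n)
        total += math.log(p) if b == 1 else math.log(1.0 - p)
    return total


def lr_statistic(sizes, matches):
    """T = sum of -log(p_i) over occasions with a match; large T is evidence against H0."""
    return sum(-math.log(match_probability(n))
               for n, b in zip(sizes, matches) if b == 1)


# Hypothetical illustration: a match in a group of 20 contributes more to T
# than a match in a group of 40, because it is less likely under H0.
print(lr_statistic([20], [1]), lr_statistic([40], [1]))
```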

However, if we imagine a very small preference/ranking effect and that the variation in popularity is a smooth monotonic function of expected probability at each rank, it might even make sense to consider a smooth alternative as a function of a single parameter and optimize for detecting that sort of alternative; e.g. even a small linear deviation from expected proportions under uniformity (sorting by probability) could be considered, though I'd be tempted to look at a linear shift in the logit or the probit of the ordered date probabilities instead, which might make more sense at larger effects. For small effect size, any of these should behave similarly, and should lead to a likelihood ratio test that would have pretty good power against the kind of alternatives that I expect might crop up in practice. While this seems like a promising path to explore, I won't pursue it further for now. [Edit: Indeed, now looking at some data, the idea of taking the expected values of the ordered proportions under the null and linearly tilting them for the alternative looks like an excellent first order approximation to the shape of the ordered observed proportions for that data.]

With a large number of occasions (and not too-extreme probabilities) you might weight each Bernoulli term ($b_i$) by the inverse of its variance under the null ($w_i = [p_i(1-p_i)]^{-1}$) and so obtain a precision-weighted sum $\sum_i w_i b_i = \sum_{i: b_i=1} w_i$ with known expectation and variance, which should be asymptotically normal. This would be another weighted version of the naive statistic. This approach should work pretty well if the sample size is sufficiently large.
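A sketch of this precision-weighted statistic with a one-sided normal approximation follows. Under $H_0$ the mean is $\sum_i w_i p_i$ and the variance is $\sum_i w_i^2 p_i(1-p_i)$. The function name and the data (50 groups of 30, 42 of them with a match) are invented for illustration, and the normal approximation is only a large-sample approximation.

```python
import math


def match_probability(n, days=365):
    """P(at least one shared birthday among n people) under uniformity."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


def weighted_z_test(sizes, matches):
    """Return (z, one-sided p-value) for the precision-weighted sum of the b_i."""
    probs = [match_probability(n) for n in sizes]
    weights = [1.0 / (p * (1.0 - p)) for p in probs]
    stat = sum(w * b for w, b in zip(weights, matches))
    mean = sum(w * p for w, p in zip(weights, probs))
    var = sum(w * w * p * (1.0 - p) for w, p in zip(weights, probs))
    z = (stat - mean) / math.sqrt(var)
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Hypothetical data: 50 occasions with 30 people each, 42 with a match.
sizes = [30] * 50
matches = [1] * 42 + [0] * 8
z, p_value = weighted_z_test(sizes, matches)
print(z, p_value)
```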

For any of the statistics described (or indeed any other reasonable statistic) it's simple enough to simulate behaviour under $H_0$, and given some desired significance level ($\alpha$) obtain a rejection set yielding a test of size no greater than $\alpha$. Equivalently we could compute a simulated p-value to any reasonable desired degree of accuracy. Of course we can in principle compute it exactly but the computations may become cumbersome if the number of groups ("occasions"), $N$, is not small, so it's worth pursuing the discussion in terms of simulation, which works in either case.

Here's an example data set. Imagine we observe 7 occasions, with groups of size 15, 18, 26, 29, 30, 32, 35, and the only occasion without any match (the only failure to find a match) was for the group of 18.

The respective success probabilities are (rounded to 4 figures): 0.2529, 0.3469, 0.5982, 0.6810, 0.7063, 0.7533, 0.8144.

Using $T=-\sum_{i:b_i=1} \log(p_i)$ as an example statistic, I simulated 100,000 samples of the vector of observed b's (under $H_0$, a vector of independent Bernoulli variates each with its success probability given above) and computed the statistic for each. The statistic is, of course, discrete.

[Figure: Simulated distribution of 100,000 values from the null distribution of $T$. Values range from 0 to about 4.167. The distribution consists of 128 point masses of varying heights. The proportions do not progress smoothly in height, nor are they equally spaced.]

The apparent roughness of distribution is inherent, not a matter of sampling error from the simulation, which is quite small. There are $2^7=128$ possible values of the test statistic; all occurred many times in our simulation. It is of course simple enough to do the exact calculation numerically for this example (and I did so, the simulated distribution is quite close to the exact one), but more generally simulation will be feasible when exact computation is not.
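For a small number of occasions, the exact null distribution can be obtained by enumerating all $2^N$ outcome vectors. The sketch below does this for the seven group sizes of the example; the function names are illustrative, and this brute-force approach becomes impractical once $N$ is much larger than about 20.

```python
import itertools
import math


def match_probability(n, days=365):
    """P(at least one shared birthday among n people) under uniformity."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


def exact_null_distribution(probs):
    """List of (T value, probability) pairs over all 2^N outcome vectors."""
    dist = []
    for outcome in itertools.product((0, 1), repeat=len(probs)):
        t = sum(-math.log(p) for p, b in zip(probs, outcome) if b == 1)
        prob = math.prod(p if b == 1 else 1.0 - p
                         for p, b in zip(probs, outcome))
        dist.append((t, prob))
    return sorted(dist)


def upper_tail(dist, t, tol=1e-9):
    """Exact P(T >= t) under H0; tol guards against floating-point ties."""
    return sum(prob for value, prob in dist if value >= t - tol)


sizes = [15, 18, 26, 29, 30, 32, 35]
probs = [match_probability(n) for n in sizes]
dist = exact_null_distribution(probs)
print(len(dist), max(value for value, _ in dist))
```

Passing an observed value of the statistic to `upper_tail` gives the exact p-value for this test.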

The test statistic for our data is 3.109002; it is marked with a short line segment in dark red on the lower margin of the plot.

The proportion of simulated test statistics at least as large as our sample statistic was 0.1084, so the estimated p-value is 10.84% (from complete enumeration, the exact p-value is 10.76%). This would not be small enough to conclude non-uniformity of birthdays at typical significance levels.
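A Monte Carlo version of this calculation might look like the sketch below. The seed and function names are arbitrary choices, not the code originally used, so the estimate should land near the figures above but will differ in the last digits because of simulation noise.

```python
import math
import random


def match_probability(n, days=365):
    """P(at least one shared birthday among n people) under uniformity."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


def statistic(outcome, probs):
    """T = sum of -log(p_i) over occasions with a match."""
    return sum(-math.log(p) for p, b in zip(probs, outcome) if b == 1)


def simulated_p_value(probs, t_obs, n_sim=100_000, seed=0, tol=1e-9):
    """Estimate P(T >= t_obs) under H0 from n_sim simulated outcome vectors."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    hits = 0
    for _ in range(n_sim):
        outcome = [1 if rng.random() < p else 0 for p in probs]
        if statistic(outcome, probs) >= t_obs - tol:
            hits += 1
    return hits / n_sim


# Example data from the text: the only occasion without a match is the group of 18.
sizes = [15, 18, 26, 29, 30, 32, 35]
matches = [1, 0, 1, 1, 1, 1, 1]

probs = [match_probability(n) for n in sizes]
t_obs = statistic(matches, probs)
p_sim = simulated_p_value(probs, t_obs)
print(round(t_obs, 6), p_sim)
```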

$\endgroup$
1
  • $\begingroup$ Thank you for your answer. I'll take some time to assimilate it and reflect on it. $\endgroup$ Commented 11 hours ago
