SERVING THE QUANTITATIVE FINANCE COMMUNITY

henktijms
Topic Author
Posts: 34
Joined: December 25th, 2010, 2:50 pm

### Success runs with donuts

There are 260 business days in a year. We have 54 employees. Eachemployee is required to bring donuts twice a yearon different days. Each employee chooses the two days at random, independently of the otheremployees. What is the probability that there will be 10 business days in a row where nobodybrings any donuts? I got this brainteaser from reddit.com. Using a quick-and-dirty approach, I arrive at the approximate value 0.9727 for the sought probability. The argument is as follows. The expected number of occurrences of 10 business day in a row without donuts is$\lambda=251 \times\left(\frac{{250 \choose 2}}{{260\choose 2}}\right)^{54}$Then I approximate the probability that there will be no 10 consecutive business days without donuts by $e^{-\lambda}=0.0273$.

wileysw
Posts: 593
Joined: December 9th, 2006, 6:13 pm

### Success runs with donuts

henktijms,thanks for sharing this problem. i don't have an exact solution of the problem as it's stated, but had lots of fun thinking about it. below is my take:(1) your estimate of $\lambda$ looks high, e.g., a 12-day gap would be counted three times in your formula. we need to count it just once. the remedy is to simply pose a requirement that on the day right before the gaps, at least one of the employees brings in a donut. if the gap starts on the first day, there would be no correction; if the gap starts on any other day (before the 252nd day), we would subtract the probability that no one brings in any donut on that particular day (along with the subsequent 10 days) from your expression, i.e.,$\displaystyle\lambda=\frac{251{250\choose 2}^{54}-250{249\choose 2}^{54}}{{260\choose 2}^{54}},$which in turn results 72.1% probability in the Poisson model;(2) one better approximation is to model each day as an "independent" Bernoulli trial with $p={259\choose 2}^{54}/{260\choose 2}^{54}$, the probability that nobody brings any donut on a particular day. we can directly use the result from "run" by calculating the coefficient of the $x^m$ term of the probability generating function$\displaystyle\frac{p^{10}x^{10}(1-px)}{(1-x)\left[1-x+(1-p)p^{10}x^{11}\right]}.$for $m=260$, this results 75.6%;(3) as in most cases, the continuous case is rather easy to solve exactly. we consider the following model that is related to the classic "random arcs covering circle" problem:let $N$ i.i.d. random variables $X_i\sim U(0,1)$ divide the unit interval into $N+1$ segments with the respective length of $Y_0$, ..., $Y_n$. we seek the probability of $\max(Y_i)>\epsilon$, which is equivalent to the probability of these random arcs NOT being able to fully cover the circle. the equivalence can be seen by choosing $N+1$ arcs of size $\epsilon$, identifying each arc by its clockwise end, then fixing one particular end to cut open the circle. by using the inclusion-exclusion principle, the probability can be written down explicitly as$\displaystyle 1-\sum_{i=0}^{\lfloor 1/\epsilon\rfloor}(-1)^i{N+1\choose i}(1-i\epsilon)^N.$going back to the original problem, we would consider the interval $[0.5,260.5)$, with each day occupying $[i-0.5,i+0.5)$. the sum above results 76.7% with substitution of $N=108$ and $\epsilon=10.5/260$.note (a) in this continuous limit, 54 people independently picking two different $X_i$ each is same as 108 people picking one each, as the difference is of zero measure; (b) my only justification of picking 10.5 for $\epsilon$ is that we need a number between 10 and 11 and it's conceivable that by picking up the midpoint, we can eliminates the 1st order correction terms;(4) the main difficulty of the problem comes from the somewhat unusual sampling scheme. by using different and "standard" sampling methods, there are few cases that we can solve analytically. we start with a simple case when 180 days are sampled without replacement, i.e., once someone decides on the dates, they are no longer available for another person to pick. in this case we have a total of ${260\choose 108}$ choices with $c(108,260)$ cases that have no gap lasting at least 10 days, where $c(N,m)$ counts the compositions of $m+1$ with $N+1$ strictly positive integers <=10. these integers represent $N+1$ gaps: $X_1$, $X_2-X_1$, ..., $m+1-X_{N}$ and their sum is $m+1$. one way is to consider the generating function $(x+x^2+...+x^{10})^{N+1}$ or $x^{N+1}(1-x^{10})^{N+1}(1-x)^{-N-1}$. the coefficient of $x^{m+1}$ in the expansion is$\displaystyle c(N,m)=\sum_{i=0}^{\lfloor(m-N)/10\rfloor}(-1)^i{N+1\choose i}{m-10i\choose N}.$in our case, the probability $1-c(108,260)/{260\choose 108}$ is 62%;(5) another case is to remove the constraint that an employee needs to pick two different days, i.e., 108 days are sampled always with replacement. we have a total of $260^{108}$ choices and need to calculate $C(108,260)$, the number of cases that no >=10 day gap exists. the generating function turns out to have a simple form: $z{\rm Li}_{-N}(z)$, where ${\rm Li}_{-N}$ is the polylogarithm, $z/(1-z)=x+x^2+...+x^{10}$ or $z=x(1-x^{10})/(1-x^{11})$, and $C(N,m)$ is the coefficient of the $x^{m+1}$ term.the proof is to consider a specific partition of $N$ samplings into $i$ subsets, i.e., the samplings result $i$ distinctive days. for instance, in (4) we have $i=108$ and each subset contains only 1 sampling. for each $i$, we can similarly write down the generating function as in (4) and sum them up:$\displaystyle\sum_{i=1}^{N}i!S(N,i)(x+x^2+...+x^{10})^{i+1},$where $S(N,i)$ is the Stirling number of the second kind and its generating function is polylogarithm.the sum above gives an alternative form of the coefficient:$\displaystyle C(N,m)=\sum_{j=1}^{N}j!S(N,j)\sum_{i=0}^{\lfloor(m-j)/10\rfloor}(-1)^i{j+1\choose i}{m-10i\choose j},$so the probability $1-C(108,260)/260^{108}$ is 77.1%;(6) back to the original problem, the constraint that each employee needs to pick two different day makes it slightly more difficult to get gaps, so the probability should be less than 77.1% in (5); overall 76.7% in (3) seems to be the best approximation when comparing with MC results.(mathematica is used for evaluating all these sums above)
Last edited by wileysw on May 14th, 2016, 10:00 pm, edited 1 time in total.