Coupon collector's problem (generating function approach)

Coupon collector's problem (generating function approach)

The coupon collector's problem can be solved in several different ways. The generating function approach is a combinatorial technique that allows to obtain precise results.

We introduce the probability generating function (PGF) G(z) where [zq]G(z) is the probability that we take q steps to collect the n coupons i.e. T = q and the expectation is given by

\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}.

We can calculate G(z) explicitly. We have

 G(z) =
\frac{n}{n} z
\frac{1}{1-\frac{1}{n} z}
\frac{n-1}{n} z
\frac{1}{1-\frac{2}{n} z}
\frac{n-2}{n} z
\frac{1}{1-\frac{3}{n} z}
\frac{n-3}{n} z
\cdots
\frac{1}{1-\frac{n-1}{n} z}
\frac{n-(n-1)}{n} z.

To see what this means, note that

 \frac{1}{1 - p z} = 1 + p z + p^2 z^2 + p^3 z^3 + \cdots,

so that this is the PGF of an event that has probability p occurring zero or more times, with the exponent of z counting the number of times. We split the sequence of coupons into segments. A new segment begins every time a new coupon is retrieved for the first time. The PGF is the product of the PGFs of the individual segments. Applying this to G(z), we see that it represents the following sequence of events:

  • retrieve the first coupon (no restrictions at this time)
  • retrieve the first coupon some number of times
  • retrieve the second coupon (probability (n − 1) / n)
  • retrieve a mix of the first and second coupons some number of times
  • retrieve the third coupon (probability (n − 2) / n)
  • retrieve a mix of the first, second, and third coupons some number of times
  • retrieve the fourth coupon (probability (n − 3) / n)
  • \ldots
  • retrieve the last coupon (probability (n − (n − 1)) / n).

In the following, Hn and H_n^{(2)} denote harmonic numbers.

The function G(z) if first simplified before deriving the expectation. First:

 G(z) = z^n
\frac{n-1}{n-z}
\frac{n-2}{n-2z}
\frac{n-3}{n-3z}
\cdots
\frac{n-(n-1)}{n-(n-1)z}
.

THe use is made of the fact that

 \frac{\mathrm{d}}{\mathrm{d}z} \frac{n-k}{n-kz} = \frac{k(n-k)}{(n-kz)^2}

to obtain the derivative of G(z)

\frac{\mathrm{d}}{\mathrm{d}z} G(z) =
G(z)
\left(
\frac{n}{z}
+ \frac{1}{n-z}
+ \frac{2}{n-2z}
+ \frac{3}{n-3z}
\cdots
+ \frac{n-1}{n-(n-1)z}
\right)
.

This yields

 
\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}
= G(1)
\left(
n
+ \frac{1}{n-1}
+ \frac{2}{n-2}
+ \frac{3}{n-3}
\cdots
+ \frac{n-1}{n-(n-1)}
\right)

or

\operatorname{E}(T) = n + \sum_{k=1}^{n-1} \frac{k}{n-k}.

Finally, some simplification:

 \sum_{k=1}^{n-1} \frac{k}{n-k} =
\sum_{k=1}^{n-1} \left( \frac{k}{n-k} - \frac{n}{n-k} \right) + n H_{n-1} =
n H_{n-1} - (n-1)

so that

 \operatorname{E}(T) = n + n H_{n-1} - (n-1) = n H_{n-1} + 1 = n H_n,
\quad \mbox{QED.}

The PGF G(z) makes it possible to obtain an exact value for the variance. Start with

\operatorname{Var}(T) = 
\operatorname{E}(T(T-1)) + \operatorname{E}(T) - \operatorname{E}(T)^2,

which consists entirely of factorial moments that can be calculated from the PGF. We already have the value of \operatorname{E}(T). For the remainder, use

\operatorname{E}(T(T-1)) = 
\left. \left( \frac{\mathrm{d}}{\mathrm{d}z} \right)^2 G(z) \right|_{z=1}.

The derivative is

 
\begin{align}
& G(z) 
\left(
\frac{n}{z}
+ \frac{1}{n-z}
+ \frac{2}{n-2z}
+ \frac{3}{n-3z}
\cdots
+ \frac{n-1}{n-(n-1)z}
\right)^2 \\
+ & \;
G(z) 
\left(
- \frac{n}{z^2}
+ \frac{1^2}{(n-z)^2}
+ \frac{2^2}{(n-2z)^2}
+ \frac{3^2}{(n-3z)^2}
\cdots
+ \frac{(n-1)^2}{(n-(n-1)z)^2}
\right)
\end{align}

Evaluation at z = 1 yields

 
\begin{align} &
n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{k^2}{(n-k)^2} =
n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{(n-k)^2}{k^2} \\ = & \;
n^2 H_n^2 - n + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + (n-1).
\end{align}

The conclusion is that


\begin{align}
\operatorname{Var}(T) & = \;
n^2 H_n^2 - 1 + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + n H_{n-1} + 1 - n^2 H_n^2 \\ & = \;
n^2 H_{n-1}^{(2)} - n H_{n-1} < \frac{\pi^2}{6} n^2, \quad \text{QED.}
\end{align}

Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

  • Coupon collector's problem — In probability theory, the coupon collector s problem describes the collect all coupons and win contests. It asks the following question: Suppose that there are n coupons, from which coupons are being collected with replacement. What is the… …   Wikipedia

  • List of mathematics articles (C) — NOTOC C C closed subgroup C minimal theory C normal subgroup C number C semiring C space C symmetry C* algebra C0 semigroup CA group Cabal (set theory) Cabibbo Kobayashi Maskawa matrix Cabinet projection Cable knot Cabri Geometry Cabtaxi number… …   Wikipedia

  • Geometric distribution — Probability distribution two name =Geometric type =mass pdf cdf | parameters =0< p leq 1 success probability (real) support =k in {1,2,3,dots}! pdf =(1 p)^{k 1},p! cdf =1 (1 p)^k! mean =frac{1}{p}! median =leftlceil frac{ log(2)}{log(1 p)} ight… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”