贈券收集問題

维基百科,自由的百科全书
跳转至: 导航搜索

贈券收集問題機率論中的著名題目,其目的在解答以下問題:

假設有n種贈券,每種贈券獲取機率相同,而且贈券亦無限供應。若取贈券t張,能集齊n種贈券的機率多少?

計算得出,能集齊n種贈券的收集量t期望值\Theta(n\log(n))關係,例如n = 50時大約要取225次才能集齊50種贈券。

問題內容[编辑]

贈券收集問題的特徵是開始收集時,可以在短時間內收集多種不同的贈券,但最後數種則要花很長時間才能集齊。例如有50種贈券,在集齊49種以後要約多50次收集才能找到最後一張,所以贈券收集問題的答案t的期望值要比50要大得多。

解答[编辑]

計算期望值[编辑]

假設T是收集所有n種贈券的時間,時間t_i是在收集了第i-1種贈券以後,到收集到第i種贈券所花的時間,那麼Tt_i都是隨機變量。在收集到i-1種贈券後能再找到「新」一種贈券的概率是p_i = \frac{n - i + 1}{n},所以t_i是一種幾何分佈,並有期望值\frac{1}{p_i}。根據期望值的線性性質,


\begin{align}
\operatorname{E}(T)& = \operatorname{E}(t_1) + \operatorname{E}(t_2) + \cdots + \operatorname{E}(t_n) 
= \frac{1}{p_1} + \frac{1}{p_2} +  \cdots + \frac{1}{p_n} \\
& = \frac{n}{n} + \frac{n}{n-1} +  \cdots + \frac{n}{1}  = n \cdot \left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n}\right) \, = \, n \cdot H_n.
\end{align}

其中H_n調和數,根據其近似值,可化約為:


\operatorname{E}(T)  = n \cdot H_n = n \ln n + \gamma n + \frac1{2} + o(1), \ \ 
\text{as}  \ n \to \infty,

其中\gamma \approx 0.5772156649歐拉-馬歇羅尼常數.

那麼,可用馬可夫不等式求取機率的上限:

\operatorname{P}(T \geq c \, n H_n) \le \frac1c.

變異[编辑]

基於t_i互相獨立的特性,則有:


\begin{align}
\operatorname{Var}(T)& = \operatorname{Var}(t_1) + \operatorname{Var}(t_2) + \cdots + \operatorname{Var}(t_n) \\ 
& = \frac{1-p_1}{p_1^2} + \frac{1-p_2}{p_2^2} +  \cdots + \frac{1-p_n}{p_n^2} \\
& \leq \frac{n^2}{n^2} + \frac{n^2}{(n-1)^2} +  \cdots + \frac{n^2}{1^2} \\
& \leq n^2 \cdot \left(\frac{1}{1^2} + \frac{1}{2^2} + \cdots \right) = \frac{\pi^2}{6} n^2 \le 2 n^2,
\end{align}

最末一行的等式來自黎曼ζ函數巴塞爾問題。此式繼而可用切比雪夫不等式求取機率上限:

\operatorname{P}\left(|T- n H_n| \geq c\, n\right) \le \frac{2}{c^2}.

尾部估算[编辑]

我們亦可用以下方法求另一個的上限:假設{Z}_i^r表示在首r次收集中未有見到第i種贈券,則


\begin{align}
P\left [ {Z}_i^r \right ] = \left(1-\frac{1}{n}\right)^r \le e^{-r / n}
\end{align}

所以,若r = \beta n \log n,則有P\left [ {Z}_i^r \right ] \le e^{(-\beta n \log n ) / n} = n^{-\beta}.


\begin{align}
P\left [ T > \beta n \log n \right ] \le P \left [ 	\bigcup_i {Z}_i^{\beta n \log n} \right ] \le n \cdot P [ {Z}_1 ] \le n^{-\beta + 1}
\end{align}

用生成函數的解法[编辑]

另一種解決贈券收集問題的方法是用生成函數

觀察得出,贈券收集的過程必然如下:

  • 收集第一張贈券,其出現的機率是n/n=1
  • 收集了若干張第一種贈券
  • 收集到一張第二種贈券,其出現的機率是(n-1)/n
  • 收集了若干張第一種或第二種贈券
  • 收集到一張第三種贈券,其出現的機率是(n-2)/n
  • 收集了若干張第一種、第二種或第三種贈券
  • 收集到一張第四種贈券,其出現的機率是(n-3)/n
  • \ldots
  • 收集到一張最後一種贈券,其出現的機率是1/n

若某一刻已若干種贈券,再收集到一張已重覆的贈券的機率是p,那麼,再收集到m張已重覆的贈券的機率就是p^m。則就此部分而言,有關m機率生成函數(PGF)是

G(z)=\sum_{m=0}^\infty p^mz^m = 1 + p z + p^2 z^2 + p^3 z^3 + \cdots = \frac{1}{1 - p z}

若將上述收集過程分割為多個階段,則整個收集過程所花的時間的機率生成函數為各部分的乘積,亦即

 G(z) =
\frac{n}{n} z \cdot
\frac{1}{1-\frac{1}{n} z} \cdot
\frac{n-1}{n} z \cdot
\frac{1}{1-\frac{2}{n} z} \cdot
\frac{n-2}{n} z \cdot
\frac{1}{1-\frac{3}{n} z} \cdot
\frac{n-3}{n} z
\cdots
\frac{1}{1-\frac{n-1}{n} z} \cdot
\frac{n-(n-1)}{n} z.

那麼,根據機率生成函數的特性,總收集次數T期望值

\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}

而某一T的機率則是

\Pr(T=k) = \left.\frac{1}{k!}\frac{\mathrm{d}^kG(z)}{\mathrm{d}z^k}\right|_{z=0}

計算E(T)可先化簡G(z)

 G(z) = z^n   \frac{n-1}{n-z}   \frac{n-2}{n-2z}   \frac{n-3}{n-3z}   \cdots   \frac{n-(n-1)}{n-(n-1)z}

因為

 \frac{\mathrm{d}}{\mathrm{d}z} \frac{n-k}{n-kz} = \frac{k(n-k)}{(n-kz)^2}

所以

\frac{\mathrm{d}}{\mathrm{d}z} G(z) =
G(z) \left(
    \frac{n}{z} + \frac{1}{n-z} + \frac{2}{n-2z} + \frac{3}{n-3z} \cdots + \frac{n-1}{n-(n-1)z}
\right)

故此可得出

\begin{align}
\operatorname{E}(T) &= \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}\\
&= G(1) \left( n + \frac{1}{n-1} + \frac{2}{n-2} + \frac{3}{n-3} \cdots + \frac{n-1}{n-(n-1)} \right) \\
&= n + \sum_{k=1}^{n-1} \frac{k}{n-k}
\end{align}

其中的連加部分可化簡:

 \sum_{k=1}^{n-1} \frac{k}{n-k} =
\sum_{k=1}^{n-1} \left( \frac{k}{n-k} - \frac{n}{n-k} \right) + n H_{n-1} =
n H_{n-1} - (n-1)

所以得出:  \operatorname{E}(T) = n + n H_{n-1} - (n-1) = n H_{n-1} + 1 = n H_n

用機率生成函數可同時求取變異量。變異量可寫作

\operatorname{Var}(T) = 
\operatorname{E}(T(T-1)) + \operatorname{E}(T) - \operatorname{E}(T)^2

其中

\begin{align}
\operatorname{E}(T(T-1))
=& \left. \frac{\mathrm{d}^2}{\mathrm{d}z^2} G(z) \right|_{z=1}\\
=& \left[
  G(z) \left( \frac{n}{z}  +  \frac{1}{n-z}  +  \frac{2}{n-2z}  + \frac{3}{n-3z}  \cdots  +  \frac{n-1}{n-(n-1)z}  \right)^2 \right. \\
& \;\left.\left. + G(z)  \left( - \frac{n}{z^2} + \frac{1^2}{(n-z)^2} + \frac{2^2}{(n-2z)^2} + \frac{3^2}{(n-3z)^2} \cdots + \frac{(n-1)^2}{(n-(n-1)z)^2} \right)
\right]\right|_{z=1}\\
=&
n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{k^2}{(n-k)^2} \\
=&
n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{(n-k)^2}{k^2} \\
=&
n^2 H_n^2 - n + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + (n-1).
\end{align}

故得出:


\begin{align}
\operatorname{Var}(T) & = \;
n^2 H_n^2 - 1 + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + n H_{n-1} + 1 - n^2 H_n^2 \\ & = \;
n^2 H_{n-1}^{(2)} - n H_{n-1} < \frac{\pi^2}{6} n^2
\end{align}

參考文獻[编辑]

  • Paul Erdős and Alfréd Rényi, On a classical problem of probability theory, Magyar Tud. Akad. Mat. Kutato Int. Kozl, 1961.
  • William Feller, An introduction to Probability Theory and its Applications, 1957.
  • Michael Mitzenmacher and Eli Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005
  • Donald J. Newman and Lawrence Shepp, The Double Dixie Cup Problem, American Mathematical Monthly, Vol. 67, No. 1 (Jan., 1960), pp. 58–61.
  • Philippe Flajolet, Danièle Gardy, Loÿs Thimonier Birthday paradox, coupon collectors, caching algorithms and self-organizing search., Discrete Applied Mathematics, Vol. 39, (1992), pp. 207–229

外部鏈結[编辑]