Beating the Crossy Roads Prize Machine

How many coins does it take to collect all the characters in Crossy Roads from the Prize Machine?


While I was on my 5-hour flight from Virginia to the West Coast, I forgot my headphones and was too frugal to pay for airplane Wi-Fi. So, the only thing I could do on my phone was play Crossy Roads. For the past few months, I had been pretty consistent about collecting the free coins the game gives you every day and had built up an impressive total of 94000 coins. I figured this flight would be the perfect chance to spend all those coins on the Prize Machine and unlock some new characters to play with.

For those of you who don’t know how the Prize Machine works: each use costs 100 coins and yields one character. It’s important to note that it is possible to receive a character that has already been obtained (a duplicate character). In other words, the Prize Machine is a random draw with replacement, so naturally, the chance of receiving a duplicate character increases as you obtain more characters.
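To make the with-replacement model concrete, here is a minimal sketch in Python (uniform odds over the pool are an assumption of this post's model; the game doesn't publish its actual odds):

import random

# One Prize Machine purchase under the with-replacement model: an independent
# uniform draw, so duplicates get more likely as the collection grows.
def prize_machine_purchase(total_characters):
    return random.randrange(total_characters)

owned = set()
for _ in range(10):
    owned.add(prize_machine_purchase(331))
print(f"Unique characters after 10 purchases: {len(owned)}")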

Well, with 5 hours to kill, I went straight to the Prize Machine and blew my coins trying to collect all the characters. I had roughly 60 characters in my collection already, and by the end of this process I had collected 255, with 4000 coins remaining. The only issue was that there are 355 total characters: I was still missing 100.

I thought 94000 coins was a lot, but apparently it wasn’t enough to get all the characters. I started to wonder: what is the expected number of coins needed to collect them all?

For reference, there are 355 total characters to unlock, but the chicken is given to you at the start of the game and 2 more characters are available only as in-app purchases. Furthermore, according to https://crossyroad.fandom.com/wiki/Prize_Machine, there are 21 other special characters that cannot be obtained via the Prize Machine. Hence, we need to figure out the expected number of purchases required to collect 355-1-2-21=331 characters, and we can multiply that number of purchases by 100 to get the number of coins we will need.

We will solve the general case of collecting n characters. Let N represent the discrete random variable of the number of purchases until each of the n characters is selected at least once, and let N_k represent the discrete random variable of the number of purchases required to collect the kth character after k-1 unique characters have been collected. We have N_1=1 as a base case and

    \[N=N_1+N_2+\cdots+N_n.\]

Let P_k represent the probability that a given purchase yields the kth new character. To start, imagine we have never used the Prize Machine before: then clearly P_1=\frac{n}{n}=1. But after this purchase, the odds of getting a new character drop: P_2=\frac{n-1}{n}. More generally, after we have collected the (k-1)th character, the probability of collecting the kth character with any given purchase is going to be

    \[P_k=\frac{n-(k-1)}{n}=\frac{n-k+1}{n}.\]
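Plugging a few values of k into this formula with n=331 (the pool size we will use later) shows how quickly the odds of a new character decay:

# P_k = (n - k + 1) / n at a few milestones, with n = 331.
n = 331
for k in (1, 2, 166, 330, 331):
    print(f"P_{k} = {(n - k + 1) / n:.4f}")
# P_1 = 1.0000, P_2 = 0.9970, P_166 = 0.5015, P_330 = 0.0060, P_331 = 0.0030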

Recall that N_k represents the discrete random variable of the number of purchases required to collect the kth character. The key observation is that each purchase is an independent Bernoulli trial with success probability P_k, so N_k follows a Geometric distribution with probability mass function

    \[N_k \sim \text{Geom}(P_k) \Longrightarrow P(N_k=m)=(1-P_k)^{m-1}P_k.\]

The notation \sim reads “follows the distribution” and P(N_k=m) is the probability that the first success occurs on the m-th trial (we write m for the trial index since k already indexes the characters). So N_k\sim\text{Geom}(P_k) means that N_k is a geometrically distributed random variable with success probability P_k, and the probability that the first success occurs on the m-th trial is given by the Probability Mass Function (PMF) of the geometric distribution: P(N_k=m)=(1-P_k)^{m-1}P_k.
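To see this geometric behavior directly, here is a small simulation (a sketch): suppose 300 of n=331 characters are already owned, so the next new character arrives with probability P_{301}=31/331 per purchase. The average number of purchases until it shows up should be close to 1/P_{301}=331/31\approx 10.68:

import numpy as np

rng = np.random.default_rng(0)

# With 300 of 331 characters owned, count purchases until a new one appears.
n, owned = 331, 300
samples = []
for _ in range(100_000):
    count = 0
    while True:
        count += 1
        if rng.integers(n) >= owned:  # indices owned..n-1 stand in for the 31 new characters
            break
    samples.append(count)
print(np.mean(samples), n / (n - owned))  # both around 10.68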

This is really convenient for us because of the following two facts:

\textbf{Expected Value.} Given a Geometric distribution X \sim \text{Geom}(p) with probability mass function P(X=k)=(1-p)^{k-1}p, the expected value is given by E(X)=\frac{1}{p}.

\textit{Proof.} We have

    \[E(X)=\sum_{k=1}^{\infty} k\cdot P(X=k)= \sum_{k=1}^{\infty} k\cdot (1-p)^{k-1}p= p\sum_{k=1}^{\infty} k\cdot (1-p)^{k-1}.\]

Note that \frac{d}{dp}\left(-(1-p)^k\right)=k\cdot (1-p)^{k-1} and \sum_{k=1}^{\infty}r^k=\frac{r}{1-r} for |r|<1, so we have

    \[E(X)=p\sum_{k=1}^{\infty} k\cdot (1-p)^{k-1}=p\sum_{k=1}^{\infty} \frac{d}{dp}\left(-(1-p)^k\right)=p\cdot\frac{d}{dp}\left(-\sum_{k=1}^{\infty}(1-p)^k\right)=p\cdot\frac{d}{dp}\left(-\frac{1-p}{p}\right)=p\cdot\frac{1}{p^2}=\frac{1}{p}\]

as desired. \square
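Before moving on, here is a quick numeric sanity check (a sketch using numpy, which can sample a geometric distribution directly) that the sample mean of Geom(p) draws is close to 1/p:

import numpy as np

rng = np.random.default_rng(0)

# Sample a geometric distribution and compare the sample mean to 1/p.
p = 0.3
samples = rng.geometric(p, size=1_000_000)
print(samples.mean(), 1 / p)  # both close to 3.33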

\textbf{Variance.} Given a Geometric distribution X \sim \text{Geom}(p) with probability mass function P(X=k)=(1-p)^{k-1}p, the variance is given by V(X)=\frac{1-p}{p^2}.

\textit{Proof.} I won’t prove this here; it is a standard result taught in many undergraduate probability and statistics courses (e.g., MATH 451 at William and Mary). \square
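The same kind of numeric check (a sketch) works for the variance:

import numpy as np

rng = np.random.default_rng(1)

# Sample a geometric distribution and compare the sample variance to (1-p)/p^2.
p = 0.3
samples = rng.geometric(p, size=1_000_000)
print(samples.var(), (1 - p) / p**2)  # both close to 7.78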

Returning to the problem, since we have

    \[N_k \sim \text{Geom}(P_k) \Longrightarrow P(N_k=m)=(1-P_k)^{m-1}P_k,\]

we know E(N_k)=\frac{1}{P_k}. By linearity of expectation,

    \begin{align*}E(N)&=E(N_1+N_2+\dots+N_n)\\&=E(N_1)+E(N_2)+\dots+E(N_n)\\&=\frac{1}{P_1}+\frac{1}{P_2}+\dots+\frac{1}{P_n}\\&=\frac{1}{\frac{n-1+1}{n}}+\frac{1}{\frac{n-2+1}{n}}+\dots+\frac{1}{\frac{n-n+1}{n}}\\&=n\left(1+\frac{1}{2}+\frac{1}{3}+\dots+\frac{1}{n}\right)\\&=nH_n\end{align*}

where H_n is the nth Harmonic Number. We can roughly approximate the nth Harmonic Number with H_n=\sum_{i=1}^n \frac{1}{i}\approx \int_1^n\frac{dx}{x}=\ln (n), but it can be more accurately estimated as

    \[H_n\approx \ln (n)+\gamma+\frac{1}{2n}\]

where \gamma \approx 0.577216 is the Euler-Mascheroni constant. So

    \[E(N)\approx n(\ln (n)+\gamma)+\frac{1}{2}.\]
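As a quick sanity check of the exact formula E(N)=nH_n, take n=3:

    \[E(N)=3H_3=3\left(1+\frac{1}{2}+\frac{1}{3}\right)=\frac{11}{2}=5.5,\]

so even with only three equally likely characters, we expect to need almost twice as many purchases as there are characters.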

The variance of N can be found in a similar way. Since we have

    \[N_k \sim \text{Geom}(P_k) \Longrightarrow P(N_k=m)=(1-P_k)^{m-1}P_k,\]

we know

    \[V(N_k)=\frac{1-P_k}{P_k^2}=\frac{1-\frac{n-k+1}{n}}{\left(\frac{n-k+1}{n}\right)^2}=\frac{\frac{k-1}{n}}{\left(\frac{n-k+1}{n}\right)^2}=\frac{n(k-1)}{(n-k+1)^2}.\]

Now, since the N_k are independent of one another, the variance of the sum is the sum of the variances:

    \begin{align*}V(N)&=V(N_1+N_2+\dots+N_n)\\&=V(N_1)+V(N_2)+\dots+V(N_n)\\&=\frac{n(1-1)}{(n-1+1)^2}+\frac{n(2-1)}{(n-2+1)^2}+\dots+\frac{n(n-1)}{(n-n+1)^2}\\&=n\left(\frac{0}{n^2}+\frac{1}{(n-1)^2}+\frac{2}{(n-2)^2}+\dots+\frac{n-1}{1}\right)\end{align*}

but note that each of the numerators 0, 1, \dots, n-1 is strictly less than n, so replacing every numerator with n can only increase the sum. Hence

    \begin{align*}V(N)&=n\left(\frac{0}{n^2}+\frac{1}{(n-1)^2}+\frac{2}{(n-2)^2}+\dots+\frac{n-1}{1}\right)\\&<n\left(\frac{n}{n^2}+\frac{n}{(n-1)^2}+\frac{n}{(n-2)^2}+\dots+\frac{n}{1}\right)\\&=n^2\left(\frac{1}{n^2}+\frac{1}{(n-1)^2}+\frac{1}{(n-2)^2}+\dots+\frac{1}{1}\right)\\&<n^2\left(\frac{\pi^2}{6}\right)\end{align*}

by the famous Basel problem, which tells us \sum_{k=1}^{\infty}\frac{1}{k^2}=\frac{\pi^2}{6}. So V(N)<\frac{n^2\pi^2}{6}, and for large n this bound is close to the exact value, so V(N)\approx \frac{n^2\pi^2}{6}.
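We can double-check both the simplification of V(N_k) and this bound numerically (a quick sketch):

import math

# The exact variance V(N) = sum over k of n(k-1)/(n-k+1)^2 should match the
# unsimplified sum of (1-P_k)/P_k^2 and stay below the Basel bound n^2*pi^2/6.
n = 331
exact_var = sum(n * (k - 1) / (n - k + 1) ** 2 for k in range(1, n + 1))
alt_var = sum((1 - (n - k + 1) / n) / ((n - k + 1) / n) ** 2 for k in range(1, n + 1))
bound = (n * math.pi) ** 2 / 6
print(exact_var, alt_var, bound)  # roughly 177778, 177778, 180220.6
assert exact_var < bound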

Now that we have all the math out of the way, we can answer the original question: what is the expected number of coins needed to collect all the Crossy Roads characters from the Prize Machine? Since the Prize Machine draws uniformly from 331 obtainable characters, we can expect to use it

    \[E(N)=331\cdot H_{331}\approx 331(\ln (331)+\gamma)+\frac{1}{2}\approx 2112.06\]

times, so we will need about 100\cdot 2112.06\approx 211206 coins. So I was off by over 100000 coins. As a check, here is a quick Python Monte Carlo simulation to verify this:

import numpy as np

def monte_carlo_prize_machine(characters, num_simulations):
    results = []
    for _ in range(num_simulations):
        collected_char = set()  # characters collected so far in this simulation
        count = 0
        while len(collected_char) < characters:
            character = np.random.randint(characters)
            collected_char.add(character)  # adding a duplicate leaves the set unchanged
            count += 1
        results.append(count)
    mean_result = np.mean(results)
    print(f"Expected value of purchases needed: {mean_result}")

monte_carlo_prize_machine(331, 1000)
# Expected value of purchases needed: 2134.584
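The same simulation can double-check the variance derived above. A small tweak (a sketch) returns the raw counts so we can compute the sample variance alongside the mean:

def prize_machine_counts(characters, num_simulations):
    counts = []
    for _ in range(num_simulations):
        collected = set()
        count = 0
        while len(collected) < characters:
            collected.add(np.random.randint(characters))
            count += 1
        counts.append(count)
    return counts

counts = prize_machine_counts(331, 1000)
print(np.mean(counts), np.var(counts))  # mean near 2112; variance near 1.8e5, though noisy run to run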

The variance of this count can be found from our earlier bound:

    \[V(N)\approx \frac{331^2\pi^2}{6}\approx 180220.62\]

purchases squared. Since variance is in squared units, the more interpretable figure is the standard deviation, \sqrt{180220.62}\approx 424.52 purchases, or about 42452 coins. Here is some Python code to verify the values of the EV and Var.

import math

def purchases_EV(n):
    Hn = 0
    for i in range(1, n+1):
        Hn += 1/i
    return n*Hn

def purchases_EV_approx(n):
    return n*math.log(n) + n*0.577215665 + 0.5

def purchases_var(n):
    return ((n*math.pi)**2)/6

print(purchases_EV(331)) # 2112.059315570107
print(purchases_EV_approx(331)) # 2112.0595673648077
print(purchases_var(331)) # 180220.62129795854

The variance grows a lot faster than the expected value, since the variance is O(n^2) while the expected value is only O(n\log n):

import matplotlib.pyplot as plt
import numpy as np
import math

# Define the functions
def purchases_EV(n):
    Hn = 0
    for i in range(1, n+1):
        Hn += 1/i
    return n * Hn

def purchases_var(n):
    return ((n * math.pi) ** 2) / 6

# Generate values for n
n_values = np.arange(1, 332)
ev_values = np.array([purchases_EV(n) for n in n_values])
var_values = np.array([purchases_var(n) for n in n_values])

# Plot the functions
plt.figure(figsize=(10, 6))
plt.plot(n_values, ev_values, label='EV of purchases', color='blue')
plt.plot(n_values, var_values, label='Var of purchases', color='red')

# Add titles and labels
plt.title('EV vs. Var')
plt.xlabel('Number of Characters (n)')
plt.ylabel('Purchases')
plt.legend()
plt.grid(True)

# Show the plot
plt.show()

So, in conclusion, to collect all the animals in Crossy Road, we should expect to spend about 211206 coins, and the large variance means a run of bad luck can cost much more: three standard deviations above the mean is

    \[100\cdot\left(E(N)+3\sqrt{V(N)}\right)\approx 100\cdot\left(2112.06+3\cdot 424.52\right)\approx 338600\]

coins, well over 50% more than the expected cost.

Maybe on the flight back, I will just play Angry Birds…

Note: Source code for this project can be found on my GitHub here: https://github.com/trippytrung/Crossy-Roads-Prize-Machine