Workable published a short data-science (probabilistic) puzzle at http://buff.ly/1Rip3b0:

Suppose we have 4 coins, each with a different probability of throwing Heads. An unseen hand chooses a coin at random and flips it 50 times. This experiment is done several times with the resulting sequences as shown below (H = Heads, T = Tails) Write a program that will take as input the data that is collected from this experiment and estimate the probability of heads for each coin.

Well, I thought I would spend a few minutes on it and well, a few minutes turned into more than 30. My solution is simply maximum likelihood estimation (MLE), i.e., minimizing the negative log likelihood of the data given the model parameters:

where is the number of observed heads in the sequence of 50, is the coin selected ( forms a categorical distribution over the 4 coins for each sequence), and is the probability of heads for coin .

Since I’m all about trying Julia nowadays, that’s what I coded it up in (hosted on github). The first-cut solution (coinsoln_old.jl) found the following MLE estimates (negLL: 278.2343):

Maximum Likelihood Estimates: [0.428 ,0.307, 0.762, 0.817]

The first solution didn’t use any speed-up tricks nor the derivatives so, it should be easy to follow but is not terribly accurate or efficient. I then tried out automatic differentiation, which required minor code changes and sped up the computation significantly. This updated, faster solution (coinsoln.jl) found a slightly different result using conjugate gradients (negLL: -7.31325)

Maximum Likelihood Estimates: [0.283, 0.283,0.813,0.458]

Oh, and if I made any stupid errors, please let me know.