## Monkeys Recreate Shakespeare

There’s an old saying that a million monkeys hammering away at keyboards would eventually, purely by chance, produce the works of Shakespeare. Some people call it the infinite monkey theorem, and over the years various studies have looked at how feasible it really is. Now Jesse Anderson has put the theory to the test using Amazon EC2 distributed computing resources.

“For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I have to create fake Amazonian Map Monkeys. The Map Monkeys create random data in ASCII between a and z. It uses Sean Luke’s Mersenne Twister to make sure I have fast, random, well-behaved monkeys. Once the monkey’s output is mapped, it is passed to the reducer, which runs the characters through a Bloom filter membership test. If the monkey output passes the membership test, the Shakespearean works are checked using a string comparison. If that passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.”
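The filtering pipeline in the quote can be sketched in a few dozen lines. This is a hypothetical single-machine Python version, not Jesse Anderson’s Hadoop code: the `BloomFilter` class, the SHA-256 double hashing, the filter size, and the helper names are all assumptions. It uses CPython’s `random` module, which is itself a Mersenne Twister, in place of Sean Luke’s implementation, and it assumes the Shakespeare text is reduced to the letters a to z so that it matches what the monkeys can type.

```python
import hashlib
import random
import string

WINDOW = 9  # the quote checks 9-character runs of monkey output


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def normalize(text: str) -> str:
    # Assumption: keep only a-z so the corpus matches the monkeys' alphabet.
    return "".join(c for c in text.lower() if c in string.ascii_lowercase)


def build_index(corpus: str):
    """Load every 9-letter window of the corpus into a Bloom filter."""
    text = normalize(corpus)
    windows = {text[i:i + WINDOW] for i in range(len(text) - WINDOW + 1)}
    bloom = BloomFilter(size_bits=max(64, 10 * len(windows)), num_hashes=7)
    for w in windows:
        bloom.add(w)
    return bloom, text


def monkey_chunks(count: int, seed: int = 0):
    rng = random.Random(seed)  # CPython's random module uses the Mersenne Twister
    for _ in range(count):
        yield "".join(rng.choice(string.ascii_lowercase) for _ in range(WINDOW))


def genius_chunks(chunks, bloom: BloomFilter, text: str):
    for chunk in chunks:
        # Cheap probabilistic test first, exact substring comparison second.
        if bloom.might_contain(chunk) and chunk in text:
            yield chunk
```

The two-stage check is the point of the design: the Bloom filter rejects almost every random chunk cheaply and never misses a real match, so the exact string comparison only has to weed out its occasional false positives.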

##### Shakespeare? by Occams

I thought that was how they produce episodes of Two and a Half Men.

##### Shakespearian tragedy! by Occams

In order to understand how improbable the monkey authorship of Shakespeare is, consider that you might get 14 episodes of Two and a Half Men before you would get a Hamlet.

Wouldn’t it break your heart to read one of his manuscripts and find that it is a word-perfect (and punctuation-perfect) version of Hamlet, except that Act 2, Scene 2 is from Macbeth?

##### RE: Shakespearian tragedy! by Occams

Having typed the first letter of Hamlet, the probability of the next letter being correct is 1/61 if our monkey has a normal keyboard with 61 keys, and he is equally able to hit any of them. It is always the same 1/61 chance that the next character will be right.

Let’s guess that there are 6,000 words in Hamlet, of average length 5 characters. That means 30,000 characters. Make it 33,000 to provide for spaces and punctuation.

The probability that our monkey will proceed to type this version of Hamlet is 1/61 to the power of 33,000.

Wolfram Alpha gives the answer as $1.304501\ldots \times 10^{-58916}$.
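A number like $(1/61)^{33000}$ is far too small for ordinary floating point, which simply rounds it to zero, so a calculation like Wolfram Alpha’s has to work with logarithms. A minimal Python sketch of that approach (the function name is made up for illustration):

```python
import math


def text_probability(keys: int, length: int):
    """Return (mantissa, exponent) such that (1/keys)**length == mantissa * 10**exponent."""
    # Work in log space: the direct float computation would underflow to 0.0.
    log10_p = -length * math.log10(keys)
    exponent = math.floor(log10_p)
    mantissa = 10 ** (log10_p - exponent)
    return mantissa, exponent


m, e = text_probability(61, 33_000)
print(f"{m:.4f} x 10^{e}")  # 1.3045 x 10^-58916
```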

##### RE: Shakespearian tragedy! by Occams

I am wrong! My monkey is equally unlikely to type any manuscript of the same length, because he has no idea what he is typing.

So if the 14 episodes of 2 1/2 men add up to the same length as Hamlet, he is just as likely to type Hamlet first.

In reality there would be far too many spaces in the monkey’s work, because the space bar is at least as wide as six other keys and so has a much higher probability of being struck.
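The effect of a wide space bar can be modeled by weighting the keys instead of treating them as equally likely. The sketch below assumes, purely for illustration, a 61-key board on which the space bar is six times as likely to be hit as any of the other 60 keys; the key labels and weights are hypothetical.

```python
import random

# Hypothetical keyboard: 60 ordinary keys plus a space bar assumed to be
# six keys wide, so it is six times as likely to be hit. Labels are placeholders.
KEYS = [f"key{i}" for i in range(60)] + ["SPACE"]
WEIGHTS = [1] * 60 + [6]


def press_probability(key: str) -> float:
    """Probability that a single random key press lands on `key`."""
    return WEIGHTS[KEYS.index(key)] / sum(WEIGHTS)


def simulated_space_share(presses: int, seed: int = 0) -> float:
    """Fraction of spaces in a simulated run of weighted random key presses."""
    rng = random.Random(seed)
    typed = rng.choices(KEYS, weights=WEIGHTS, k=presses)
    return typed.count("SPACE") / presses


print(press_probability("SPACE"))  # 6/66, about 0.091, versus 1/61 for a uniform board
print(press_probability("key0"))   # 1/66: every ordinary key gets slightly harder to hit
```

Under that assumption a space comes up about 1 time in 11 rather than 1 in 61, and each of the other keys drops from 1/61 to 1/66, which makes correct letters rarer still.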

##### RE: Shakespearian tragedy! by scottb

> So if the 14 episodes of 2 1/2 men add up to the same length as Hamlet, he is just as likely to type Hamlet first.

I think the normal view on this is that the monkeys generate an unending string of characters (usually with uniform probability from a fixed alphabet), and the stream is checked to see if a particular document appears anywhere within it.
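That stream model is easy to simulate for a toy case. The sketch below types random characters from a two-letter alphabet and counts how many it takes before a target string shows up; the alphabet, the targets, the seed, and the trial count are illustrative choices, and each call stands for one fresh monkey.

```python
import random
import statistics


def characters_until(target: str, alphabet: str, rng: random.Random) -> int:
    """Type uniformly random characters until `target` appears; return how many were typed."""
    window = ""
    typed = 0
    while True:
        # Keep only the last len(target) characters of the stream.
        window = (window + rng.choice(alphabet))[-len(target):]
        typed += 1
        if window == target:
            return typed


rng = random.Random(42)
hh = statistics.mean(characters_until("hh", "ht", rng) for _ in range(20_000))
ht = statistics.mean(characters_until("ht", "ht", rng) for _ in range(20_000))
print(hh, ht)
```

Both targets are two characters long, yet "hh" takes about 6 characters on average and "ht" about 4: a near miss on "hh" (an "h" followed by a "t") throws away all progress, while a near miss on "ht" (an "h" followed by another "h") keeps its "h" and tries again.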

In an infinite stream, every single work of Shakespeare (and of every other author, of course) eventually appears. That’s just part of the weirdness of infinities.

A more interesting way to frame the 2½ Men versus Hamlet question is to ask how many episodes of Men are likely to appear in the stream before Hamlet does.

##### RE: Shakespearian tragedy! by Occams

I think that there is an equal probability of an episode of 2 1/2 men as a passage of Shakespeare of the same length.

Given that Hamlet has many more words than a TV episode, it would take much longer to appear in full. There would be millions of almost complete versions before the full one.

Hamlet may not appear in the lifetime of the Earth, which (I think) has only about 5 billion years left.

I think it is not a fixed alphabet so much as a fixed typewriter keyboard.

##### RE: Shakespearian tragedy! by scottb

> I think it is not a fixed alphabet so much as a fixed typewriter keyboard.

For modeling purposes, it’s a fixed alphabet. It doesn’t really make much difference what the alphabet is, either: getting just the right letters (ignoring case), spaces, and limited punctuation (commas, periods, question marks, exclamation points) is very nearly as difficult as also getting the case and all the less important punctuation right.

> I think that there is an equal probability of an episode of 2 1/2 men as a passage of Shakespeare of the same length.

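Yes, of the same length. But, as you point out, Hamlet is much longer than a Men script, so you expect the first Men script to appear much earlier in the generated (infinite) string. The gap isn’t merely proportional to the difference in length, either: roughly speaking, every extra character multiplies the expected distance by the size of the alphabet.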
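Under the usual rough model, a specific text of $n$ characters first appears after about $k^n$ characters of a uniform stream over $k$ symbols, and by linearity of expectation a stream of $N$ characters contains about $N/k^m$ copies of a shorter text of length $m$. A sketch in log space; the function names are hypothetical, the 30,000-character script length is a made-up figure, and the 183,000-character Hamlet length and 50-symbol alphabet are taken from figures quoted elsewhere in this thread.

```python
import math


def log10_expected_wait(length: int, alphabet_size: int) -> float:
    # Rough model: a specific text of `length` characters first shows up after
    # about alphabet_size**length characters (ignoring self-overlap corrections).
    return length * math.log10(alphabet_size)


def log10_expected_scripts_before(long_len: int, short_len: int, alphabet_size: int) -> float:
    # N characters contain about N / alphabet_size**short_len copies of the
    # short text; take N to be the long text's expected wait.
    return (log10_expected_wait(long_len, alphabet_size)
            - log10_expected_wait(short_len, alphabet_size))


# Hypothetical lengths: 183,000 characters for Hamlet, 30,000 for a TV script.
print(log10_expected_scripts_before(183_000, 30_000, 50))
```

With those inputs the result is about 259,942, i.e. on the order of $10^{259{,}942}$ Men scripts before the first Hamlet, far more than 14.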

> Hamlet may not appear in the lifetime of the Earth, which (I think) has only about 5 billion years left.

If you take the alphabet to be 50 characters, and you have as many monkeys as there are particles in the universe ($10^{80}$), and each monkey-particle generates a thousand characters per second, and you let them go for 100 times the life of the universe ($10^{20}$ s), the probability of generating even a small text, much less Hamlet, is still very near zero.
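The arithmetic behind that claim fits in a few lines. With $10^{80}$ monkeys typing $10^{3}$ characters per second for $10^{20}$ seconds there are $10^{103}$ characters in total, and a specific text of $n$ characters over a 50-symbol alphabet is expected to turn up about $10^{103}/50^{n}$ times. The hypothetical helper below takes every quantity as a base-10 logarithm to keep the numbers manageable:

```python
import math


def longest_expected_text(log10_monkeys: int, log10_rate: int,
                          log10_seconds: int, alphabet_size: int):
    """Return (log10 of total characters typed, longest text expected to appear at least once)."""
    log10_total = log10_monkeys + log10_rate + log10_seconds
    # A specific n-character text is expected about total / alphabet_size**n times,
    # which drops below 1 once n exceeds log10_total / log10(alphabet_size).
    longest = math.floor(log10_total / math.log10(alphabet_size))
    return log10_total, longest


print(longest_expected_text(80, 3, 20, 50))  # (103, 60)
```

Under these assumptions, 60 characters, about a sentence, is the longest specific passage the whole universe of monkeys would be expected to produce even once.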

For Hamlet (something around 183,000 characters), the expected number of characters until the text appears is $4.4 \times 10^{360{,}783}$. Forget the lifetime of the Earth; that’s many times longer than the time remaining until the heat death of the universe.
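For independent, uniformly random characters there is a standard exact formula for this expected wait: sum $k^{b}$ over every length $b$ (including the full length) at which the text’s first $b$ characters equal its last $b$, so texts that overlap themselves take longer to turn up. The sketch below finds those lengths with the KMP prefix function so that it stays linear-time for a play-sized string. The function names are hypothetical, and it makes no attempt to reproduce the Hamlet figure, since the alphabet behind that number is not stated.

```python
import math


def border_lengths(text: str) -> list[int]:
    """Lengths b (including len(text)) where the first b characters equal the last b.

    `text` must be non-empty.
    """
    # KMP prefix function: pi[j] is the longest proper border of text[: j + 1].
    pi = [0] * len(text)
    k = 0
    for j in range(1, len(text)):
        while k and text[j] != text[k]:
            k = pi[k - 1]
        if text[j] == text[k]:
            k += 1
        pi[j] = k
    # Walk the chain of borders of the whole text.
    borders = [len(text)]
    b = pi[-1]
    while b:
        borders.append(b)
        b = pi[b - 1]
    return borders


def expected_wait(text: str, alphabet_size: int) -> int:
    """Exact expected number of uniform random characters typed until `text` first appears."""
    return sum(alphabet_size ** b for b in border_lengths(text))


print(expected_wait("abracadabra", 26) == 26**11 + 26**4 + 26)  # True
print(math.log10(expected_wait("abracadabra", 26)))  # about 15.56
```

For a full play, take `math.log10` of the result rather than printing it: Python’s integers hold the exact value, but recent Python versions refuse by default to convert integers with more than a few thousand digits to strings.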

There’s no operational sense in which the “infinite monkey theorem” is true. It’s only theoretical.

It’s funny — people have such a hard time grasping how infinities work that notions like this one easily confuse many of them. Yet they’ll blithely throw infinities around when talking about religion as if that somehow makes them make sense.

##### RE: Shakespearian tragedy! by Occams

OK, but the monkey is choosing keys, not alphabet characters.

The saying that a monkey could eventually type poetry is easy to dismiss, because it would take an infinite succession of hard-working monkeys to type even a few pages.

But you’re right: it is only theoretical. We all know that monkeys prefer to write in iambic pentameter.

##### RE: Shakespearian tragedy! by scottb

> OK, but the monkey is choosing keys, not alphabet characters.

That’s the reason for the “fixed alphabet” simplification. Each key represents one character of output. Issues like meta-keys (Shift, Ctrl, whatever) are ignored. In any practical system, the probability of a monkey correctly typing a capital letter is ridiculously small, so forget about that and pretend they’re typing in all caps, with a one-to-one correspondence between keys and symbols in the output string.

