## Help Teach a Fledgling Cryptographer

So I was e-mailed a crypto puzzle the other day with a challenge – decrypt this. It’s coming from a high school student looking to learn more about the subject of cryptology. I thought I’d post it here for two purposes:

- Nerds – try your hand at decrypting something.
- Nerds – help point him in the right direction to learn more.

Fortunately, he has provided the key as well but relies on keeping the algorithm itself a secret. Bash away, but please do so constructively.

## Ciphertext

`Kfpmt Nczy, kfdn mdq fhksjktk wkav olbxaim. Qj zxb dgw onc pl poob ipd rjet sere rw pevjea. Fqst tgv pf rpt noc dcvx.`

## Key

```
0 1 2 3 4 5 6 7 8 9
a b c d e f g h i j
b c c e f g h i j k
c c e f g h i j k l
d e f g h i j k l m
e f g h i j k l m n
f g h i j k l m n o
g h i j k l m n o p
h i j k l m n o p q
i j k l m n o p q r
j k l m n o p q r s
k l m n o p q r s t
l m n o p q r s t u
m n o p q r s t u v
n o p q r s t u v w
o p q r s t u v w x
p q r s t u v w x y
q r s t u v w x y z
r s t u v w x y z a
s t u v w x y z a b
t u v w x y z a b c
u v w x y z a b c d
v w x y z a b c d e
w x y z a b c d e f
x y z a b c d e f g
y z a b c d e f g h
z a b c d e f g h i
```

Similarly tagged OmniNerd content:

- Data Wiping Malware, by VnutZ over 2 years ago
- Advanced Crypto Present in Flame, by VnutZ over 2 years ago
- John Nash Letters Declassified, by VnutZ about 3 years ago
- Key Generation Weakness, by VnutZ about 3 years ago

## If he's learning about crypto... by scottb

… the first lesson is that he’s got his secrets backward.

The purpose of a key is to parameterize the algorithm in such a way that knowing the algorithm, but not the key, one cannot decrypt the message in reasonable time.

In fact, in theoretical terms, the “key” is exactly the part of the decryption information that you keep secret, while the rest of the information is effectively the algorithm.

To be sure, though, the “key” provided doesn’t much look like a key — it’s too patterned. My guess is that the algorithm uses it as a lookup table.

If I had to guess what’s going on, he’s got some variation on a Vigenère cipher (polyalphabetic substitution cipher). They used to appear pretty commonly in newspaper “puzzle” sections — I don’t know if they still do.

Also, traditionally, real encrypted messages never gave away the lengths of the words. The text was just run together and transmitted as groups of five characters. It was assumed that, after decryption, the receiver could figure out where the spaces went.

So, this ciphertext would be offered as:

`kfpmt nczyk fdnmd qfhks jktkw kavol bxaim qjzxb dgwon cplpo obipd rjets ererw pevje afqst tgvpf rptno cdcvx`

.## Shannon Entropy by VnutZ

Another very important factor to consider is the entropy of the ciphertext. As much as we OmniNerds all hate wikipedia, there’s a decent summary of Shannon Entropy there but it’s still applicable to encryption. You want the entropy of your ciphertext to be high … like in the 0.85-0.95 ballpark … because it’s a good indicator that you’ve extended your data space beyond the limited ASCII-7 character set (which in itself tends to dominant around a mere 26 characters due to frequency of lowercase characters) into a range seemingly random across the full spectrum of possible values. This particular cipher demonstrates an entropy exactly on par with plaintext. While that doesn’t “give away” the original message, it informs the cryptographer that their efforts are much more targeted to a smaller space.

Further reading:

## RE: Shannon Entropy by scottb

You want the entropy of your ciphertext to be high … like in the 0.85-0.95 ballpark … because it’s a good indicator that you’ve extended your data space beyond the limited ASCII-7 character set (which in itself tends to dominant around a mere 26 characters due to frequency of lowercase characters) into a range seemingly random across the full spectrum of possible values.

If your cipher was “perfect”, then the probability of a letter should be uniform across the whole alphabet, and independent of the letters that came before it. Ignoring case and punctuation, there are 26 letters, so “perfect” cipher would have an entropy of about 4.7 bits per character (taking logarithms to the base 2).

Unencrypted English text supposedly has an entropy of about 1.5 bits per character, so I think you want your ciphertext’s entropy to be quite a bit higher than 0.85-0.95 — ciphertext shouldn’t be

morepredictable than plaintext. :)## RE: Shannon Entropy by VnutZ

It’s been awhile since I did those myself … the last system I saw scaled stuff between 0 and 1 so that was the frame of reference I was using.

## RE: Shannon Entropy by scottb

Probabilities scale that way, but entropy doesn’t. Entropy is -∑p(x)log(p(x)). The log(p(x)) factor, since p(x) ranges from 0 to 1, ranges from -Inf to 0.

The summand has a peak where p(x) = 1/e (0.3679, at which point p(x)log(p(x)) is 0.5307). There can be as many terms in the sum as you want, though (26 of them in our crypto example).

If the entropy is at the maximum, with an alphabet size of N, then the entropy is log(N). That can get as big as you like, though it grows quite slowly — for an alphabet of 1000 symbols, it’s only a bit under ten bits.

## RE: Shannon Entropy by Occams

Doesn’t that statistical probability approach imply that a particular alphabetic character is always translated into the same character? I thought that was avoided in the better cypers.

## RE: Shannon Entropy by scottb

No. “Entropy”, as used here, is a measure of predictability.

At any point, having received a partial message S, we can ask, what’s the probability that the next character will be x, given that it’s preceded by S?

In ordinary English, for example, if S ends with the letter ‘q’, then it’s a near certainty that the next character will be a ‘u’. The letter ‘t’ is most commonly followed by the letter ‘h’.

Given the probability of each letter, p(x), the Shannon entropy is -∑p(x)log(p(x)), summed over all letters. Logarithms are usually taken to the base 2, in which case the units are “bits”. (Natural logarithms give “nats”, and base 10 logs give “bans”.) Since the probabilities have to sum to 1, it’s not hard to show that the entropy expression takes its maximum value when the probabilities are all the same.

Looking at it from the bigger picture, it’s not hard to see why maximizing entropy is good for a code. If the probabilities are all equal, I’ve got nothing that suggests what the next character should be — they’re all equally likely. If they’re different, then I have information that lets me guess (not with complete accuracy, but better than without) the next character.

If, as you suggest, characters were always translated to the same character, the result would just be scrambling the probabilities — whatever ‘e’ translated to would end up still being the most common character. The probabilities would be about the same as for English — leading to an entropy figure of around 1.5 bits per character.

## RE: Shannon Entropy by Anonymous

I wonder if somebody here can help me.

I am trying to get a handle on the way in which the (thermodynamic) notion of entropy as degree of disorder ties in with the Shannon model.

I am only after a qualitative interpretation.

I appreciate that for a coin toss sequence HHHHH… and TTTTT… are considered to have perfect order. Also that a purely random sequence has perfect disorder.

But how does Shannon handle runs such as HTHTHT… , HHTTHHTTHHTT… , THHHTHHH… etc?

We would normally perceive results such as these from real world experiments to be highly ordered.

I only need an intuitive non-mathematical model to reconcile the two.

Your help with my naive inquiry would be much appreciated.

## RE: Shannon Entropy by scottb

I’ll give it a go.

Shannon entropy is a measure of how “expected” a sequence is, given its past history. So, if you’ve seen “HTHTHTHTHTHTH”, you expect that it’s

morelikely the next symbol will be “T” than “H”.To put that more precisely, you assign a higher probability to the outcome “H” than to “T”. The formula for entropy is -∑p(x)log(p(x)), where p(x) is the probability assigned to symbol

x, and the sum is taken over all possible values ofx. Since the p(x) have to always sum to 1, when one probability increases, it has to come at the expense of the others.Thermodynamic entropy is a measure of how “expected” a given state is. Normally, we “expect” the state to be pretty mixed — we don’t expect all the molecules of atmosphere in a room to gather one corner (somewhat analogous to the “HHHHHH” case).

To bring the two concepts together, it’s important to realize that we can’t actually examine the details of the thermodynamic state directly — instead, we measure various statistical properties of the system (temperature, pressure, volume, mass, charge, etc.) that are related to the state. The detailed state includes the exact number of particles, their energy states, positions, velocities, and so on.

Note that there are usually a

hugenumber of detailed states that have the same statistical properties. Given any state, you can swap any particle for any other similar particle and the state is the same (swap any set of electrons, swap any set of protons, etc.) Temperature is essentially the average velocity of the particles — speed one up and slow another down by the right amounts and you’ve got the same temperature.The macro-scale statistical measurements you make (temperature, etc.) allow you to put constraints on the possible microstates the system can possibly be in. Of all possible states, only some of them have a given temperature, pressure, etc. In the entropy formula, these constraints show up as changes in the probabilities — when you measure the temperature, the probabilities have to be adjusted to reflect that states

notat that temperature are now very improbable. But, as with adjusting the probabilities associated with the coin tosses, the more constraints you put on the system — the more you can pare down the possible set of microstates — the lower the entropy of the system.In the coin-tossing view, this is like not being able to see the outcomes of individual tosses, but instead being told statistical properties like the ratio of heads to tosses, the average length of a run of heads, the length of the longest run, and so on. With just ten tosses, there are 1024 possible “states”. If you know that half of the tosses came up heads, you’ve narrowed it down to one of 252 states.

With a million tosses, there are a huge number of outcomes with exactly half heads. On the other hand, there’s only

onepossible outcome with a million heads. So, knowing the ratio of heads to tosses gives some constraint on the possible microstate (the exact sequence of tosses). Low entropy states correspond to outcomes where the ratio is far from 1/2.Suppose you had tossed a fair coin twenty times and — one in a million chance — they all came up heads, a low-entropy state. As you continue to add tosses to the sequence, it’s very unlikely that you’re going to

remainin the low entropy state — there’s a 50% chance that the twenty-first toss will come up tails, bumping up the entropy. This is the essence of the second law of thermodynamics — transitions to higher-entropy states are more likely than transitions to lower-entropy states, so entropy increases.Does that help?

## RE: Shannon Entropy by Anonymous

Thanks for your input, scottb.

I am actually pretty well familiar with the thermodynamic aspects of entropy.

My concern is rather as to the usefulness of the term “degree of disorder” to get the meaning of it across to general readers, which is my aim.

The problem being that any series such that I gave as examples would ordinarily be perceived to be ordered. And if the Shannon entropy, which which I am not very familiar, could in some way brought to bear on this.

I think you effectively negated this in the first line of your reply:

" Shannon entropy is a measure of how “expected” a sequence is, given its past history"

Which, as I suspected, seems to confirm that Shannon theory has nothing to say about absolute configurations but only probabilities derived from previous history of a sequence.

Right?

## RE: Shannon Entropy by scottb

The problem being that any series such that I gave as examples would ordinarily be perceived to be ordered.

That’s only because they

are. They’re fully specified. If I gave you a specific thermodynamic state, say for a small volume of gas, it gives the same sense of being “ordered”.Which, as I suspected, seems to confirm that Shannon theory has nothing to say about absolute configurations but only probabilities derived from previous history of a sequence.

In a sense, but only in the same sense that thermodynamic entropy has nothing to say about absolute states, ultimately, as information entropy and thermodynamic entropy are exactly the same thing.

In thermodynamics, the dynamic evolution of the system is always the same — the laws of quantum mechanics specify exactly how future states evolve from present states. Shannon’s entropy originally came about in considering communications channels, in which the evolution function is part of what’s being studied, so it’s almost always introduced in terms of the probability distribution of subsequent symbols, given a history.

Note that the probabilities are rarely determined by the specific symbols in the history. The usual model is that we

knowthe probability distribution with which the transmitter is sending, but the channel itself introduces noise in such a way that the receiver sees a different symbol from the one sent.The “probabilities derived from previous history of a sequence” that you mention are more aligned with notions like Kolmogorov complexity than with entropy,

per se, though they’re somewhat related.I think maybe the issue has to do with seeing entropy as a “degree of disorder” in the first place. That’s really not a good approach. While it’s true that most configurations people are likely to view as “highly ordered” are low entropy configurations, the reverse isn’t true — the majority of low entropy configurations wouldn’t be considered “highly ordered”. Rocks arranged around the rim of a crater don’t seem any more “ordered” than rocks in a pile at the bottom, despite their having different entropy.

Entropy isn’t about single configurations. It’s about subspaces of the full configuration space. In thermodynamic entropy, your macro-level description of the system (temperature, pressure, etc) doesn’t correspond to a

singlestate, it corresponds to many states, which occupy some volume within the space of all possible configurations. The entropy is, quite literally, the logarithm of that volume (times Boltzmann’s constant, for historical reasons).In other words, it’s a measure of the information you have. If your information goes beyond the ordinary macro-scale measurements, you reduce the volume further. At the limit, your information pins the state down to a single point in configuration space, with zero volume and thus, zero entropy. In practice, Heisenberg uncertainty prevents you from getting there (particle position and momentum are part of the state, but they’re a conjugate pair), but the principle is the same.

Information (Shannon) entropy and thermodynamic entropy aren’t different concepts. They’re the exact same concept

discoveredin two different ways.## Starter Tips by VnutZ

So to get a handle on this topic, you really need to read this book: Applied Cryptography written by Bruce Schneier who is pretty renowned in the the security field. You don’t even need to read all of it … the first twelve chapters will teach you all the fundamentals and the right vocabulary to really talk cryptography. The rest of the book (thirteen more chapters) goes into detail about how it’s actually done using C source code examples. For a practical example, I would recommend you read about is DES (data encryption standard) which is a public specification the government used to use … but more importantly, read about how it was broken. There are heaps of good Internet articles on breaking DES since it’s used as a case study. These ought to get you started:

Also, some of what I’m about to write here may bash your code pretty hard. Don’t take it personally … I also thought having a transposition, polyalphabetic substitution was unbreakable when I first got into cryptography back in the early ‘90s. As I’ll get into later, the real magic comes when you escape the concept of letters and numbers and start treating everything as binary data – generally in blocks but streams work as well. At least you have the advantage of getting to use interpreted programming languages that easily allow for ridiculously large number math (like Ruby and Python) along with modern CUDA hardware on graphics cards for doing parallel integer math unlike the zany libraries I had to use because standard C distributions and processors only supported 32bit integers or 128bit floating point (which is somewhat useless in this sort of application). So again, take all this as learning points and use it to get to the next step.

So the bottom line is that the challenge isn’t to create unbreakable encryption. That’s been done at both the easy and the difficult levels. The term “unbreakable” itself is ambiguously used. Sometimes even easy substitution or matrix transposition ciphers can be considered unbreakable if they’re used properly. Mathematical ciphers are considered unbreakable because the computing power required to brute-force the key is computationally improbable (like all the computers in the world chugging away would still be doing it when the sun burned out kind of improbable). So while the 56bit DES keys can be broken in a mere 7 minutes or less now, 256bit AES keys are basically impossible. The truly unbreakable code that has stood the test of time is the one-time pad substitution technique that was used by KGB/CIA/etc for dealing with covert agents is ridiculously simple and can be done by hand. A one-time pad is a message that is encoded with a single key, once, ever. The key is never reused. If it’s never reused, it can’t get attacked via statistical anomalies, repetition, etc. Like I said, though, spies have been using one-time pads for a

looooongtime.The best part about these encryption techniques is that the algorithms are publicly known and scrutinized … even with the cipher text and the algorithm, the intercepting party cannot discern the original message. Anytime the method of encryption is critical to its security … there’s a huge weakness which is a principle known as Kerckhoff’s Assumption in the crypto world. If you can tell me the method and its still unbreakable without the key, then it’s an effective code. After all, if you put your code into software, a reverse engineer is going to be able to steal the method out of the program itself meaning that while you intended for it to be a secret, it no longer is. Nobody ever has control of their algorithm once its put into software. A reverse engineer can find the algorithm inside the compiled code and either extract the raw binary and use it or even discern the algorithm back out into original source. So the key is the part that needs the most protection. You must assume your algorithm is compromised always so if the secrecy is based on that, it’s too late.

The code breakers have a variety of techniques by which they do their work. Just like in Wheel of Fortune where there are certain letters to pick because they occur frequently, code breakers will analyze the characters that appear to make assumptions about what letters were substituted. You used a multiple substitution (typically referred to as a polyalphabetic substitution cipher) which helps to reduce the effectiveness of that. Another technique comes from frequency of common words in language – prepositions, etc. That is actually a weakness in the display of your ciphertext which you could quickly fix by not including any white spaces or punctuation. Thus, successful code breaking comes not from having a single message (though some crazy people can do it) but from having many messages and doing analysis on them or their weaknesses. Having a single message is a lot like having the aforementioned one-time pad.

So, just to talk about how a cryptologist would go about breaking your code, let’s take a look at some information. For instance, right off the bat, your cipher shows format – it has punctuation, spaces between words, the right number of characters per word and even capital letters. I’m assuming the first two words are “Hello Matt,” but I could be wrong on that. With enough lexical analysis against the ciphertext (and in wartime) post-message analysis on enemy actions, the context can be derived and eventually the original message. Another thing to consider is the strength of your key. Given what you know about cryptography right now, it looks tight. But any given character is only swapped to another one of a possible 26 moves. Incorporate capital letters and you’ve got 1 in 52 moves. Add the numbers and you’ve got 1 in 62. Add punctuation … you get the idea. Ultimately, the right answer is to not think in terms of characters but to convert the original plaintext into a number. In computers, the letter A for instance is represented in hexadecimal as 0×42. A computer byte is 8 bits in size which can store 0xFF or 256 different values. See how we’ve gone from 1 in 26 to 1 in 256? You’ve made the key computationally more difficult and also make the transition to mathematical transformation a lot easier. And when you’re dealing with key size, the degree of difficulty isn’t geometric growth it is exponential. So (and I’m using a 1 in 16 selection for this example) an increase from a 4 bit key (yours) to an 8 bit key (one byte) isn’t just 2 times stronger it’s 8 times stronger. And using a 256 bit key (like AES/Blowfish/etc) is 7.23E75 times stronger (not too mention the advanced transformational algorithm).

There are cases of brute-forcing a message into plaintext … but none of them are really easy but it’s always possible and surprisingly, not necessarily requiring use of the original algorithm. For instance, it would be possible for a pattern to emerge such that the code breaker doesn’t need to know how you seeded the column selection – they could just map any single letter to 10 possible conditions and test every permutation. They’ll eventually crack the message in linear time without knowing what you really did. Cryptographer strive for transformations that require intensive, polynomial calculation for cracking which is pretty much worst case scenario. Read about elliptic curve cryptography for one of the cutting edge concepts:

And the computers that chug away at codes don’t really require a person to be at them to determine when the code was broken. Let’s say it was designed to go through zillions of permutations … it simply checks the output against a textual dictionary to see if the decoded text matches actual words. If there’s a high enough match, it’ll output that as a possible decrypted message. There’s an old joke about the NSA and CIA where the NSA will spend millions upon millions of dollars analyzing codes, building special computers and chugging away at keys while the CIA will buy a $5 wrench and beat the message out of you.

The real challenge … is how do you get the key to the other person? Take your cipher for example. Let’s say you’re living in San Francisco, California and I’m Sydney, Australia. You want to send me a secret message that cannot be read by the Chinese spies watching the transit paths. You encrypt your message and e-mail it to me. I open it and say, “WTF? I don’t have a key to decrypt this!” If you e-mail the key … the Chinese have it, too. You could try mailing it to me but if they spies are really interested they’re going to watch the mail or listen to the phone call, etc. The only way for me to have it securely is if I knew the key before I went to Australia.

This is why public key cryptography was invented. PKI is the general standard used by most people and it’s implementations are seen in SSL used by the web. I recommend reading about Phil Zimmerman and PGP (current free adaptations of his work are known as GPG). He’s already implemented the secure voice technology in PGPfone for the general public. Take a read of the following stuff and you’ll get smart on those techniques.

My understanding is that you want to implement a system that permits everyone to have rotating keys that cover everything from SMS, e-mail, VoIP, etc. It all depends on what layer of the OSI model you want to protect. If the need is to have it protected at the network or transport layers than we already have IPsec (network layer encryption on IP) and SSL (transport layer encryption on TCP). It’s arguable which layer an SSH tunnel resides at (could be session, presentation or application) but such an encryption tunnel is possible.

These designs at their core do what you’re trying to do. They generate a session key that is used only once which effectively produces a digital one time pad. All of the traffic is encrypted using that session key. But back to the problem I mentioned earlier, the challenge is in transmitting the session key between the two parties without anybody knowing it. That’s where public key encryption comes into play either PGP or PKI typically but both based on the principles of RSA (a security company named after the algorithm’s founders Ron Rivest, Adi Shamir and Len Adleman). The algorithm has stood strong for nearly 30 years as both a strong encryption method and key exchange tool. You can read about how RSA works here:

Now – somebody please school me here as I’m not versed at all on the concepts of quantum mechanics and cryptography. This next bit is what I’ve come to understand about it … Quantum Mechanics is the latest rage for cryptographers for two reasons. One, quantum mechanics can allow for two parties to exchange keys and know definitively that it was not intercepted in transit thanks to application of the Heisenburg Uncertainty Principle. As the technology grows, cryptographers hope to utilize quantum entanglement to literally make the key appear at the distant end without having to utilize any transmission mechanisms at all. And the code breakers are hoping to utilize quantum principles where every state exists simultaneously to effectively brute force all possible keys at once taking rendering the improbable time computation requirement moot.

So there’s one more place to apply the idea of universal encryption and that is with the data itself – the user will directly encrypt the SMS, the e-mail, etc. Therein lies the problem which is human laziness. It’s not that it can’t be done, it’s that people won’t do anything that requires them to take an extra step. Let’s discuss the flaw with encrypting each single SMS.

Between steps 1 and 2, you’ve already been owned if somebody really cares to know what you’re typing. The cleartext exists in the memory buffers of the device. The cleartext exists in the algorithm itself as it’s passed around as a parameter. A piece of malware installed on your device can capture the cleartext by hooking into the appropriate function calls catching it before it’s encrypted. Even easier, the malware logs the keys as you type to a hidden file so your message is already recorded elsewhere.

The other problem lies in step 2 itself. What are you encrypting with? There are three basic premises for what to encrypt with. Something you know (password). Something you have (token). Something you are (biometrics). The latter is pretty effective … but once retina patterns or fingerprints are converted into unique, digital information, it has to reside somewhere. And for all intents and purposes, a biometric key is simply another digital key [password] that can be cracked given an overabundant use of it. And it’s further flawed in that you can’t change your retina after your retina is “cracked”.

Take for example something you have … a token. RSA makes a SecurID that generates a new seed number every 60 seconds or less. This allows a person to enter the token’s number and generate a new, unique key. That’s pretty good and seems to achieve the goal. But there are two flaws. Let’s say a week passes and then you want to open a message from your girlfriend. Well, you need the key that was generated when that message was encrypted (which is roughly 10,000 keys ago). You don’t have that anymore … so you’ve actually defeated yourself in that nobody can open their encrypted material anymore meaning nobody will use it. RSA tokens are used to generate the key exchange techniques mentioned earlier in the public key discussion.

There’s another double edged sword to this idea – the rotating key concept. It’s the difference between deterministic and non-deterministic algorithms. Remember, both parties have to be using the same key which means regardless of the distance between them, they need to be time synchronized such that their keys rotate together otherwise neither party can decrypt each other’s correspondence. And that means their separate, independent devices need to compute the exact same, seemingly random number at the exact same time. That means it’s not random – just improbable that an adversary will compute the number within the rotation time. But if the device is compromised, and they always are, the algorithm will be discerned and because it is deterministic … the adversary will generate the same key as the communication endpoints and be able to decrypt the traffic too.

Lastly, there’s something you know, a traditional password. Frankly, the encryption techniques to do strong cryptography have existed for decades. So why doesn’t anybody use it? It’s not that it’s proprietary or not available. It’s that people don’t want to remember hundreds of passwords because unless everyone globally shares a password … each correspondence is going to be different. You need a different password for SMS to your girlfriend, SMS to your buddies and SMS to your Mom. Public key cryptography takes care of that … but regular people hate managing revocation lists, trust rings, etc. Phil Zimmerman’s PGP/GPG is the common man solution. PKI is the corporate solution where trust rings are handled by a “trusted third party”. But trusted party fell prey to attack (http://www.darkreading.com/authentication/167901072/security/attacks-breaches/231600498/digital-certificate-authority-hacked-dozens-of-phony-digital-certificates-issued.html and http://arstechnica.com/security/news/2011/09/comodo-hacker-i-hacked-diginotar-too-other-cas-breached.ars) because it created a single point for attack.