Loading 3 Votes - +

Help Teach a Fledgling Cryptographer

30_article_4063_thumb_crypto

So I was e-mailed a crypto puzzle the other day with a challenge – decrypt this. It’s coming from a high school student looking to learn more about the subject of cryptology. I thought I’d post it here for two purposes:

  1. Nerds – try your hand at decrypting something.
  2. Nerds – help point him in the right direction to learn more.

Fortunately, he has provided the key as well but relies on keeping the algorithm itself a secret. Bash away, but please do so constructively.

Ciphertext

Kfpmt Nczy, kfdn mdq fhksjktk wkav olbxaim. Qj zxb dgw onc pl poob ipd rjet sere rw pevjea. Fqst tgv pf rpt noc dcvx.

Key


0	1	2	3	4	5	6	7	8	9
a	b	c	d	e	f	g	h	i	j
b	c	c	e	f	g	h	i	j	k
c	c	e	f	g	h	i	j	k	l
d	e	f	g	h	i	j	k	l	m
e	f	g	h	i	j	k	l	m	n
f	g	h	i	j	k	l	m	n	o
g	h	i	j	k	l	m	n	o	p
h	i	j	k	l	m	n	o	p	q
i	j	k	l	m	n	o	p	q	r
j	k	l	m	n	o	p	q	r	s
k	l	m	n	o	p	q	r	s	t
l	m	n	o	p	q	r	s	t	u
m	n	o	p	q	r	s	t	u	v
n	o	p	q	r	s	t	u	v	w
o	p	q	r	s	t	u	v	w	x
p	q	r	s	t	u	v	w	x	y
q	r	s	t	u	v	w	x	y	z
r	s	t	u	v	w	x	y	z	a
s	t	u	v	w	x	y	z	a	b
t	u	v	w	x	y	z	a	b	c
u	v	w	x	y	z	a	b	c	d
v	w	x	y	z	a	b	c	d	e
w	x	y	z	a	b	c	d	e	f
x	y	z	a	b	c	d	e	f	g
y	z	a	b	c	d	e	f	g	h
z	a	b	c	d	e	f	g	h	i

Similarly tagged OmniNerd content:

Thread parent sort order:
Thread verbosity:

… the first lesson is that he’s got his secrets backward.

The purpose of a key is to parameterize the algorithm in such a way that knowing the algorithm, but not the key, one cannot decrypt the message in reasonable time.

In fact, in theoretical terms, the “key” is exactly the part of the decryption information that you keep secret, while the rest of the information is effectively the algorithm.

To be sure, though, the “key” provided doesn’t much look like a key — it’s too patterned. My guess is that the algorithm uses it as a lookup table.

If I had to guess what’s going on, he’s got some variation on a Vigenère cipher (polyalphabetic substitution cipher). They used to appear pretty commonly in newspaper “puzzle” sections — I don’t know if they still do.

Also, traditionally, real encrypted messages never gave away the lengths of the words. The text was just run together and transmitted as groups of five characters. It was assumed that, after decryption, the receiver could figure out where the spaces went.

So, this ciphertext would be offered as: kfpmt nczyk fdnmd qfhks jktkw kavol bxaim qjzxb dgwon cplpo obipd rjets ererw pevje afqst tgvpf rptno cdcvx.

1 Vote  - +
Shannon Entropy by VnutZ

Another very important factor to consider is the entropy of the ciphertext. As much as we OmniNerds all hate wikipedia, there’s a decent summary of Shannon Entropy there but it’s still applicable to encryption. You want the entropy of your ciphertext to be high … like in the 0.85-0.95 ballpark … because it’s a good indicator that you’ve extended your data space beyond the limited ASCII-7 character set (which in itself tends to dominant around a mere 26 characters due to frequency of lowercase characters) into a range seemingly random across the full spectrum of possible values. This particular cipher demonstrates an entropy exactly on par with plaintext. While that doesn’t “give away” the original message, it informs the cryptographer that their efforts are much more targeted to a smaller space.

Further reading:

1 Vote  - +
Starter Tips by VnutZ

So to get a handle on this topic, you really need to read this book: Applied Cryptography written by Bruce Schneier who is pretty renowned in the the security field. You don’t even need to read all of it … the first twelve chapters will teach you all the fundamentals and the right vocabulary to really talk cryptography. The rest of the book (thirteen more chapters) goes into detail about how it’s actually done using C source code examples. For a practical example, I would recommend you read about is DES (data encryption standard) which is a public specification the government used to use … but more importantly, read about how it was broken. There are heaps of good Internet articles on breaking DES since it’s used as a case study. These ought to get you started:

Also, some of what I’m about to write here may bash your code pretty hard. Don’t take it personally … I also thought having a transposition, polyalphabetic substitution was unbreakable when I first got into cryptography back in the early ‘90s. As I’ll get into later, the real magic comes when you escape the concept of letters and numbers and start treating everything as binary data – generally in blocks but streams work as well. At least you have the advantage of getting to use interpreted programming languages that easily allow for ridiculously large number math (like Ruby and Python) along with modern CUDA hardware on graphics cards for doing parallel integer math unlike the zany libraries I had to use because standard C distributions and processors only supported 32bit integers or 128bit floating point (which is somewhat useless in this sort of application). So again, take all this as learning points and use it to get to the next step.

So the bottom line is that the challenge isn’t to create unbreakable encryption. That’s been done at both the easy and the difficult levels. The term “unbreakable” itself is ambiguously used. Sometimes even easy substitution or matrix transposition ciphers can be considered unbreakable if they’re used properly. Mathematical ciphers are considered unbreakable because the computing power required to brute-force the key is computationally improbable (like all the computers in the world chugging away would still be doing it when the sun burned out kind of improbable). So while the 56bit DES keys can be broken in a mere 7 minutes or less now, 256bit AES keys are basically impossible. The truly unbreakable code that has stood the test of time is the one-time pad substitution technique that was used by KGB/CIA/etc for dealing with covert agents is ridiculously simple and can be done by hand. A one-time pad is a message that is encoded with a single key, once, ever. The key is never reused. If it’s never reused, it can’t get attacked via statistical anomalies, repetition, etc. Like I said, though, spies have been using one-time pads for a looooong time.

The best part about these encryption techniques is that the algorithms are publicly known and scrutinized … even with the cipher text and the algorithm, the intercepting party cannot discern the original message. Anytime the method of encryption is critical to its security … there’s a huge weakness which is a principle known as Kerckhoff’s Assumption in the crypto world. If you can tell me the method and its still unbreakable without the key, then it’s an effective code. After all, if you put your code into software, a reverse engineer is going to be able to steal the method out of the program itself meaning that while you intended for it to be a secret, it no longer is. Nobody ever has control of their algorithm once its put into software. A reverse engineer can find the algorithm inside the compiled code and either extract the raw binary and use it or even discern the algorithm back out into original source. So the key is the part that needs the most protection. You must assume your algorithm is compromised always so if the secrecy is based on that, it’s too late.

The code breakers have a variety of techniques by which they do their work. Just like in Wheel of Fortune where there are certain letters to pick because they occur frequently, code breakers will analyze the characters that appear to make assumptions about what letters were substituted. You used a multiple substitution (typically referred to as a polyalphabetic substitution cipher) which helps to reduce the effectiveness of that. Another technique comes from frequency of common words in language – prepositions, etc. That is actually a weakness in the display of your ciphertext which you could quickly fix by not including any white spaces or punctuation. Thus, successful code breaking comes not from having a single message (though some crazy people can do it) but from having many messages and doing analysis on them or their weaknesses. Having a single message is a lot like having the aforementioned one-time pad.

So, just to talk about how a cryptologist would go about breaking your code, let’s take a look at some information. For instance, right off the bat, your cipher shows format – it has punctuation, spaces between words, the right number of characters per word and even capital letters. I’m assuming the first two words are “Hello Matt,” but I could be wrong on that. With enough lexical analysis against the ciphertext (and in wartime) post-message analysis on enemy actions, the context can be derived and eventually the original message. Another thing to consider is the strength of your key. Given what you know about cryptography right now, it looks tight. But any given character is only swapped to another one of a possible 26 moves. Incorporate capital letters and you’ve got 1 in 52 moves. Add the numbers and you’ve got 1 in 62. Add punctuation … you get the idea. Ultimately, the right answer is to not think in terms of characters but to convert the original plaintext into a number. In computers, the letter A for instance is represented in hexadecimal as 0×42. A computer byte is 8 bits in size which can store 0xFF or 256 different values. See how we’ve gone from 1 in 26 to 1 in 256? You’ve made the key computationally more difficult and also make the transition to mathematical transformation a lot easier. And when you’re dealing with key size, the degree of difficulty isn’t geometric growth it is exponential. So (and I’m using a 1 in 16 selection for this example) an increase from a 4 bit key (yours) to an 8 bit key (one byte) isn’t just 2 times stronger it’s 8 times stronger. And using a 256 bit key (like AES/Blowfish/etc) is 7.23E75 times stronger (not too mention the advanced transformational algorithm).

There are cases of brute-forcing a message into plaintext … but none of them are really easy but it’s always possible and surprisingly, not necessarily requiring use of the original algorithm. For instance, it would be possible for a pattern to emerge such that the code breaker doesn’t need to know how you seeded the column selection – they could just map any single letter to 10 possible conditions and test every permutation. They’ll eventually crack the message in linear time without knowing what you really did. Cryptographer strive for transformations that require intensive, polynomial calculation for cracking which is pretty much worst case scenario. Read about elliptic curve cryptography for one of the cutting edge concepts:

And the computers that chug away at codes don’t really require a person to be at them to determine when the code was broken. Let’s say it was designed to go through zillions of permutations … it simply checks the output against a textual dictionary to see if the decoded text matches actual words. If there’s a high enough match, it’ll output that as a possible decrypted message. There’s an old joke about the NSA and CIA where the NSA will spend millions upon millions of dollars analyzing codes, building special computers and chugging away at keys while the CIA will buy a $5 wrench and beat the message out of you.

The real challenge … is how do you get the key to the other person? Take your cipher for example. Let’s say you’re living in San Francisco, California and I’m Sydney, Australia. You want to send me a secret message that cannot be read by the Chinese spies watching the transit paths. You encrypt your message and e-mail it to me. I open it and say, “WTF? I don’t have a key to decrypt this!” If you e-mail the key … the Chinese have it, too. You could try mailing it to me but if they spies are really interested they’re going to watch the mail or listen to the phone call, etc. The only way for me to have it securely is if I knew the key before I went to Australia.

This is why public key cryptography was invented. PKI is the general standard used by most people and it’s implementations are seen in SSL used by the web. I recommend reading about Phil Zimmerman and PGP (current free adaptations of his work are known as GPG). He’s already implemented the secure voice technology in PGPfone for the general public. Take a read of the following stuff and you’ll get smart on those techniques.

My understanding is that you want to implement a system that permits everyone to have rotating keys that cover everything from SMS, e-mail, VoIP, etc. It all depends on what layer of the OSI model you want to protect. If the need is to have it protected at the network or transport layers than we already have IPsec (network layer encryption on IP) and SSL (transport layer encryption on TCP). It’s arguable which layer an SSH tunnel resides at (could be session, presentation or application) but such an encryption tunnel is possible.

These designs at their core do what you’re trying to do. They generate a session key that is used only once which effectively produces a digital one time pad. All of the traffic is encrypted using that session key. But back to the problem I mentioned earlier, the challenge is in transmitting the session key between the two parties without anybody knowing it. That’s where public key encryption comes into play either PGP or PKI typically but both based on the principles of RSA (a security company named after the algorithm’s founders Ron Rivest, Adi Shamir and Len Adleman). The algorithm has stood strong for nearly 30 years as both a strong encryption method and key exchange tool. You can read about how RSA works here:

Now – somebody please school me here as I’m not versed at all on the concepts of quantum mechanics and cryptography. This next bit is what I’ve come to understand about it … Quantum Mechanics is the latest rage for cryptographers for two reasons. One, quantum mechanics can allow for two parties to exchange keys and know definitively that it was not intercepted in transit thanks to application of the Heisenburg Uncertainty Principle. As the technology grows, cryptographers hope to utilize quantum entanglement to literally make the key appear at the distant end without having to utilize any transmission mechanisms at all. And the code breakers are hoping to utilize quantum principles where every state exists simultaneously to effectively brute force all possible keys at once taking rendering the improbable time computation requirement moot.

So there’s one more place to apply the idea of universal encryption and that is with the data itself – the user will directly encrypt the SMS, the e-mail, etc. Therein lies the problem which is human laziness. It’s not that it can’t be done, it’s that people won’t do anything that requires them to take an extra step. Let’s discuss the flaw with encrypting each single SMS.

  1. You type your SMS into the keypad of your phone.
  2. You press the “encrypt” button.

Between steps 1 and 2, you’ve already been owned if somebody really cares to know what you’re typing. The cleartext exists in the memory buffers of the device. The cleartext exists in the algorithm itself as it’s passed around as a parameter. A piece of malware installed on your device can capture the cleartext by hooking into the appropriate function calls catching it before it’s encrypted. Even easier, the malware logs the keys as you type to a hidden file so your message is already recorded elsewhere.

The other problem lies in step 2 itself. What are you encrypting with? There are three basic premises for what to encrypt with. Something you know (password). Something you have (token). Something you are (biometrics). The latter is pretty effective … but once retina patterns or fingerprints are converted into unique, digital information, it has to reside somewhere. And for all intents and purposes, a biometric key is simply another digital key [password] that can be cracked given an overabundant use of it. And it’s further flawed in that you can’t change your retina after your retina is “cracked”.

Take for example something you have … a token. RSA makes a SecurID that generates a new seed number every 60 seconds or less. This allows a person to enter the token’s number and generate a new, unique key. That’s pretty good and seems to achieve the goal. But there are two flaws. Let’s say a week passes and then you want to open a message from your girlfriend. Well, you need the key that was generated when that message was encrypted (which is roughly 10,000 keys ago). You don’t have that anymore … so you’ve actually defeated yourself in that nobody can open their encrypted material anymore meaning nobody will use it. RSA tokens are used to generate the key exchange techniques mentioned earlier in the public key discussion.

There’s another double edged sword to this idea – the rotating key concept. It’s the difference between deterministic and non-deterministic algorithms. Remember, both parties have to be using the same key which means regardless of the distance between them, they need to be time synchronized such that their keys rotate together otherwise neither party can decrypt each other’s correspondence. And that means their separate, independent devices need to compute the exact same, seemingly random number at the exact same time. That means it’s not random – just improbable that an adversary will compute the number within the rotation time. But if the device is compromised, and they always are, the algorithm will be discerned and because it is deterministic … the adversary will generate the same key as the communication endpoints and be able to decrypt the traffic too.

Lastly, there’s something you know, a traditional password. Frankly, the encryption techniques to do strong cryptography have existed for decades. So why doesn’t anybody use it? It’s not that it’s proprietary or not available. It’s that people don’t want to remember hundreds of passwords because unless everyone globally shares a password … each correspondence is going to be different. You need a different password for SMS to your girlfriend, SMS to your buddies and SMS to your Mom. Public key cryptography takes care of that … but regular people hate managing revocation lists, trust rings, etc. Phil Zimmerman’s PGP/GPG is the common man solution. PKI is the corporate solution where trust rings are handled by a “trusted third party”. But trusted party fell prey to attack (http://www.darkreading.com/authentication/167901072/security/attacks-breaches/231600498/digital-certificate-authority-hacked-dozens-of-phony-digital-certificates-issued.html and http://arstechnica.com/security/news/2011/09/comodo-hacker-i-hacked-diginotar-too-other-cas-breached.ars) because it created a single point for attack.

Share & Socialize

What is OmniNerd?

Omninerd_icon Welcome! OmniNerd's content is generated by nerds like you. Learn more.

Voting Booth

What if a spouse cheats?

23 votes, 4 comments