Main content
Course: Computer science theory > Unit 2
Lesson 1: Ancient cryptography- What is cryptography?
- The Caesar cipher
- Caesar Cipher Exploration
- Frequency Fingerprint Exploration
- Polyalphabetic cipher
- Polyalphabetic Exploration
- The one-time pad
- Perfect Secrecy Exploration
- Frequency stability property short film
- How uniform are you?
- The Enigma encryption machine
- Perfect secrecy
- Pseudorandom number generators
- Random Walk Exploration
© 2024 Khan AcademyTerms of usePrivacy PolicyCookie Notice
Polyalphabetic cipher
Brit introduces the polyalphabetic cipher, which creates a lighter fingerprint than the Caesar cipher. Created by Brit Cruise.
Want to join the conversation?
- Could someone please explain the meaning of the following sentences of this lesson?
a)"Any time there is a differential in letter frequencies, a leak of information occurs. These differences are caused by repetition in the encrypted message."
b)"[the codebraker] will need to go through and check the *frequency distributions of different intervals.*"
* emphasis added by me
Can someone explain these things by, say, decrypting the message seen in the video step-by-step?
Thank you.(111 votes)- Well, as in the video, with each code there is almost always some form of pattern, no matter how minor. Each pattern/unique difference is something that may stand out, which will then draw one's attention to that specific section.
In the video, each difference in letter frequency assists the code-breaker "Eve" by hinting at which letters/numbers may have been modified the most. Eventually, a pattern will be deduced from the discrepancies between each tally amount of characters.
The best code will have a lessened "fingerprint", or more minuscule and therefore harder to notice patterns. No code is completely fool-proof, unless in the obvious situation where the code-makers were just making stuff up or died without sharing the full message.(51 votes)
- I love this series! I had to pause this to understand better.
At0:50if you do the decrypt math, it seems any of the letters that are being modified by +13 are off by one, so they should be MEET=FRFE, AT=NU, ELEPHANT=PQXCILSM, LAKE=YBVJ
Am I missing something?(51 votes)- 14 should be instead of 13, because 13=M and 14=N XD(4 votes)
- I wasn't able to understand decrypting the message with the snake example. how do you determine the length of the word used to encrypt?(44 votes)
- I didn't understand ether, but then watching Part 7: One-time Pad helped me understand the Part 6!(5 votes)
- Ok. Basically, me & you exchange in advance any word we want. That word is our "master plan". Every time you write me a letter, you take a look at our "master word", turn the word into a bunch of numbers, A=1 Z=26 etc. Then... you write your message, writing on top of every letter, our chosen "master word" over and over. Before you send it to me, you shift every letter in your message by the number represented by the "master letter" that you have written on top of your rough draft! No?(11 votes)
- Yes that's correct!
As an example, let’s say the message is, “Elephant Lake has been compromised."
Then, let’s say the master word is “snake” as in the video at0:23.
The message would translate to numbers as follows:
ELEPHANT LAKE HAS BEEN COMPROMISED.
5 12 5 16 8 1 14 20 12 1 11 5 8 1 19 2 5 5 14 3 15 13 16 18 15 13 9 19 5 4.
The ‘master word’ would translate to numbers as follows:
SNAKE
19 14 1 11 5
Then, place the message over the "master word" or key, as it is commonly called, and shift, like this:
E L E P H A N T L A K E
5 12 5 16 8 1 14 20 12 1 11 5
19 14 1 11 5 19 14 1 11 5 19 14
24 26 6 1 13 20 2 21 23 6 4 19
X Z F A M T B U W F D S
H A S B E E N
8 1 19 2 5 5 14
1 11 5 19 14 1 11
9 12 24 21 19 6 25
I I L U S F Y
C O M P R O M I S E D.
3 15 13 16 18 15 13 9 19 5 4.
5 19 14 1 11 5 19 14 1 11 5
8 8 1 17 3 20 6 23 20 16 9
H H A Q C T F W T P I.
Thus, "Elephant Lake has been compromised," turns into "XZFAMTBU WFDS IIX USFY HHAQCTFWTPI." Quite unintelligible and difficult to decipher.(48 votes)
- I believe "snake" should code to:
19 14 1 11 5
, i.e.: 14 not 13 for 'n'. I just discovered this while trying to implement a poly block cipher in ruby:shared_secret="snake"
m="Meet me at elephant lake Make sure nobody follows you Bring the bomb".downcase!.delete!(" ")
cipher=""
m.chars.each_with_index do |c, i|
shift = shared_secret[i % shared_secret.length].ord-96
shifted_char = c.ord+shift
shifted_char=(shifted_char % 122)+96
cipher << shifted_char.chr
end
p m
p cipher(17 votes)- Gotta remember that A = 0. Since that is the case, N does = 13. The alphabet ends up with 'Z' being 25.(1 vote)
- How does one determine the leak of information & length of the word?(6 votes)
- 1:29It is the differential in letter frequencies that is the leak. You can obtain this by counting how many times a number is used in the code. In a Ceasar cypher the frequency of each letter is used is it's fingerprint. @2:07it shows the fingerprint for English. We use some letter more frequently like 'E', 'A', and 'T'. 'E' in particular is used a lot more than the others. So when you see a letter in the code used a lot there is a good chance it is the letter 'E'. This gives you a place to start, and reduce the tedious task of trial-and-error.
@1:46For a polyalphabetic cypher Brit explains that the length of the word is the key in a cracking the code.
To find this you take letters at different intervals to build a subset of letters to analyze their frequency. i.e. start with the first letter then take every 3rd,4th, or 5th letter and build subsets. Analyze these subsets and look for the familiar fingerprint. If, for example, the length of the code word is 5 letters, the fingerprint should appear when you analyze every 5th letter (@2:00). If the word is 3 letters long the fingerprint should appear when you analyze every 3rd letter.
@2:08After you have determined the length of the word to be X letters long, you break up the code into subsets of every Xth letter (you should have X number of these subsets). You then need to perform a Ceasar cypher on each subset individually.(15 votes)
- Who is Brit Cruise?(4 votes)
- you can also catch a few more videos at his youtube channel: Art of the Problem
https://www.youtube.com/channel/UCotwjyJnb-4KW7bmsOoLfkg(3 votes)
- So, if you used a code word that is has 26 letters in it (the same number that is in the alphabet) would the code then be unbreakable?(3 votes)
- No, the code word must be as long as the message to be unbreakable.
The number of symbols you use in the alphabet does not matter as long as you choose from them randomly for each letter in your key (in computers we typically use only 2 symbols 1 and 0 for our "alphabet" in binary). If you choose an English phrase or some other non-random choice for the key, the code will be breakable,
For frequency analysis, if we know how long the key is we do the following:
-we draw some columns (# of columns = length of our key)
-we write the cipher text letter by letter in each of the columns
-once we go over the number of columns, we start a new row and start back at the first column and write the letters underneath the previous row
-once all the letters have been written in columns, we calculate how often each cipher text letter occurs in each column
-we expect that the cipher text letter will correspond to the plain text letter with a similar frequency in normal English usage
So if we had a key of length 26 and a message of length 2600, we would have 100 rows. Suppose we measured the frequency of the first column, and found that "q" occurred ~8% of the time and "u" occured ~13% of the time. In plain English we expect that the letter "a" occurs ~8.2% of the time and "e" occurs ~12.7% of the time. So we can figure out that the "q" probably represents an "a" and the "u" represents an "e", so the 1st letter in the key has shifted everything up "q" places, so the 1st key letter is a "q".
Now, if we tried the same thing with a key length of 2600 and a message length of 2600, we would only have one letter in each column (this will be true when the key length = message length). So if we try to calculate the frequency of each cipher text letter we get 100% for the letter in the column and 0% for all the other letters in the alphabet. This doesn't really tell us anything.
Hope this makes sense(7 votes)
- While it is true that for any given message a longer shift word will lead to a stronger code, wouldn't a more accurate way of expressing the strength of a cipher using this code be as a ratio between the length of the message and the length of the shift word? For example, A 40 character message with a four letter shift word should be about as difficult to break (10 instances of each Caesar cipher) as a 100 character message with a ten character shift word (10 instances again), right?(3 votes)
- No, it is indeed the length of the shift word and not the ratio of the shift word to message length that provides the strength. In the above video, Brit shows that when a 5-letter interval is guessed, the result gives the characteristic letter frequency. That means he had to try a 1-letter interval, a 2-letter interval, and so on up to the 5-letter interval. In your example, a four letter shift word only requires a search of 1, 2, 3, and 4-letter intervals before the correct interval is found, but a ten letter shift word requires a codebreaker to search up to the 10-letter interval before we can break the message. This is more work so the longer shift word provides more security.(7 votes)
- Eve is really determined to get this Bob guy.(5 votes)
Video transcript
A strong cipher is one which
disguises your fingerprint. To make a lighter
fingerprint is to flatten this distribution of
letter frequencies. By the mid-15th
century, we had advanced to polyalphabetic ciphers
to accomplish this. Imagine Alice and Bob
shared a secret shift word. First, Alice converts
the word into numbers according to the letter
position in the alphabet. Next, this sequence of numbers
is repeated along the message. Then each letter
in the message is encrypted by shifting according
to the number below it. Now she is using multiple
shifts instead of a single shift across the message, as
Caesar had done before. Then the encrypted message
is sent openly to Bob. Bob decrypts the message
by subtracting the shifts according to the secret
word he also has a copy of. Now imagine a code breaker, Eve,
intercepts a series of messages and calculates the
letter frequencies. She will find a flatter
distribution, or a lighter fingerprint. So how could she break this? Remember, code breakers
look for information leak, the same as finding a
partial fingerprint. Any time there is a differential
in letter frequencies, a leak of information occurs. This difference is
caused by repetition in the encrypted message. In this case, Alice's cipher
contains a repeating code word. To break the encryption,
Even would first need to determine the
length of this shift word used, not the word itself. She will need to go through
and check the frequency distribution of
different intervals. When she checks the
frequency distribution of every fifth letter, the
fingerprint will reveal itself. The problem now is to
break five Cesar Ciphers in a repeating sequence. Individually this is a trivial
task, as we have seen before. The added strength
of this cipher is the time taken to determine
the length of the shift word used. The longer the shift word,
the stronger the cipher.