Decoding Cryptography: The Fall of Historical ciphers

Previously, on "The Love Letter Problem"…

In the last blog, we met Alice and Bob, watched our nosy postman fail to crack their symmetric encryption, and laid down the formal syntax (Gen, Enc, Dec), Kerckhoffs' Principle, attack models, and a question: what does "secure" actually mean?

We left off with the promise of math. This blog delivers on that promise by breaking things.

We're going to look at three historical ciphers that people actually used for centuries. Each one seems clever. Each one gets destroyed. And the way they get destroyed teaches us exactly why modern cryptography needs mathematical rigor.

Note: We'll be using some basic number theory (modular arithmetic) in this blog. Don't Panic (Arthur Dent). I'll introduce everything as we go.


Modular Arithmetic 101

Before we break ciphers, we need one tool: modular arithmetic. I will use the famous example of "clock math" to explain.

If it's 10 o'clock and you wait 5 hours, it's 3 o'clock — not 15 o'clock (unless you, like me, love 24hr format). You "wrapped around" at 12. Modular arithmetic formalizes this wraparound.

We write:

a mod n = the remainder when a is divided by n

Some examples:

  • 15 mod 12 = 3
  • 29 mod 26 = 3
  • 26 mod 26 = 0
  • 7 mod 26 = 7

The general idea is that when you reach the end, you loop back to the beginning. We'll be working mod 26 a lot, because the English alphabet has 26 letters.

Letters as Numbers

To do math on letters, we need to turn them into numbers. The convention is simple:

a b c d e f g h i j k l m n
0 1 2 3 4 5 6 7 8 9 10 11 12 13
o p q r s t u v w x y z
14 15 16 17 18 19 20 21 22 23 24 25

So "h" is 7, "t" is 19, and so on. Alice, Bob, the postman, all know this mapping. It's just a way to talk about letters using numbers.


Cipher #1: The Shift Cipher (Caesar Cipher)

As the name suggests, it is believed to have been used by Julius Caesar. I will be using both the names to get you desensitized. The idea:

Pick a random number k between 0 and 25. To encrypt, shift every letter forward by k positions. To decrypt, shift back by k positions.

Let's write this down formally using the Gen, Enc, Dec syntax from blog 1:

Gen(): Pick k ∈ {0, 1, 2, …, 25} uniformly at random

Enc(m, k): ∀ character mi in the message, compute ci = (mi + k) mod 26

Dec(c, k): ∀ character ci in the ciphertext, compute mi = (ci − k) mod 26

Simple Example

Suppose k = 3 and Alice wants to encrypt "hello".

Plaintext h e l l o
As number 7 4 11 11 14
+ k = 3 10 7 14 14 17
mod 26 10 7 14 14 17
Ciphertext k h o o r

Alice sends "khoor" to Bob. Bob, who knows k = 3, subtracts 3 from each letter and recovers "hello".

The postman sees this and scratches his head. For now.

Breaking the Caesar Cipher: Brute Force

Now let's think like the postman. He intercepts "khoor" and wants to recover the message.

He knows it's a shift cipher (thanks again, Kerckhoffs). He doesn't know k yet. But k can only be one of 26 values: 0, 1, 2, …, 25. So he just tries them all:

Candidate k Decryption of "khoor"
k = 0 khoor
k = 1 jgnnq
k = 2 ifmmp
k = 3 hello ← meaningful English!
k = 4 gdkkn
… (all gibberish)

26 attempts. A child could do this by hand. A computer does it in microseconds.

This is a brute force attack: exhaustive search over all k ∈ K. With |K| = 26, it's simple.


The Sufficient Key Space Principle

The shift cipher teaches us a fundamental lesson:

Sufficient Key Space Principle: Any secure cipher must have a key space large enough that a brute force attack is computationally infeasible.

This is a necessary condition, not sufficient. Large |K| doesn't guarantee security (as we'll see next), but small |K| guarantees insecurity. If the adversary can exhaustively search over all k ∈ K... you've already lost.

Let me clarify, how large is "large enough"?

To answer this, consider the power of modern computer. Supercomputers can't go more than 1020 (I am taking some liberty with the numbers), that is about 267 operations per second.

Thus, a key space of at least 280 is considered the bare minimum, and modern ciphers like AES-256 use a key space of 2256, a number so large that if every atom in the observable universe were a computer trying a billion keys per second, they still wouldn't finish before the heat death of the universe.

A key space of 26 becomes a joke in comparison.


Cipher #2: The Mono-Alphabetic Substitution Cipher

Alright, so the postman broke the shift cipher because there were only 26 keys. What if we give him a lot more keys?

The idea: instead of shifting every letter by the same amount, we create a a permutation (which is a special kind of bijection, else Dec won't be deterministic ^^ ) from plaintext letters to ciphertext letters. Each letter maps to a different, unique letter.

For example, a possible key might be:

Plaintext a b c d e f z
Ciphertext q w z r x b m

Under this key, every "a" in the plaintext becomes "q", every "c" becomes "z", and so on. The receiver, who knows the same permutation, reverses it.

Formally:

Gen(): Pick a random permutation σ : {0,…,25} → {0,…,25}

Enc(m, σ): ∀ mi, compute ci = σ(mi)

Dec(c, σ): ∀ ci, compute mi = σ−1(ci)

where σ−1 is the inverse permutation (the mapping in reverse).

The Key Space: 26!

How many possible keys are there? A key is a permutation of 26 letters, and the number of permutations of 26 items is:

26! = 26 × 25 × 24 × … × 1 ≈ 4 × 1026 ≈ 288

That's roughly 288. An astronomically larger key space than the shift cipher's 26 keys. A brute force attack would need to try about 288 permutations. Even with modern hardware, this is infeasible.

So we're safe, right? The sufficient key space principle is satisfied.

Not so fast.


Breaking the Substitution Cipher: Frequency Analysis

Remember that the sufficient key space principle is a necessary condition, not a sufficient one. A large key space protects against brute force, but who said brute force is the only attack?

The postman is evolving. For the shift cipher, brute strength was enough. Now he puts on his statistician hat (!!!). He knows something about English that no permutation can hide.

The Fingerprint of Language

Every natural language has statistical patterns. In English:

  • The letter "e" appears roughly 12.7% of the time
  • "t" appears about 9.1% of the time
  • "z" barely shows up at 0.07%

Here are the approximate frequencies of all English letters:

Letter Frequency Letter Frequency
e 12.7% s 6.3%
t 9.1% h 6.1%
a 8.2% r 6.0%
o 7.5% d 4.3%
i 7.0% l 4.0%
n 6.7%

And it goes beyond single letters. Common bigrams (pairs of letters) like "th", "he", "in" appear far more often than "qz" or "jx". Common trigrams like "the", "and", "ing" are also well-documented.

This is all public data. Anyone can compute these statistics from a large English corpus.

The Attack

By now, you must've enough hints to understand how to break the cipher. I will still mention the attack method for completeness.

  1. Frequency map of every letter in ciphertext.
  2. Compare it with the known English frequency chart.
  3. Play Match the following.
  4. Refine using bigram and trigram frequencies until the permutation is recovered.

For a long enough ciphertext, this works beautifully. No brute force needed. The permutation, all 26! possibilities, is cracked by exploiting the statistics of the English language.

The lesson: a large key space is necessary but not sufficient. The substitution cipher has 288 keys but falls to a simple statistical attack.


Cipher #3: The Vigenère Cipher

So the substitution cipher was broken because every "e" always maps to the same ciphertext letter. Alice and Bob are smart, but very stubborn. They still want to use shift and keep the ciphers somewhat simple.

Vigenère cipher (we've already heard of this. Also called the poly-alphabetic substitution cipher) enters the stage!! Instead of one shift applied uniformly, it uses a keyword of length t, and each letter of the keyword specifies a different shift.

Formally:

Gen(): Pick a random length t ≥ 1, then pick a random string k = (k1, k2, …, kt) where each kj ∈ {0, 1, …, 25}

Enc(m, k): Divide the message into blocks of length t. For each character mij (the j-th character of the i-th block), compute: cij = (mij + kj) mod 26

Dec(c, k): For each character cij, compute: mij = (cij − kj) mod 26

Notice the subscript on k: it's kj, not just k. The shift changes depending on the position within the block.

Example

Let the keyword be "cipher" (length t = 6). Converting to numbers: c=2, i=8, p=15, h=7, e=4, r=17.

Alice wants to encrypt: "thiscryptosystemisnotsecure"

Step 1 — convert plaintext to numbers:

t  h  i  s  c  r  y  p  t  o  s  y  s  t  e  m  i  s  n  o  t  s  e  c  u  r  e
19 7  8  18 2  17 24 15 19 14 18 24 18 19 4  12 8  18 13 14 19 18 4  2  20 17 4

Step 2 — divide into blocks of 6 and add the key (mod 26):

Block 1:  19  7  8 18  2 17     Key:  2  8 15  7  4 17
          ─────────────────          ─────────────────
Sum:      21 15 23 25  6  8  (mod 26)
Letters:   v  p  x  z  g  i
Block 2:  24 15 19 14 18 24     Key:  2  8 15  7  4 17
          ─────────────────          ─────────────────
Sum:       0 23  8 21 22 15  (mod 26)
Letters:   a  x  i  v  w  p
Block 3:  18 19  4 12  8 18     Key:  2  8 15  7  4 17
          ─────────────────          ─────────────────
Sum:      20  1 19 19 12  9  (mod 26)
Letters:   u  b  t  t  m  j

And so on for the remaining characters.

Clever?

We look at every instance of the letter "s" (= 18) in the plaintext:

  • Position 4 (in block 1): shifted by k4 = 7 → ciphertext "z"
  • Position 5 (in block 2, as 'position 5'): shifted by k5 = 4 → ciphertext "w"
  • Position 1 (in block 3): shifted by k1 = 2 → ciphertext "u"

The same plaintext letter "s" produces three different ciphertext letters: z, w, u. The one-to-one relationship between plaintext and ciphertext letters and the very thing that made frequency analysis work, is broken.

The key space is still sufficiently large. The postman prepares his frequency chart and finds… no clear pattern. The most frequent ciphertext letter doesn't cleanly correspond to "e" anymore. Victory for Alice and Bob? Our Cryptography journey ends here?

Ofc not :P.


Not Clever!

The Vigenère cipher resisted cryptanalysis for centuries and was even called "le chiffre indéchiffrable" (the indecipherable cipher). But our postman has evolved again... a pattern hunter! And the Vigenère falls to an elegant two-staged attack.

Stage 1: Find the Key Length t

Several methods exist to determine t, including Kasiski examination and the index of coincidence. The core idea behind both: if the key has period t, then the ciphertext has a subtle periodic structure that can be statistically detected.

I'll leave the details of Stage 1 as a gift for you, my friend, the Wikipedia articles are excellent. The important thing is: given enough ciphertext, the key length t can be determined.

Stage 2: Recover the Key (t Independent Frequency Analyses)

Once you know t, the rest is a familiar trick. Here's the insight:

Consider every t-th character of the ciphertext, starting from position 1:

c1, c1+t, c1+2t, c1+3t, …

All of these characters were encrypted with the same shift k1. So this subsequence is just a Caesar cipher!

Now do the same for position 2:

c2, c2+t, c2+2t, c2+3t, …

This stream was all shifted by k2. Another independent shift cipher to crack.

Repeat for positions 3, 4, …, t. We have t independent shift ciphers and we can use frequency analysis for each of them, each recovering one character of the key. Once we have all t key characters, we have the full key.

For this to work, we need the ciphertext to be long enough that each of the t streams has sufficient statistical signal. But for any fixed key length t, a sufficiently long message will always be breakable.

The Vigenère cipher, the "indecipherable cipher", broken in ciphertext-only attack. Not a great look :<


The Scoreboard

Let's step back and take stock. Three ciphers, three attacks, all in the weakest possible attack model (COA):

Cipher Key Space Broken By Attack Model
Shift Cipher 26 Brute Force COA
Mono-Alphabetic Substitution 26! ≈ 288 Frequency Analysis COA
Vigenère Cipher 26t (periodic) Kasiski + Frequency Analysis COA

Remember that COA is the weakest adversary. The postman has only the ciphertext. Imagine how much worse these ciphers would fare against KPA or CPA adversaries, who have plaintext-ciphertext pairs or even an encryption oracle.


Enter Cryptography

A pattern that repeated for centuries:

  1. Someone proposes a "clever" cipher.
  2. Everyone thinks it's secure.
  3. Someone finds an attack.
  4. Repeat.

The fundamental problem with classical cryptography was that it was an art, not a science:

  • No formal definitions of what "secure" means
  • No stated assumptions about what the attacker can or cannot do
  • No mathematical proofs that the cipher actually achieves its claimed security

Sound familiar? I walked through five attempts at defining "secure" and showed how each one failed. That's exactly the kind of imprecision that plagued classical cryptography for millennia.


The Three Principles of Modern Cryptography

Modern cryptography fixes this by resting on three pillars:

Principle 1: Formal Security Definitions

Don't just say "this cipher is secure." Define exactly what "secure" means in mathematical terms. Against what kind of adversary? With what computational resources? What exactly should the adversary be unable to compute?

We saw this begin with our series of attempts at defining privacy, and with the attack model hierarchy (COA → KPA → CPA → CCA). We'll arrive at precise, mathematical definitions that leave no room for hand-waving.

Principle 2: Precise Statement of Assumptions

Every construction in modern cryptography is built on some underlying hardness assumption, i.e, a mathematical problem believed to be computationally hard. For example:

  • "Factoring large integers is hard" (used in RSA)
  • "Computing discrete logarithms is hard" (used in Diffie-Hellman)

These assumptions (axioms) are unproven. We don't have mathematical proofs that these problems are truly hard. But by stating them explicitly, we achieve two things:

  1. We can compare different constructions by comparing their assumptions.
  2. If a construction is broken, we know exactly which assumption failed.

This is infinitely better than the classical approach of hoping no one finds an attack.

Principle 3: Rigorous Mathematical Proofs

Once we have a definition, Z (Principle 1) and an assumption, X (Principle 2), we mathematically prove that the construction, Y, satisfies the definition, assuming the underlying hardness assumption holds.

The structure is always:

"If X, then Y is Z-secure."


From Intuition to Formal Security

The historical ciphers all fail because ∃ an efficient adversary A that can recover m (or compute some function of m) from c, even in the weakest (COA) attack model.

This suggests a cleaner requirement. What if we demanded:

∀ m0, m1 ∈ M, ∀ c ∈ C: Pr[C = c | M = m0] = Pr[C = c | M = m1]

In other words: the ciphertext distribution should be completely independent of which message was encrypted. The adversary, no matter how powerful, stares at c and learns nothing about whether m0 or m1 was the plaintext.


What's Next?

Alice and Bob had all their ciphers broken. Wah wah wah :<

But the next blog will answer the question we've been teasing:

How do you mathematically formalize whether the ciphertext "helped" the attacker?

That leads us to probability theory, random variables, and the first formal notion of security in cryptographic history. One so strong that it's provably unbreakable, even against an adversary with unlimited computing power. It was defined by Claude Shannon in 1949. It's called perfect secrecy.

But fair warning: more math incoming. You've been warned!


Acknowledgment

The technical content in this blog draws from "Introduction to Modern Cryptography (Second Edition)" by Jonathan Katz and Yehuda Lindell, and from Professor Ashish Choudhury's lecture series on Foundations of Cryptography.


So the next time someone tells you they've invented an "unbreakable" cipher, ask them three questions: What's your formal security definition? What's your hardness assumption? Where's your proof? If they can't answer all three... well, you've read this blog. You know how that story ends.


Note: The mathematical formulations in this blog are simplified for readability. The worked examples are exact.

Note: The diagrams in this blog are AI-generated and meant to be illustrative, not precise technical specifications.