Cryptography (which literally means "hidden writing") refers to encoding messages so that they can, ideally, be read only by the intended recipients. A message that is to be encoded is called the plaintext while the encoded form of the message is the ciphertext. A cryptographic system consists of two algorithms, an encoding algorithm for converting plaintext messages into ciphertext and a decoding algorithm for converting ciphertext back into the original plaintext. Of course, there are always codebreakers around who might want to decode a message without knowing the decoding algorithm in advance. Different cryptographic systems make it more or less difficult for codebreakers to break the code. The more difficult it is, the more secure the cryptographic system.
Historically, cryptographic systems required a secret key, a piece of information shared by the sender and the intended recipient of the message. The secret key is used by both the encoding algorithm and the decoding algorithm. As long as the secret is known only to the sender and recipient, they are the only ones who can easily encode and decode messages. Of course, as the common wisdom goes, if something is known to two people, it's not really a secret. In the 1970s, public key cryptography was invented. In public key systems, the key that is used to encode messages is not a secret: anyone can know it and can use it to encode messages. A second private key is used for decoding messages. The private key is a true secret, known only to the recipient of the messages. Anyone can send a message to the recipient, and only the recipient can decode those messages (assuming that the code can't be broken).
This handout discusses several historical and modern cryptographic systems. But we begin with a bit of mathematics that is used in several of these systems.
Let n be any integer greater than one. If m is an integer, we use the notation m mod n to represent the remainder when m is divided by n. For example, 11 mod 3 = 2, 1234 mod 100 = 34, 65 mod 5 = 0, and 17 mod 2 = 1. Note that m mod n is one of the integers 0, 1, 2, ..., or n - 1.
To do modular arithmetic mod n, you just have to do regular arithmetic and then take the result "mod n"; that is, take the remainder when the result is divided by n. For example, "i +j mod n" means "take the sum of i and j, and then take the remainder when the answer is divided by n". You can do the same thing with subtraction, multiplication, and exponentiation. For example,
10 + 7 mod 12 = 17 mod 12 = 5 21 * 57 mod 100 = 1197 mod 100 = 97 54 mod 7 = 625 mod 7 = 2
You have to be careful when computing m mod n for a negative number m. For example, (-10) mod 7 is 4. The answer has to be a number in the range 0 through n - 1. In particular, (-m) mod nn is not equal to -(m mod n). To make the calculation easy, try adding n or a multiple of n to make the number positive: (-10) mod 7 = (14-10) mod 7 = 4 mod 7 = 4. Adding a multiple of n to m does not change the value of m mod n.
The idea of encoding messages is often attributed to Julius Caesar, who used a very simple system for encoding messages to his generals. His idea was to shift each letter in the plaintext by a specified number of places in the alphabet. For example, if the shift amount is 3, then A would be encoded as D, B would be encoded as E, and so on. If the shift moves a letter past the end of the alphabet, wrap around to the beginning, so that X is encoded as A, Y as B, and Z as C. To decode a message, it is only necessary to shift the letters of the ciphertext in the opposite direction. The secret key in this case is the shift amount, a number between 1 and 25. Of course, the Caesar Cipher is not at all secure, and it could only have worked if no one else understood that messages could be encoded at all.
Now, it is possible to describe the Caesar Cipher using modular arithmetic. First of all, we can associate each letter of the alphabet with a number between 0 and 25:
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
Suppose that the shift amount is s. Then to encode an individual letter, find the number x associated with that letter, compute x + s mod 26, and then find the letter associated with the answer. For example, if the shift amount is 22 and the letter that is to be encoded is H: convert H to the number 7, compute 7 + 22 mod 26 = 3, and then convert the number 3 into the letter D. So, 'H' is encoded as 'D'. Similarly, to decode a letter, convert it into a number x, compute x - s mod 26, and convert the answer back to a letter.
We will use the same numbering of the letters of the alphabet in the next two codes. Note that when applying the Caesar Cipher and the next two codes, it is customary to discard all spaces and punctuation from the plaintext and to convert all the letters to upper case. It is assumed that the recipient of the message can figure out how to break the decoded message into words and sentences.
We can get a slightly more secure cryptographic system by using a different shift amount for different letters in the plaintext. In the Vigenere Cipher, invented in France in the 16-th century, the shift amounts are determined by a word (or a phrase, with spaces discarded). That word is the secret key.
By converting the letters in the secret word into numbers, the key can be considered to be a sequence of numbers, where each number is in the range 0 through 25. These numbers are the shift amounts that are going to be applied to the letters in the plaintext. The shift for the first letter in the plantext is given by the first number in the sequence (that is, by the first letter in the key); the shift for the second letter in the plaintext is given by the second number in the key; and so on. When the end of the key is reached, start over at the beginning. For example, suppose that the secret key is the word "FISH". This corresponds to the number sequence 5, 8, 18, 7. This means that the shifts applied to successive letters of the plaintext will be 5, 8, 18, 7, 5, 8, 18, 7, 5, 8, 18, 7, etc. If the plaintext is "MEETATNOON", it is encoded as follows:
Plaintext: M E E T A T N O O N Convert to numbers: 12 4 4 19 0 19 13 14 14 13 Shift amounts: 5 8 18 7 5 8 18 7 5 8 Add: 17 12 22 26 5 27 31 21 19 21 Modulo 26: 17 12 22 0 5 1 5 21 19 21 Convert to letters: R M W A F B F V T V
So, the ciphertext is RMWAFBFVTV. The decoding algorithm is the same, except that the shift amounts are subtracted instead of added.
Obviously, the security of the Vigenere Cipher increases as the length of the secret key word increases. In fact, the Vigenere Cipher is fairly easy to break if the key is short or is reused in many messages.
For a completely secure cryptographic system, you can turn to one-time pads. A one-time pad is a similar to a key for the Vigenere cipher and is used in the same way. That is, it is a list of shift amounts to be applied to different letters in the message. However, a one-time pad differs in the following ways: A one-time pad must be at least as long as the message. A one-time pad should be used for one message only, and then discarded. The list of numbers in the one-time pad should be random. If these conditions hold, then the one-time pad is as secure as a cryptographic system can be; that is, for someone who does not know the contents of the one-time pad, the ciphertext contains no information at all about the letters in the plaintext. Two people who want to exchange messages using this cryptographic system need a collection of one-time pads which they share and which nobody else has access to. The security depends, of course, on keeping the one-time pads secret.
As an approximation for a one-time pad, it's possible to take some sequence of characters from a well-known source. For example, you might agree that on the n-th day of the year, you will use letters from the n-th page of a certain book -- or maybe the n-th character from each page in that book. (Of course, this is not as secure as a true one-time pad.) For example, suppose we have agreed to use the first sentence of this handout as our one-time pad: "Cryptography (which literally means...". Then encoding the message SENDMONEYSOONEST would go like this:
One-time pad: C R P T O G R A P H Y W H I C H L I T ... Convert to numbers: 2 17 15 19 14 6 17 0 15 7 24 22 7 8 2 7 11 8 19 Plaintext: S E N D M O N E Y S O O N E S T Convert to numbers: 18 4 13 3 12 14 13 4 24 18 14 14 13 4 18 19 Add: 20 21 28 22 26 20 30 4 39 25 38 36 20 12 20 26 Modulo 26: 20 21 2 22 0 20 4 4 13 25 12 10 20 12 20 0 Convert to letters: U V C W A U E E N Z M K U M U A
And the ciphertext is UVCWAUEENZMKUMUA. Let's try decoding that, noting how negative numbers are handled:
One-time pad: C R P T O G R A P H Y W H I C H Convert to numbers: 2 17 15 19 14 6 17 0 15 7 24 22 7 8 2 7 Ciphertext: U V C W A U E E N Z M K U M U A Convert to numbers: 20 21 2 22 0 20 4 4 13 25 12 10 20 12 20 0 Subtract: 18 4 -13 3 -14 14 -13 4 -2 18 -12 -12 13 4 18 -7 Add 26 if negative: 18 4 13 3 12 14 13 4 24 18 14 14 13 4 18 19 Convert to letters: S E N D M O N E Y S O O N E S T
One problem with all the historical ciphers is that they depended on a shared secret. That is, there was some sort of secret key that had to be known to both the sender and the receiver of a message. In the 1970s, public key cryptography was invented. For public key cryptography, there is really a pair of keys, a public key that is published to make it known to anyone who cares to know it, and a private key which is kept secret by one individual. To send a secure message to that individual, the public key is used to encode the plaintext. The ciphertext is then transmitted to the receiver, who can use the private key to decode the message.
The most common form of public key cryptography is RSA, published in 1978 and named after its inventors, Rivest, Shamir and Adleman. In RSA, the public key consists of a pair of numbers (n,e) and the private key is a pair of numbers (n,d). The first number in each pair, n, is created by finding two large prime numbers and multiplying them together. Then the numbers e and d are selected to have certain properties (these properties make the encoding and decoding work properly). The mathematics is rather complicated, but the encoding and decoding algorithms are not.
To encode a plaintext message, the message is first converted into a sequence of numbers, where each number is in the range 0 to n-1. (The method for doing this is part of the RSA specification, but I don't know exactly how it's done.) Then each of those numbers is encoded separately. To encode the number m, the value
c = me mod n
is computed. The sequence of encoded numbers is the ciphertext message. To decode the ciphertext, the receiver uses the number d from the private key to convert each coded number c back to the original uncoded number m:
m = cd mod n
The plaintext can then be recovered from the sequence of numbers produced in this way. Note that RSA cryptography depends on the fact that n, d, and e can be selected so that for any number m in the range 0 through n-1,
(me)d mod n = m
The practicality of RSA also depends on the fact that there are efficient algorithms for selecting numbers n, d, and e that will work, as well as an efficient algorithm for computing ab mod n, even when a, b, and n are very large numbers. Recall that n is the product of two large prime numbers. The two prime factors are used in computing the value of d from a given value of e. Without knowledge of the two primes, it is very difficult to discover the value of d, even knowing the value of e. That is, knowing only the public key, it is very difficult to guess the private key. The security of RSA is based on the fact that given a number n that is the product of two large prime numbers, it is very difficult to find those primes. The larger the prime numbers, the harder it will be to factor n and the more secure the RSA algorithm will be.
One interesting property of the RSA algorithm is that it can be used in reverse: If the private key is used to encode a plaintext message, then the resulting ciphertext can be decoded using the public key. (Just look at the math -- it follows from the fact that (md)e mod n = med mod n = (me)d mod n = m.) But since everyone knows the public key, what would be the sense of that; anyone can decode the message!
In this case, however, the purpose is not secrecy. The message can be decoded by anyone, but it can only have come from one person. The fact that the message can be decoded using the public key means that the message can only have been encoded by someone who knows the corresponding private key. If I encode a message with my private key and send the encoded message to you, you will be able to decode the message with my public key. When you do that, you will know that the message came from me (or at least that it came from someone who knows my private key).
This is the basis for the idea of digital signatures. I can digitally sign a message by encoding it with my private key. The fact that the coded message can be decoded using my public key confirms that I am the author of the message.
Digital signatures can be combined with public-key encryption. Suppose that I want to send you a secret message, and I also want you to be sure that the message really came from me and not from someone else. I can do it if we both have public and private keys, and if each of us knows the public key of the other. To send you a digitally signed, encrypted message, I first encode my message using my private key, and then I encode the output of that process using using your public key. I send the resulting coded message to you. You first decode the message with your private key, and then decode the output of that process using your public key. If you get an intelligible message in the end, you know it came from me (since I am the only one who knows my private key), and you are the only one who can read the message (since you are the only one who knows your private key).