P5: Interfaces and Problem Solving
(Encryption, Decryption, and Code Breaking)

Due Thursday, July 30, 11:59 PM via Checkin
Late deadline July 31, 12:00 PM
100 points

1. Objective

The goal of this assignment is to understand the concept of Java interfaces while implementing a classical substitution cipher. A cipher is used to encrypt and decrypt information. Encryption refers to the process or algorithm that converts plaintext into a coded or encrypted version, called cipher text. Decryption refers to the conversion of cipher text to plaintext. For details on what a cipher is, refer to this link on Wikipedia.

Because of time constraints, we will implement only one cipher called the Caesar Cipher. We will also implement a code breaker because sometimes there is a need to decrypt ciphertext, but you do not know the key used for encryption.


2. Description of Cipher Interface and Caesar Cipher

2.1. Cipher Interface

Create a Java interface called Cipher. It contains the following methods:
  1. public String encrypt(String plaintext); // Takes the String argument called plaintext and returns its encrypted version.

  2. public String decrypt(String ciphertext); // Takes the ciphertext, which is encrypted, and returns its decrypted version as a plaintext String.


2.2. CaesarCipher class

You will code the CaesarCipher class, which implements the Cipher interface using the Caesar cipher algorithm. You may refer to this link on Wikipedia for a detailed description. The Caesar cipher is one of the simplest classical encryption algorithms, and belongs to the category of substitution ciphers.

Each character in the plaintext is subsituted with a character coming a fixed number of places down the alphabet. For example, using a shift of 3 characters, we get the following table of character substitutions:

Original
character
Substituted
character
Original
character
Substituted
character
A D a d
B E b e
C F c f
... ... ... ...
Z C z c

Thus, a plaintext string "Cab" is encrypted as "Fde". A cipher text string "Fde" is decrypted as "Cab" by substituting each character with the character coming 3 position earlier in the alphabet. Also note that uppercaese letters are treated separately, i.e. Z is encrypted as C, and z is encrypted as c.

For this assignment assume that the plaintext to be encrypted can contain the letters A-Z, a-z, and spaces (but no tabs or other whitespace). No punctuation characters will be included in the text that needs to be encrypted. The ciphertext used as an argument to the decrypt method can only contain characters that correspond to valid substitutions of the characters specified for the plaintext.

Implement the CaesarCipher class as follows:

Constructor:

List of attributes:

List of Methods:


2.3. Sentry Class

The Sentry class is used to encrypt and decrypt text files using a given cipher. The Sentry class stores information on which cipher to use and can encrypt the contents of a text file.

Implement the Sentry class as follows:

Constructor:

List of attributes:

List of methods:

  1. public void encrypt(String inputFileName, String outputFileName): This method encrypts the contents of the file referred to by inputFileName, and produces an output file called outputFileName, which contains the encrypted text. Catch all file processing related exceptions, and print the error message and exit in the catch block as shown below:
    System.err.println(e);
    System.exit(0);
    

    The encrypt method in the Sentry class treats each line as a string and calls the encrypt method of the associated cipher instance variable. The encrypt method of Sentry uses recursion to process the lines from the input file in reverse order, and encrypts each line separately. The encrypted lines are printed in an output text file. The output file thus contains the encrypted version of each line in the input file, but the lines appear in the reverse order. Assume that the lines in the text file only contain characters as described above in the Caesar cipher algorithm above.

    For example, suppose that a Caesar cipher is used with a shift parameter of 3. Given this plaintext file, here is the resulting encrypted file. If you decrypt the encrypted file, you should get back the original input file.

  2. public void decrypt(String inputFileName, String outputFileName): This method decrypts the contents of the file referred to by inputFileName, and produces an output file called outputFileName, which contains the decrypted text. Catch all file processing related exceptions, and print the error message and exit in the catch block as shown below:
    System.err.println(e);
    System.exit(0);
    

    Decryption works in a corresponding way as encryption. It recursively reverses the order of lines in an encrypted file while decrypting each line separately, and finally produces a file containing the decrypted lines.


3. Description of code breaking

The Caesar cipher can be easily broken because there are only 26 possible shifts possible for the 26 letters in the English alphabet. Thus, a brute force approach can be applied by trying out all possible shifts. One of these shifts must be able to decrypt the text correctly.

While a human can easily detect the correct decryption of a ciphertext, it is also straightforward to design an algorithm that does that. Our approach is based on the observation that the letters in English tend to appear with certain characteristic frequencies. If one has access to a large piece of text, then one can train the program to learn the frequencies. A trained program can decrypt new text by determining which of the 26 shifts results in frequencies that best match the known frequencies.

There are two main steps used by our code breaker:

  1. Training: Given a "training" text, estimate the known letter frequencies.
  2. Decrypting: Using the knowledge of known frequencies, decrypt new ciphertext.


3.1 Training

Here is a sample plain text file that is "large enough" to give us a pretty good idea of the frequency of occurrence of each letter. Note that this file contains spaces and punctuation but you are only interested in determining the frequencies of the letters A-Z and a-z. The frequency is defined as the fraction of times each letter occurs in the text. That is, it is the number of times a particular letter occurs divided by the total number of letters in the text. Do not include non-letter characters in any of the counts. Furthermore, do not distinguish between uppercase and lowercase. That is, there is only one frequency for the letters 'A' and 'a' (not two), one for 'B' and 'b', and so on.

The training step produces an array of doubles. This array has 26 entries, each storing the frequency of a letter. This array is called knownFrequencies.

        public final int NUMBER_OF_LETTERS = 26;
        private double[] knownFrequencies = new double[NUMBER_OF_LETTERS];


3.2 Decrypting

The decryption algorithm needs to determine the value of the shift parameter that was used to produce the ciphertext. Once the shift is known, decryption can be performed using the Caesar cipher decryption technique.

In the training step we already determined the array of known frequencies. If you perform a similar frequency analysis of the ciphertext file, you will obtain an array of frequencies of letters in the ciphertext. Let us call this array observedFrequencies. Note that we cannot directly compare the frequencies because the observedFrequencies array is for letters in the ciphertext, while the knownFrequencies array is for letters in the plaintext. The plaintext letters were substituted to obtain the letters in the ciphertext.

Try out all possible shifts (0..25) of the observedFrequencies array using wraparound. That is, for shift of one A becomes B, B becomes C, ..., and Z becomes A. With shift of two, A becomes C, B becomes D, ... , Y becomes A, and Z becomes B. A shift of 0 corresponds to the original observedFrequencies array. For one of those shifts, the observedFrequencies will match the knownFrequencies best. To do this we need a measure that quantifies how well observedFrequencies match knownFrequencies. There exist several sophisticated distance measures (e.g., chi-squared statistic). However, in this assignment we will use a simple measure to calculate the distance between two frequency arrays, A and B.

Distance = ∑ (Math.abs(A[i] - B[i])),
where the summation is over i, which ranges over 0..NUMBER_OF_LETTERS-1.
Note that a small value of the distance implies a better match.


3.3. Tasks for Code Breaking

Download the class CodeBreaker from here. Implement the following methods in the class.

  1. public double[] getKnownFrequencies():

    This method is needed for testing to make sure you generated the correct array of known frequencies. It helps with grading and providing partial credit.

  2. public void setKnownFrequencies(double[] knownFrequencies):

    This method is needed when we test your code. We will need to bypass the training method while testing the decryption functionality just in case your training method isn't working correctly. It helps with grading and providing partial credit.

  3. public void train(String trainingFileName):

    This method implements the training step (section 3.1). It sets the array knownFrequencies.

    Assume the training file exists. For any exceptions that you need to catch, use the following text inside the catch block.

         System.err.println(e);
         System.exit(0);
    

  4. public int decrypt(String cipherTextFileName, String outputFileName):

    This method decrypts the contents of the file referred to by cipherTextFileName, and produces an output file called outputFileName, which contains the decrypted text. Note that the order of lines will be the same unlike in the case with the Sentry class that was also reversing the order of lines in a file. This method returns the value of the shift parameter that was used to encrypt the text.

    Assume the file to be decrypted exists. For any exceptions that you need to catch, use the following text inside the catch block.

         System.err.println(e);
         System.exit(0);
    


4. Preliminary and Final Testing

As always, remember that we will not test your main method. The methods you implement will be tested directly by the test server. Ideally, you should create your own main methods, perhaps one in each class, to test each class separately and together. Preliminary testing will cover some methods and a limited set of situations in these methods. Final testing may use other files (both names and contents may differ).

Here is a sample barebones main method for the CaesarCipher class. It does not test all the methods, nor does it test for various situations.

    public static void main(String[] args){
        CaesarCipher cipher = new CaesarCipher(3);
        String plainText = "The enemy was spotted";
        System.out.println("Plaintext: " + plainText);
        String cipherText = cipher.encrypt(plainText);
        System.out.println("After encryption, cipherText: " + cipherText);
        String backToPlainText = cipher.decrypt(cipherText);
        System.out.println("After decryption, plainText: " + backToPlainText);
        if(!plainText.equals(backToPlainText))
            System.err.println("Fix your program!");
    }

Here is a sample barebones main method for the Sentry class. It does not test all the methods, nor does it test for various situations.

    public static void main(String[] args) {
        Sentry sentry = new Sentry(new CaesarCipher(3));
        sentry.encrypt("encrypt-input.txt", "encrypt-output.txt");
        sentry.decrypt("encrypt-output.txt", "decrypt-output.txt");
    }

The CodeBreaker.java file contails a sample barebones main method to help you get started testing that program.


5. Submission

Create a single jar file called P5.jar from the four Java files (Cipher, CaesarCipher, Sentry, and CodeBreaker) using the instructions provided here. Do not add other files (e.g., class files).

Submit the P5.jar file via the online checkin system.