CS 200: P5: Hashing

Purpose

The object of this assignment is to gain experience in hashing.

ICD-10

Medical people use ICD-10 codes to document illnesses and injuries. For example:

File Format

A sample file is available: icd10cm_order_2016.txt

Lines in the file are of variable length, but never longer than 400 characters. Fields are defined as follows, where the first column is column 1:

Column Length Contents
1 5 Order number
7 7 Code
15 1 0/1 for non-header/header
17 60 Short description
78 remaining Long description

You may assume that lines in the data file are in that format.

Arguments

This program will take the following arguments: Each query corresponds to a “Code” from the above table. For each query, the program should print the corresponding long description, or complain if the code was not found.

Beware of confusion between “Code”, in terms of a field in the ICD-10 data file, and hash codes.

Implementation

You may not simply read the file multiple times, looking for the desired queries. Instead, you must, intitially, read the entire file into a hash table. All queries must use that hash table, not the file.

Hash Table Format

Hash Code

To hash a query string, use this CRC32 algorithm:
    import java.util.zip.CRC32;
    import java.util.zip.Checksum;

    // Compute CRC-32 checksum
    public static long cksum(String s) {
        Checksum engine = new CRC32();
        byte[] bytes = s.getBytes();
        engine.update(bytes, 0, bytes.length);
        return engine.getValue();
    }
This will give you a long value 0‥4294967295. You will have to process this further to get an int suitable for indexing into your hash table.

Sample Run

Here is a sample run, where “%” is my shell prompt:
    % javac icd.java
    % java icd icd10cm_order_2016.txt 10000 B2681 V00112D Zulu W6161
    B2681: Mumps hepatitis
    V00112D: In-line roller-skater colliding with stationary object, subsequent encounter
    Zulu: code not found
    W6161: Bitten by duck

Requirements

Submitting your code

Use the Checkin webserver to exercise and submit your code. Submit a P5.jar file containing your *.java files.

Do not put .class files in P5.jar.
Do not put .class files in P5.jar.
Do not put .class files in P5.jar.
Do not put .class files in P5.jar.
Do not put .class files in P5.jar.