Program Debugging Tutorial

Objectives
  • Learn how to debug Java programs, including:

    • Locating and diagnosing exceptions.

    • Initialization and management of data structures.

    • Tracing code to find logical errors.

Getting Started
  1. Create a new Java project called HTMLExtractor and download HTMLExtractor-starter.jar.

  2. Import the jar into your source directory: right-click the project > Import > General > Archive File, browse to the downloaded jar file, and click Finish. Your TA will go over this process. Your Eclipse directory should look like this:

HTMLExtractor/
└── src
    └── Extractor.java

  3. Set up your command line arguments in Eclipse as follows:

https://www.cs.colostate.edu/~cs165/CurrentSemester/home_programs.php links.html

Description

This lab shows that a Scanner can read a web page, not just keyboard input (System.in) or a File.
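
The following is a minimal sketch of that idea, not the code in Extractor. The class and method names (ScannerSourceSketch, readLines, readPage) are made up for illustration. It relies on the fact that a Scanner can wrap any InputStream, including the one returned by opening a URL.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; none of these names come from Extractor.java.
class ScannerSourceSketch {

    // Reads every line from any InputStream. The Scanner does not care
    // whether the bytes come from the keyboard, a file, or a web server.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Opens a stream to a web page and reads it with the same helper.
    static List<String> readPage(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readLines(in);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines;
        if (args.length > 0) {
            // Assumes the first argument is the address of a web page.
            lines = readPage(args[0]);
        } else {
            // No address given: read from an in-memory stream instead.
            byte[] sample = "<html>\n<body>\n</html>".getBytes(StandardCharsets.UTF_8);
            lines = readLines(new ByteArrayInputStream(sample));
        }
        System.out.println("Read " + lines.size() + " lines");
    }
}
```

Keeping the Scanner loop in one method that accepts an InputStream lets you try the reading logic on a small in-memory sample before pointing it at a real page.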

Read the javadoc for the following methods in the Extractor class:

  1. readHtml

  2. extractHtml

  3. writeHtml

You will be debugging this class.

The image below shows the browser view of the original HTML page that the program reads and parses:

Input HTML

The next image shows the browser view of the HTML page that the program generates. The generated page is valid HTML that you can view in a browser, and it contains working links to the Liang sample programs:

Output HTML

Debugging

Make a comment above each bug explaining what is wrong with the code.

  1. Run the program. The main method will get an exception related to command line arguments. Your recitation leader will discuss how to find the location in your code where the exception is occurring, and how to print information about the command line arguments to see what is happening.

  2. After fixing the first defect, the readHtml code will exit due to an exception. Your recitation leader will discuss how to add code to print the message associated with the exception to help you find and fix the problem. Print out the html ArrayList to make sure it contains the contents of the HTML page.

  3. After fixing the second defect, the extractHtml code will exit prematurely due to a null pointer exception. By now you should be able to figure out the exact line where the problem is occurring, and diagnose it to allow you to fix it.

  4. After fixing the third defect, the writeHtml method should execute. If it does not write the entire file, find out why the file is truncated. This is the fourth defect. Be sure to refresh the project directory after each run by right-clicking the project > Refresh. This will reveal the newly generated file.

  5. Next, check the contents of the HTML file generated by the program. The page should show the correct name of each program, but the names appear to contain extra characters. This is the fifth defect.

  6. The web page now looks correct, but the links do not work because the HTML anchor keywords are not generated correctly. You might need to look at an example of an HTML link (anchor) for comparison. By fixing this sixth defect, you will get an output HTML file that you can browse, with working links.
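
The diagnostic techniques used in the steps above can be sketched as follows. This is an illustration, not the Extractor code: the class and method names (DebugHelpersSketch, describeArgs, describe, writeLines) are made up, and the sample lines written to the file are placeholders. The last helper assumes one common cause of a truncated file, a writer that is never closed. Whether that applies to Extractor is for you to check.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; none of these names come from Extractor.java.
class DebugHelpersSketch {

    // Summarizes the command line arguments so you can see how many
    // arrived and what they contain before indexing into the array.
    static String describeArgs(String[] args) {
        return "args.length=" + args.length + " args=" + Arrays.toString(args);
    }

    // Reports the exception type together with its message.
    static String describe(Exception e) {
        return e.getClass().getName() + ": " + e.getMessage();
    }

    // Writes all lines to a file. The try-with-resources statement closes
    // the writer, which also flushes any text still held in its buffer.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) {
        System.err.println(describeArgs(args));
        try {
            // Throws ArrayIndexOutOfBoundsException with fewer than two arguments.
            String outputName = args[1];
            writeLines(Paths.get(outputName), Arrays.asList("<html>", "</html>"));
        } catch (Exception e) {
            System.err.println(describe(e));
            e.printStackTrace(); // The stack trace names the file and line number.
        }
    }
}
```

Printing to System.err keeps diagnostic output separate from the program's normal output, and the stack trace printed by printStackTrace points to the line where the exception was thrown.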

Note
The link reference should end with ".html", but the link name should not. The best way to do this is to remove ".html" in extractHtml and add it back in writeHtml. In fact, you must do it this way to pass the test program.
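
As a rough sketch of the idea in this note, the helpers below strip the suffix from a name and add it back when building the anchor. The names (AnchorSketch, stripSuffix, toAnchor), the base URL, and the sample file name are assumptions made for illustration. The real extractHtml and writeHtml methods are structured differently, so treat this only as a reference for what a well-formed anchor looks like.

```java
// Hypothetical helper class; none of these names come from Extractor.java.
class AnchorSketch {

    static final String SUFFIX = ".html";

    // Removes a trailing ".html" so the name can be shown as link text.
    static String stripSuffix(String fileName) {
        if (fileName.endsWith(SUFFIX)) {
            return fileName.substring(0, fileName.length() - SUFFIX.length());
        }
        return fileName;
    }

    // Builds an HTML anchor: the href ends with ".html", the visible text does not.
    static String toAnchor(String baseUrl, String name) {
        return "<a href=\"" + baseUrl + name + SUFFIX + "\">" + name + "</a>";
    }

    public static void main(String[] args) {
        // Placeholder file name and base URL, chosen only for this demo.
        String name = stripSuffix("Welcome.html");
        System.out.println(toAnchor("https://example.com/", name));
    }
}
```

An anchor needs an opening tag with a quoted href attribute, the visible link text, and a closing tag. Comparing your generated output against that shape is a quick way to spot a malformed link.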

Important
Show the TA or helper your program with all bugs fixed to receive credit.