Jack Applin

Grading Scripts

Purpose

To grade a homework assignment, you write a grading script, a file with the suffix .gs. It unpacks, builds, and executes the student code, then evaluates its output.

Example

To test this simple C++ program, hello.cc:

#include <iostream>

int main() {
    int useless;  // will cause a problem
    std::cout << "Hello, world!\n";
}

This script, hello.gs, might be used:

#! /s/parsons/d/fac/applin/bin/grader

# Rubric (5-point assignment):
#
# • 1.0 compiles
# • 1.0 no warnings
# • 2.0 produces correct output
# • 0.5 produces no standard error output
# • 0.5 no global variables

src=${1:?first argument must be source file}

setting MaxScore 5          # Value of this assignment

rm -f a.out                 # In case left over from previous run
export LANG=C               # Vanilla compiler output, please.
run g++ -Wall $src
test 1.0 "compiles" [[ -x a.out ]]

if ((score<=0)) return      # Give up if we couldn’t even build.

test 1.0 "no warnings" ! grep -qi warning stdout stderr

run ./a.out

test 2.0 "correct output" exact 'Hello, world!\n' stdout
test 0.5 "stderr is empty" empty stderr
globals 0.5 a.out

When executed like this:

    ./hello.gs hello.cc

the script produces a report with three sections:

  1. The overall score and who did the grading (so that students know who to complain to).
  2. The high-level summary, one line per test.
  3. The details of individual tests.

Most students will read only the first line. Some will read the summary section (one line per test). A few will look up the failing tests from the summary in the details section. Here is the report for hello.cc:

Score: 4.00/5.00 points
Graded by Jack Applin <applin@CS.ColoState.Edu>

Summary of all tests:
  Value  Result  Test  Description
   1.00  pass       1  compiles
   1.00  FAIL       2  no warnings
   2.00  pass       3  correct output
   0.50  pass       4  stderr is empty
   0.50  pass       5  globals
   4.00                Total
Passed 4 tests, failed 1 test.

Details of individual tests:

Executing: g++ -Wall hello.cc
Exit code: 0
Standard output is empty
Standard error (4 lines):
        hello.cc: In function 'int main()':
        hello.cc:4:9: warning: unused variable 'useless' [-Wunused-variable]
            4 |     int useless;
              |         ^~~~~~~

Test 1: compiles
Status: pass
Condition: '[[' -x a.out ']]'
Value: 1.00

Test 2: no warnings
Status: FAIL
Condition: ! grep -qi warning stdout stderr
Value: 1.00

Executing: ./a.out
Exit code: 0
Standard output (1 line):
        Hello, world!
Standard error is empty

Test 3: correct output
Status: pass
Condition: exact 'Hello, world!\n' stdout
Value: 2.00

Test 4: stderr is empty
Status: pass
Condition: empty stderr
Value: 0.50

Test 5: globals
Status: pass
Condition: No globals used
Value: 0.50

Overview

The most important functions of the grading script are run and test.

run
This doesn’t do any testing; it just runs a command, perhaps with options or redirected input, and saves the output.
test
Take the results of the previous run and assert some condition about them. If the condition succeeds, great! Otherwise, the test has failed.
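
As a rough mental model only, the division of labor between run and test can be imitated in plain shell. The names sketch_run and sketch_test below are hypothetical, and the real grader does far more (filtering, time limits, the detailed report); this sketch assumes nothing beyond "one function captures output, the other checks a condition and adjusts the score".

```shell
# Hypothetical stand-ins for run and test; not the grader's real code.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
score=0

sketch_run() {
    # Run the command, saving both output streams and the exit code.
    exit_code=0
    "$@" >stdout 2>stderr || exit_code=$?
}

sketch_test() {
    # Usage: sketch_test value title condition...
    value=$1 title=$2
    shift 2
    if "$@" >/dev/null 2>&1; then
        status=pass
        # Counting up: a passing test adds its value to the score.
        score=$(awk -v s="$score" -v v="$value" 'BEGIN { printf "%.2f", s + v }')
    else
        status=FAIL
    fi
    printf '%-4s  %s\n' "$status" "$title"
}

sketch_run echo 'Hello, world!'
sketch_test 2.0 "output contains Hello" grep -q Hello stdout
sketch_test 0.5 "stderr is empty" [ ! -s stderr ]
sketch_test 1.0 "output contains Goodbye" grep -q Goodbye stdout
echo "Score: $score"
```

The first two checks pass and the third fails, so the sketch ends with a score of 2.50.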

Up/Down

A grading script can either count up or down.

Counting up
The score is initially zero. A passing test adds its value to the score. A failing test does not change the current score. Start a script that counts up like this:
        setting MaxScore 10	# ten-point assignment, count up
Counting down
The score is initially a non-zero value, the number of points for the assignment. A failing test subtracts its value from the score. A passing test does not change the current score. Start a script that counts down like this:
        let score=5.0	    # five-point assignment
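
To make the arithmetic concrete, here is a small sketch in plain shell (not the grader's own functions) that applies the same three made-up test results both ways, for a 5-point assignment whose test values add up to 5.

```shell
# Three made-up test results: value and outcome (1 = pass, 0 = fail).
results='2.0 1
1.0 0
2.0 1'

# Counting up: start at zero, add the value of each passing test.
up=$(echo "$results" | awk '$2 == 1 { s += $1 } END { printf "%.2f", s }')

# Counting down: start at the full score, subtract each failing test.
down=$(echo "$results" | awk -v s=5.0 '$2 == 0 { s -= $1 } END { printf "%.2f", s }')

echo "counting up:   $up"
echo "counting down: $down"
```

Both come to 4.00 here. The two approaches differ only when the test values do not add up to the assignment's total: counting up can then never reach the maximum, while counting down leaves the unaccounted points in the score.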

The Grading Script

The grading script is a Bash script, with additional functions defined. (Really, it’s a zsh script, but treat it like a Bash script.) Here are the functions:

setting name value
Set a parameter. These are the default settings:
        setting MinScore 0.0                    # minimum possible score
        setting MaxScore 0.0                    # maximum possible score
        setting Visible true                    # make control chars visible in report
        setting ExpandTabs true                 # expand tabs in stdout & stderr
        setting TrimCR true                     # remove DOS carriage returns
        setting TrimWhitespace true             # remove trailing horizontal whitespace
        setting TrimTrailingBlankLines true     # remove trailing blank lines
        setting Merge false                     # merge stderr into stdout on output
        setting Flatten true                    # flatten unpacked archives
        setting StdinTermNull true              # route terminal stdin from /dev/null
        setting ShowLines 10                    # show this many lines in report
        setting MaxFileSize 1000000             # maximum file size in bytes (1MB)
        setting TimeLimit 30                    # runtime CPU limit (30 seconds)
        setting Header ''                       # show this at the start
        setting Footer ''                       # show this at the end
unpack [ -C directory ] archive
Unpack the tar or zip archive into the optional directory (or the current directory if -C is not specified). If the setting Flatten true is in effect, which it is by default, then any prefix directories are removed when unpacking. Examples:
        unpack -C Build hw4.tar
        unpack foo.zip
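
The effect of Flatten can be approximated with tar's --strip-components option, which GNU tar and bsdtar both support. This is only a sketch of the idea: whether the grader works this way, and how many directory levels it removes, are assumptions, and the file names are made up.

```shell
# Build a sample archive whose contents sit under a prefix directory.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir hw4 Build
echo 'int main() {}' > hw4/hello.cc
tar -cf hw4.tar hw4/hello.cc

# Unpack into Build, dropping the leading "hw4/" path component.
tar -xf hw4.tar -C Build --strip-components=1

ls Build
```

After this, hello.cc sits directly in Build rather than in Build/hw4, so a later build step does not need to know what the student named the top-level directory.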
test value title condition
Take the results of the previous run and assert some condition about them. If the test succeeds (and we are counting up), add the value to the current score. If the test fails (and we are counting down), subtract the value from the current score.
The condition can be any of:
  • [[ condition ]]
  • (( arithmetic-condition ))
  • any command, e.g., grep. A zero exit code indicates success. The command should produce no output—just an exit code. You can use >&/dev/null to suppress output.
In bash & zsh, a leading ! negates the exit code of a command. It reverses the sense of the test. grep -q "foo" stdout succeeds if foo is in the output. ! grep -q "foo" stdout succeeds if foo is not in the output.
The title names the test, and is shown in the summary & details sections of the report. Provide a positive title—describe the good condition (e.g., "prompt found") as opposed to the bad condition (e.g., "no prompt found").
Examples:
        test 1.0 "foo is executable" [[ -x foo ]]
        test 1.0 "output contains foo" grep -q foo stdout
        test 0.5 "no warnings" ! grep -qi warning stdout stderr
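
The exit codes behind these conditions can be checked in an ordinary shell, outside the grader. In this sketch the file stdout is filled by hand to stand in for captured program output.

```shell
# A stand-in for the captured output of a student program.
work=$(mktemp -d) && cd "$work" || exit 1
printf 'foo: 1 warning\n' > stdout

grep -q foo stdout;       found=$?     # 0: "foo" is present
! grep -q foo stdout;     negated=$?   # 1: the ! flips success to failure
! grep -q error stdout;   absent=$?    # 0: "error" is absent, so ! succeeds

echo "found=$found negated=$negated absent=$absent"
```

This prints found=0 negated=1 absent=0; a condition passes exactly when its exit code is zero.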
run [ -C directory ] command arg arg
In the optional directory (or the current directory if -C is not specified), execute command arg arg …, sending standard output to the file stdout, and standard error to the file stderr. If setting Merge true is in effect (false by default), then send both streams to stdout.
The exit code, and contents of stdout and stderr, will be displayed in the detailed results section of the report.
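
In plain shell terms, the difference that Merge makes resembles the two redirections below. The function noisy is a hypothetical command that writes to both streams, and the file name merged is illustrative; how the grader itself performs the redirection is not shown here.

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# Hypothetical command that writes one line to each stream.
noisy() {
    echo "result"
    echo "problem" >&2
}

# Like Merge false: each stream goes to its own file.
noisy >stdout 2>stderr

# Like Merge true: both streams end up in the same file.
noisy >merged 2>&1

cat merged
```

Merging is useful when the order of normal output relative to error messages matters, since that order is lost once the streams are in separate files.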
empty file …
Assert that all the given files are empty. Note that stdout and stderr may have been filtered by the settings TrimWhitespace and TrimTrailingBlankLines. Since these are enabled by default, spaces at the end of lines and empty lines at the end of the output have already been removed. Since such trailing whitespace is largely invisible, these settings save a lot of arguments with students.
Example:
        test 0.5 "stderr is empty" empty stderr
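
To see why output consisting only of stray whitespace still counts as empty, here is a rough imitation of TrimWhitespace and TrimTrailingBlankLines using sed and awk. The grader's actual filtering may differ, and the file name raw is made up.

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# A program "printed" only spaces, a tab, and blank lines.
printf '   \n\t\n\n' > raw

# Strip trailing blanks from each line, then drop blank lines at the end.
sed 's/[[:blank:]]*$//' raw |
    awk '
        /^$/ { pending++; next }      # hold back blank lines for now
        {
            # A non-blank line follows, so the held lines were not trailing.
            while (pending > 0) { print ""; pending-- }
            print
        }' > stderr

if [ ! -s stderr ]; then
    echo "stderr is empty after filtering"
fi
```

Blank lines still held back when the input ends are never printed, which is what removes them from the end of the file.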
exact string file
Assert that a file contains exactly these contents. Backslash escapes (\n, \t, \0octal, \xhex, \Uunicode, …) are recognized, but no pattern matching. This is not a regular expression. Examples:
        test 1.0 "correct greeting" exact "Hello, world!\n" stdout
        test 1.2 "correct sum" exact "2+2\n4\n" outfile
If you want pattern matching, use grep with -q to suppress the output:
        test 1.0 "foo precedes bar" grep -q "foo.*bar" stderr
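
A byte-for-byte comparison of this kind can be approximated in plain shell with printf and cmp. This is a sketch, not the grader's implementation: sketch_exact is a hypothetical name, and printf's %b understands a somewhat different set of escapes than the list above.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf 'Hello, world!\n' > stdout

# Succeed only if the file holds exactly the given bytes, escapes expanded.
sketch_exact() {
    printf '%b' "$1" | cmp -s - "$2"
}

# Identical text, text missing the final newline, and a would-be pattern.
for expected in 'Hello, world!\n' 'Hello, world!' 'Hello, .*\n'; do
    if sketch_exact "$expected" stdout; then
        printf 'match:    %s\n' "$expected"
    else
        printf 'mismatch: %s\n' "$expected"
    fi
done
```

Only the first string matches. The second lacks the final newline, and the third is compared literally, so `.*` is not treated as a wildcard.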
badsyms value executable title sym1 sym2
Forbid certain symbols (functions). If the given symbols are used in the executable, then report the offending symbols. Example:
        badsyms 1.0 a.out "No C I/O" printf fprintf scanf fscanf
globals value executable exceptions
Forbid global variables, with exceptions. Examples:
        globals 0.10 Build/a.out	 # no global variables permitted
        globals 0.25 hello program_name	 # allow program_name, but no others
pity value title
If the current score is less than value, set the current score to value. Do this after all the tests, at the end of the grading script.
        pity 0.2 "Minimum score for compilation"
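
The rule amounts to taking the larger of the current score and the pity value. A minimal sketch of that arithmetic in plain shell, where sketch_pity is a hypothetical helper and not part of the grader:

```shell
# Print the larger of the current score and the pity value.
sketch_pity() {
    awk -v score="$1" -v minimum="$2" 'BEGIN {
        if (score < minimum) score = minimum
        printf "%.2f\n", score
    }'
}

sketch_pity 0.0 0.2    # prints 0.20: a score below the minimum is raised
sketch_pity 3.5 0.2    # prints 3.50: a score above the minimum is unchanged
```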
score
It’s a variable, not a function. It contains the current score, as a floating-point number. Most commonly used like this:
        # unpack the archive
        # compile the code
        if ((score<=0)) return	    # give up if unpack/compile failed