CS 270 PA9: Lexical Analysis


Due Sunday, April 26th, at 10:00pm. No late submissions.

Goals

For this assignment, you will implement a reduced version of lexical analysis (tokenization) for LC-3 assembly.

Your code will not handle every valid LC-3 assembly program, nor will it detect every invalid one. All it does is lexical analysis: breaking the source text into tokens such as opcodes, registers, integers, commas, and labels.

You will implement the functions scan_token and free_token, as described in token.h. Put them in the file token.c, and turn it in.

Required Functions

struct Token *scan_token(FILE *);
Scan the next token from the given input stream. Return NULL at end of file. Otherwise, allocate a struct Token, fill it in, and return a pointer to it.
void free_token(struct Token *);
Free all the memory that scan_token() allocated for the given token, including Token.str when it is used.
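scan_token() hands back heap memory that the caller later releases with free_token(), so the two functions have to agree on exactly what was allocated. Since token.h is not reproduced here, the following minimal sketch uses a hypothetical stand-in, struct DemoToken, that mirrors only the fields the test program uses. The names DemoToken, new_demo_token, copy_text, and free_demo_token are illustrative only; your token.c must use the real struct Token from token.h, unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct Token. The real definition lives in
 * token.h; `type` is a plain int here purely for illustration. */
struct DemoToken {
    int type;
    int line_number;
    char *str;      /* NULL unless this kind of token carries text */
    int opcode;
    int val;
};

/* Copy `len` characters of `text` into a fresh malloc'd, NUL-terminated
 * string, keeping the original case (Requirements 5 and 12). */
static char *copy_text(const char *text, size_t len)
{
    assert(text != NULL);
    char *s = malloc(len + 1);
    if (s != NULL) {
        memcpy(s, text, len);
        s[len] = '\0';
    }
    return s;
}

/* Allocate a token and initialize every field, so that str is NULL
 * until the scanner decides this token needs text. */
static struct DemoToken *new_demo_token(int type, int line_number)
{
    struct DemoToken *t = malloc(sizeof *t);
    if (t != NULL) {
        t->type = type;
        t->line_number = line_number;
        t->str = NULL;
        t->opcode = 0;
        t->val = 0;
    }
    return t;
}

/* Release the string (if any) and then the token itself. */
static void free_demo_token(struct DemoToken *t)
{
    if (t == NULL)
        return;
    free(t->str);   /* free(NULL) is a no-op */
    free(t);
}
```

Initializing str to NULL for tokens that carry no text lets the free function call free() on it unconditionally, because free(NULL) does nothing.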

Test code

Here is some test code:
#include "token.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    FILE *f = fopen("test.asm", "r");
    if (f==NULL) {
	fprintf(stderr, "%s: can't open test.asm.\n", argv[0]);
	return 1;
    }

    struct Token *t;
    while ((t=scan_token(f)) != NULL) {
	printf("Line %d: ", t->line_number);
	switch (t->type) {
	    case TOKEN_INVALID:	   printf("invalid \"%s\"", t->str);	break;
	    case TOKEN_LINE_LABEL: printf("line label \"%s\"", t->str);	break;
	    case TOKEN_OPCODE:	   printf("opcode %d", t->opcode);	break;
	    case TOKEN_COMMA:	   printf("comma");			break;
	    case TOKEN_INT:	   printf("integer %d", t->val);	break;
	    case TOKEN_REGISTER:   printf("register R%d", t->val);	break;
	    case TOKEN_LABEL:	   printf("label \"%s\"", t->str);	break;
	    default:		   printf("Huh?");			break;
	}
	printf("\n");
	free_token(t);
    }

    fclose(f);
    return 0;
}
When run with this wretched assembly source file:
; Test file

    ADD r2 , x15	    ; R2, how strange!
    BRz AWAY
r8this	and #-12
it should produce this output:
Line 3: opcode 0
Line 3: register R2
Line 3: comma
Line 3: integer 21
Line 4: opcode 3
Line 4: label "AWAY"
Line 5: line label "r8this"
Line 5: opcode 1
Line 5: integer -12

Requirements

  1. Opcodes and registers are case-independent. ADD, add, and Add must all work.
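  2. “R2D2” is a label, not register R2 followed by something else.
  3. “x5000fighterjet” is a label, not the integer x5000 followed by something else.
  4. “123xyz” may be scanned either as an integer followed by a label or as invalid; either behavior is acceptable.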
  5. Anything put into Token.str must retain its original case for error messages.
  6. You may write any other functions you find useful, but they must be static.
  7. You may not change token.h. We will test your code with our own, unchanged copy of token.h.
  8. Comments, which run from a ; to the end of the line, are completely ignored.
  9. Tabs ('\t') are treated the same as spaces.
  10. Lines containing nothing but spaces, tabs, and comments are ignored, but they still count toward line_number.
  11. Token.line_number and Token.type are valid for all tokens. The other fields are only valid for certain tokens.
  12. Token.str, when used, must be dynamically allocated using malloc or a similar function.
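Requirements 8 through 10 can be handled by one routine that runs before every token is scanned. Below is a minimal sketch, assuming the caller keeps the current line number in an int that starts at 1; skip_blanks_and_comments is a hypothetical name. It reads with fgetc() and pushes the first significant character back with ungetc(), so the caller can read it again as the start of the next token.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: skip spaces, tabs, newlines, and ';' comments.
 * Every '\n' consumed increments *line, so blank and comment-only lines
 * still count. Returns the next significant character (pushed back with
 * ungetc) or EOF if only blanks and comments remain. */
static int skip_blanks_and_comments(FILE *f, int *line)
{
    assert(f != NULL && line != NULL);
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == ';') {
            /* A comment runs to the end of the line. Stop at the '\n'
               so the check below still counts it. */
            while ((c = fgetc(f)) != EOF && c != '\n')
                ;
            if (c == EOF)
                break;
        }
        if (c == '\n') {
            (*line)++;
        } else if (c != ' ' && c != '\t') {
            ungetc(c, f);           /* leave it for the token scanner */
            return c;
        }
    }
    return EOF;
}
```

Run on the sample file starting at line 1, this returns 'A' with the line count at 3, matching "Line 3: opcode 0". Because it swallows newlines, the sketch does not tell the caller whether the next token begins a line; if your line-label logic needs that, you would have to track it as extra state.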

Submission Instructions

Submit token.c to the Checkin tab.