CS453 Colorado State University
============================================
Regular Expressions and Transition Diagrams
============================================

---------------
Logistics

    Tomorrow go to recitation 9-10 in 215

    Office hours and lab hours
        Wednesday at 2pm
        Monday at 2pm

    Should be done and TO DO next slides

---------------
Goals for today

    - typical compiler and interpreter structure
    - overview of MeggyJava and MiniSVG
    - regular expressions and DFAs
    - transition diagrams

----------------------------------------------
Overview of a typical compiler and interpreter

    [rects for input and output files,
     circles for things executing]

    (draw)
    source program --> compiler --> target executable

    (draw)
    input --> target executable --> output

    -> how does this work with Java?
    [
    (draw)
    source prog --> javac --> .class files (bytecode)
    bytecode and input --> JVM or java, an interpreter --> output

    with JIT have just in time compilation in many JVMs
    ]

    -> specialized picture for MiniSVG interpreter
    (draw)
    file.svg --> MiniSVG interpreter --> text output and picture

    -> specialized picture for MeggyJava compiler
    (draw)
    file.java --> MeggyJava compiler --> file.s
    file.s and MeggyJrSimple.o --> avr-g++ --> file.hex file
    file.hex file --> avrdude --> MeggyJr device

------------------------
Typical Structure of a Compiler (slide 5)

------------------------
MeggyJava compiler structure (slides 6 and 7)

    -> show fixed flower example

------------------------
MiniSVG interpreter structure (slides 8 thru 10)

    -> show project writeup from webpage, indicate link to download
       start code and instructions for how to compile and run

    The hardest part of this assignment is to implement the lexer.
    -> show transition diagram or DFA

    Today we will learn regular expressions, DFAs, and transition
    diagrams, which are a kind of DFA.

    Tuesday we will learn how to use regular expressions and transition
    diagrams to write a lexer for MiniSVG.
------------------------
Languages (slides 13-14)

    A language is a set of strings.

    Examples:
        {aa, aaaa, aaaaaa, ... }    all nonempty strings of a's of even length
        {while}                     the while keyword
        {while, for, do}            some loop keywords
        { <, >, ==, <>}             relational operators
        {Meggy.Color.RED, Meggy.Color.BLUE, ...}    colors for Meggy

    Empty string

-------------------
Regular Expressions

    Example regular expression (slide 15)
        -> note that concatenation of (b)(c) has higher precedence than
           alternation between a and (b)(c)

    Specifying regular expressions (slide 16)
        primitive regular expressions: empty set, epsilon, and alpha,
        where alpha is any character and epsilon is the empty string

        -> empty set versus a set with the empty string, recall the 15min
           compiler example.  If the language is the empty set, then even
           an empty file would result in a parse error.

        operations given regular expressions r1 and r2
            alternation, r1 | r2
            concatenation, (r1)(r2) or r1 r2
            Kleene closure, r1^*
            parentheses, (r1)

    Regular expression examples and Regular definitions (slide 17)
        Keywords
            for, if, while, ...
        operations
            <, >, +, -, &&, ...
        Identifiers
            here is where it helps to have regular definitions
            letter = A | B | ... | Z | a | b | c | .... | z
                   = [A-Za-z]   // equivalent notation, character classes
            digit = [0-9]
            [
            id = (letter | "_") (letter | "_" | digit)^*
            ]
        Numbers

    We now know how to specify a regular language with regular
    expressions, but how do we determine if a string belongs to a
    specific language?

-----------------------------------
Deterministic Finite Automata (slides 18-53)

-------------------
Transition Diagrams (slides 54-63)

    (slide 57)
    -> Note that with the current transition diagram we will recognize
       3 abba tokens even though there is no space between the first two.
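
    A rough sketch of the slide 57 note, in Java. The slide's actual
    diagram and input are not reproduced in these notes, so the state
    numbering, the class name AbbaLexer, and the input "abbaabba abba"
    are all assumptions made for illustration. The diagram is assumed to
    be a straight line of states 0..4 spelling a-b-b-a, with state 4
    accepting and no transition that demands a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical transition-diagram lexer for the single token "abba".
// State i means "i characters of abba have been matched" (assumed numbering).
final class AbbaLexer {
    private static final int ACCEPT = 4;
    private static final int DEAD = -1;

    // One edge per state, as in a straight-line transition diagram.
    private static int next(int state, char c) {
        switch (state) {
            case 0: return c == 'a' ? 1 : DEAD;
            case 1: return c == 'b' ? 2 : DEAD;
            case 2: return c == 'b' ? 3 : DEAD;
            case 3: return c == 'a' ? ACCEPT : DEAD;
            default: return DEAD;
        }
    }

    // Walks the diagram repeatedly over the input and collects lexemes.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;              // whitespace is skipped, never required
                continue;
            }
            int start = pos;
            int state = 0;
            while (pos < input.length() && state != ACCEPT) {
                int n = next(state, input.charAt(pos));
                if (n == DEAD) {
                    break;
                }
                state = n;
                pos++;
            }
            if (state != ACCEPT) {
                throw new IllegalArgumentException(
                        "lexical error at position " + start);
            }
            // Reaching ACCEPT ends the token immediately, so the next
            // token may start on the very next character.
            tokens.add(input.substring(start, pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abbaabba abba"));  // prints [abba, abba, abba]
    }
}
```

    Because the accepting state is entered on the final 'a' without
    looking at the character that follows, "abbaabba" splits into two
    tokens, which is the behavior the note points out.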
    (slide 59)
    -> to force a space after each abba token, put in a whitespace
       transition with an asterisk to indicate backing up the pointer
       when creating the lexeme

TODO - post http://cs.oberlin.edu/~jdonalds/331/lecture04.html for
       extra reading about transition diagrams

------------------------
mstrout@cs.colostate.edu, 1/20/11