Lexical Analysis Construction
- CaC, Ch. 3. Note that the details of JFlex differ somewhat from the description of Lex in 3.5.
- User manual for JFlex (available from the main page for this site): pay particular attention to section 3 ("A simple example").
- Stephen Edwards, Tiger Language Reference Manual (link)
The archive hw1.zip contains all the supporting files you need for this assignment. For instructions on installation, see "Setting Up the Project", below.
Over the rest of the semester, you will implement a compiler for the Tiger programming language. Tiger, developed by Prof. Andrew Appel, is a traditional imperative-style programming language, with a syntax similar to that of the ML family (though it lacks advanced features such as polymorphic type inference and first-class functions).
The language is described in Prof. Appel's series of texts, Modern Compiler Implementation in Java, Modern Compiler Implementation in C, and Modern Compiler Implementation in ML. A copy of the Java edition is on library reserve, if you are interested in details of the language. A short yet fairly complete guide to the language has been written by Prof. Stephen Edwards (see "Reading", above).
In this assignment, you will complete the skeleton of a JFlex specification of the lexical analysis for Tiger. To get you started, a skeleton distribution and supporting files are available:
Setting Up the Project
Make sure you have Eclipse and the JDK installed
This document assumes that you already have installed both the JDK (v1.6, minimum) and Eclipse. Install both of those, if you haven't already done so. Second, you are going to want to organize your work for this class in its own folder. Go ahead and make that folder now.
Set up the compiler project in Eclipse
- Download hw1.zip, and unzip it in a suitable folder. The result will be a folder, tigerc, containing a number of files and subdirectories:
tigerc --> doc -- tiger.pdf (Prof. Edwards' language guide) --> lib -- JFlex.jar (lexer generator for Java) -- java-cup-11a.jar (parser generator for Java) --> src --> tigerc --> codegen (unused, for now) --> intermediate (unused) --> semant (unused) --> syntax --> absyn (unused) --> parse -- Lexer.java (an interface used by the parser) -- Tiger.flex (The JFlex specification file) -- TigerSyms.java (symbol definitions for Tiger) --> util -- ErrorMsg.java (basic error reporting mechanism) --> test --> testcases (sample Tiger source files) -- comment_badNesting.tig -- merge.tig -- ... (dozens more) -- TestLexer.java (test driver for the lexer) -- build.xml
- In Eclipse, select File -> New -> Java Project
- The choice of project name is yours. Here, we'll use "TigerC".
- Uncheck "use default location", click "Browse", and navigate to the root of the folder containing the initial distribution, tigerc. Click "open". If you did it correctly, the "Location" bar should give the full directory path of the project, ending in the folder tigerc.
- Click Next. In the new dialogue box, under the Source tab, you should see the directory structure:
tigerc --> doc --> lib --> src --> test -- build.xml
Available Ant tasks
This project is organized through the Apache Ant software build system, which is installed with the Eclipse distribution. Though powerful, the learning curve for this system is extremely steep, so for the sake of your focus on this project, a complete build file, build.xml, is included.
Through the build file, Ant is able to handle crucial aspects of the construction of a major piece of software, including the management of class paths, integration with third-party libraries, and automation of common tasks.
To use the system, right-click on build.xml in the PackageExplorer, and select Run As --> Ant Build .... You will get a dialogue box listing a number of "targets" to execute. For this project, the important ones are:
- genLex — Runs JFlex on the file Tiger.flex, generating the source code file TigerLex.java.
- testLexer — Compiles tigerc.syntax.parse.TigerLex.java, tigerc.tests.TestLexer.java, and associated Java source files. It then runs the TestLexer executable, beginning with a prompt to enter the name of a Tiger source file for lexical analysis. (If you run this in Eclipse, you'll get a pop-up dialogue box.)
- clean — Deletes all generated source files (except TigerSyms.java), as well as all compiled binaries. It is sometimes useful to have a clean recompilation of everything, if you are worried that some binaries haven't been updated properly.
Complete the lexical specification of Tiger, as begun in the skeleton Tiger.flex file. The keywords and operators in the language are given in the reference document, and you will need to make sure that you have implemented recognition of all of them. In addition, pay attention to the following:
- Within string literals, escape sequences should be properly translated into their meanings.
- Be sure that unclosed comments and unclosed strings are detected at the end of file. Remember that comments can be nested.
- Be sure that all unclosed strings are detected at the end of a line, unless the \ ... \ line continuation form is used.
- Note that all integers are positive. This means, for example, that -5 should be detected as two separate tokens.
The source file test1.tig:
/* an array type /* and an array variable */ */ let type arrtype = array of int var arr1:arrtype := arrtype  of 0 in arr1 endOn my solution, this gives the stream of tokens
LET 54 TYPE 59 ID("arrtype") 65 EQ 73 ARRAY 75 OF 81 ID("int") 84 VAR 89 ID("arr1") 93 COLON 97 ID("arrtype") 98 ASSIGN 106 ID("arrtype") 109 LBRACK 117 INT(10) 118 RBRACK 120 OF 122 INT(0) 125 IN 127 ID("arr1") 131 END 136 EOF 140