ParsDemo Implementation Notes

Last Update Time-stamp: "97/06/30 14:30:50 umrigar"

This note documents the overall design of the ParsDemo program. It assumes that the reader is familiar with the operation of the program.

Most of the classes which constitute the program are individually trivial. Some of them have javadoc style documentation, but I did not use that style of documentation much as I found it too bureaucratic and more suited for documenting the external interfaces of packages. This note documents how all the classes fit together.

Initially, an attempt was made to separate the GUI aspects of the program from the parsing aspects of the program. Though this attempt was not entirely successful, the classes can be separated into 4 broad categories:

Container classes These may be useful outside this package. However, they would need more work to be made usable in a general package.
Domain classes: Classes having knowledge of the domain being visualized in this program. Classes in this category implement grammars, scanners and parsers.
GUI classes: Classes in this category implement several custom canvases for displaying the state of the parse.
Interface classes: Used to exchange information between the domain classes and the top-level classes.
Top-level classes: They set up the GUI, start the parsers, and take care of synchronizing the parser operation with the GUI display.

Utility Classes

These are general container classes which may be useful outside this package.

Tree supports general n-ary trees. Each tree node can hold an arbitrary information Object. The implementation of a tree-node uses two object references: a kids reference referencing the leftmost kid of a node and a sibs reference referencing the next sibling of the node. The sibs reference chain is linked through the parent to form a circular list (with a flag used to establish when the reference goes up from sibling to parent). These references make arbitrary tree navigation possible, though care has to be taken when a subtree is added/removed from a node. In particular, care should be taken that a subtree is never simultaneously linked into more than one tree, else strange things will happen. The class can probably use some debugging code to check for such anomalous situations.

Currently, other classes are derived from Tree. It is probably better to merely treat Tree as a general hierarchical Object container.

Table2 is an abstract class supporting 2-dimensional tables. A row/column index can be any arbitrary Object, as can the contents of each cell. The table is accessed and updated using get() and put() methods respectively. Since the table is expected to be sparse, the SparseTable2 implementation uses a hashtable indexed by a auxiliary RowCol class. A blank entry is represented using a null Object.

Domain Classes

A grammar is represented using the abstract Grammar class. This class must be subclassed for a concrete grammar. Grammar symbols (GramSym) and rules are maintained by the Terminal, NonTerm and Rule classes. Conceptually, these classes are inner classes (in the sense of Java 1.1) of the Grammar class. The Grammar class exports special grammar symbols like EOF which every grammar should have. It also has special protected interfaces for its subclasses to create the grammar symbols and rules.

Rule RHS's are stored within an array within the Grammar class. Hence an item (in the LR-item sense) is merely an index into this array.

The example program contains two concrete grammars which are instantiations of the Grammar class: SimpCompSRGram and SimpCompLL1Gram (the latter is merely the former with left-recursion removed). Each concrete grammar exports its terminals and nonterminals to the rest of the package.

The general interface of a scanner is encapsulated via the abstract class Scanner which is setup to deliver Token's. This Scanner class defines tokens which every scanner should have like EOF_TOK The program contains a single instantiation of this Scanner: the SimpCompScanner class. This class exports the token values for all its multi-character tokens (single-character fixed tokens have as token values the value of the character).

Parsing is done using a separate thread from the user-interface thread. The required synchronization and interface with the rest of the program are defined by the abstract StepParser class. Each subclass of StepParser must implement a parse() method and a update() method. The parse() method should perform a parse according to the parse algorithm specified by the client. After each parse step, the concrete parse() method should call the waitToStep() method in it superclass. The waitToStep() method in StepParser waits until the GUI thread has read the results of the last step, before calling the update() method to update the results. The concrete update() method provided by each parse algorithm implementation should update the parser data structures. If a parse-error is detected, a ParseException is thrown; to stop a parser which is currently running, a ParseResetException is thrown. The exception-handling approach was chosen to handle these conditions, because it makes it easy to bale out of a recursive-descent parser.

This StepParser class is further refined into RecDescent, LL1Parser and SRParser classes: these classes implement the basic parsing algorithms. The parsers use a parse stack (ParseStk) which has entries of types derived from ParseStkEntry. At minimum, each parse-stack entry will contain a parse tree having entries of type ParseNode.

RecDescent is an abstract class in that it is independent of any particular grammar. SimpCompRecDescent contains a concrete recursive-descent parser derived from RecDescent. To allow the program to display the current line in this parser, it is generated from its source (.jpp file by using the C-preprocessor to expand macros which grab the line number (among other things).

The LL1Parser implements the basic LL(1) parsing algorithm by interpreting an abstract LL1Table. The example program contains a concrete LL1Table in the class SimpCompLL1Table. The parse-stack contains LL1ParseStkEntry.'s which are derived from ParseStkEntry, to additionally contain the current rule number and grammar symbol.

The SRParser implements the basic shift-reduce parsing algorithm by interpreting an abstract SRTable with entries of type SRAct. The example program contains a concrete SRTable in the class SimpCompSRTable. The parse-stack contains SRParseStkEntry.'s which are derived from ParseStkEntry, to additionally contain the state # of the corresponding SRState.

GUI Classes

Most of the widgets used in the GUI are standard awt widgets, but three of them are custom-canvases. These correspond to the parse trace display, the parse tree/forest display, and the parse table/program display. The last two displays also have the capability of having portions of selected, to highlight certain objects: this is achieved using the Selectable interface.

The canvas which all other custom canvases build on is implemented in the ScrollableCanvas class. Conceptually, this class contains a Canvas with two Scrollbars. The implementation uses a ScrollPanel derived from Panel. When any instance of ScrollableCanvas is to be inserted into a container, it is its ScrollPanel member (obtained using getComponent()) which must be inserted instead. This canvas also implements crude mouse hints by means of a image its subclasses are free to provide.

A TextCanvas is derived from ScrollableCanvas to contain lines of text. It implements the above SelectableCanvas interface to allow a single line to be selected. It uses auxiliary classes TextMark and TextLine.

A FileCanvas is a trivial subclass of TextClass, which initializes itself from a FileInputStream. It is used to display the program in the recursive descent parser.

A Table2Canvas is derived from ScrollableCanvas to contain a Table2. It implements the above SelectableCanvas interface to allow rows/columns to be selected. Scrolling this canvas keeps the table headings visible. It is used for displaying the parse tables for the LL(1) and SR parsers.

A TraceCanvas is derived from ScrollableCanvas and uses an auxiliary Trace class to display the parse trace for all three parsers. The stack string displayed as part of the trace is obtained by applying the toString() method to the ParseStk class: the toString() method prints out the stack entries appropriate to the class of ParseStk. The input string displayed as part of the trace is obtained by using the restInput() method on a scanner concatenated to the current lookahead.

Laying out of trees is achieved by the OffsetTree class which is derived from a Tree (of all the classes in the program, this is probably algorithmically the most complex). (I believe there is a newer paper which may be relevant. I haven't yet had a chance to look at it: S. Moen, "Drawing Dynamic Trees", IEEE Software, July 1990, pp. 21 - 28). The TreeCanvas class actually displays a forest of OffsetTree's, where the class OffsetForest is simply implemented as the subtrees of a with a invisible root.

Interface Classes

ParseDisplay contains within it references to the widgets used to display the state of the parse. It is created by the top-level and passed over to the parser. When each parse step is completed, the parser updates the ParseDisplay with its latest data structures. ExtFrame is an auxiliary class used to capture the external browser context: the parse display updates an instance of this class to show the current rule/state in an auxiliary brower frame.

Top-level Classes

ParsDemo is the main top-level class. It can be invoked either as an applet, as a full-frame applet or from the command-line. Depending on its invocation context, it obtains its single argument (the algorithm to run) from either its applet context or from the command line. It creates and lays out all the GUI widgets. It creates a scanner and a parser determined by the specified algorithm. It creates an enabled ExtFrame object iff it was created directly by a browser as an applet. Finally, it creates a parser thread and starts it. It interacts with the parser thread by calling the doStep() function which corresponds to the step button in the GUI. It implements the runnable interface to allow starting an auxiliary thread to auto-step the parser by a delay controlled by a scrollbar.

AppletFrame provides the context to run ParsDemo either from the command-line or as a sub-applet of another applet.

FullFrame is another applet which merely displays a start button. When the button is clicked, it starts up the ParsDemo applet in a full-frame.

Known Bugs and Deficiencies

When the parser is auto-step'ing (after the run button has been clicked), clicking the stop button does not stop the parser immediately, but only after an additional step. This detracts from the responsiveness of the user-interface.
If the parser is auto-step'ing (after the run button has been clicked), hitting return in the input text-area, restarts the parser in the auto-step mode. This is somewhat counter-intuitive.
The TreeCanvas draws the entire forest it is displaying, rather than only the portion which is visible. This may be problematic on slower machines.

(2) could be explained away as a feature!

Weaknesses

The event-handling code in ParsDemo has evolved, and has gotten extremely clumsy (witness bugs (1) and (2)). This could use a rewrite.
The algorithm used for OffsetTree class is complex enough, that bugs could lurk in there. I don't yet have a good framework for testing GUI programs, except manually.
This is my first Java program. Though the language seemed easy to pick up, I can't say the same about the libraries. Since I was learning the system as I was going along, portions of the code could use a cleanup.

To Do

A back button to step the parser backwards would be useful. The recursive descent parser is the most problematic for implementing this. Alternatives:
1. Keep track of the current # of steps. Restart the parser from scratch and step one fewer than the previous # of steps. This is pretty horrendous from an efficiency point of view, but just might work for the toy languages used in a demo and the way hardware is evolving.
2. Keep track of incremental information to undo a step. This is relatively easy for the table-driven parsers, but difficult for the recursive descent parser. To get the recursive descent parser program to run backwards, the best I can think of is to throw an exception to get out of the current function invocation, catching the exception just outside the current invocation. Then rerun that function forwards again. However, do not step into any recursive CALL's, but merely reuse information that was saved the first time.
3. Use incremental state undo information as in (b), but implement the recursive-descent parser using an interpreter, rather than a java program.

Using the Source Code

If you find the source code useful, feel free to reuse it in your projects under the terms of the GPL. If you do so, please drop me a note.

Browse the source directory

Up: Main Parsing Demo Page

Feedback: Please email any feedback to zdu@acm.org.