Last Update Time-stamp: "97/06/30 14:30:50 umrigar"
Browse the source directory
This note documents the overall design of the ParsDemo
program. It assumes that the reader is familiar with the operation of the
program.
Most of the classes which constitute the program are individually
trivial. Some of them have javadoc
style documentation, but I
did not use that style of documentation much as I found it too bureaucratic
and more suited for documenting the external interfaces of packages. This
note documents how all the classes fit together.
Initially, an attempt was made to separate the GUI aspects of the program from the parsing aspects of the program. Though this attempt was not entirely successful, the classes can be separated into 4 broad categories:
These are general container classes which may be useful outside this package.
Tree supports general n-ary trees. Each tree
node can hold an arbitrary information Object
. The
implementation of a tree-node uses two object references: a
kids
reference referencing the leftmost kid of a node and a
sibs
reference referencing the next sibling of the node. The
sibs
reference chain is linked through the parent to form a
circular list (with a flag used to establish when the reference goes up from
sibling to parent). These references make arbitrary tree navigation
possible, though care has to be taken when a subtree is added/removed from a
node. In particular, care should be taken that a subtree is never
simultaneously linked into more than one tree, else strange things
will happen. The class can probably use some debugging code to check for
such anomalous situations.
Currently, other classes are derived from Tree
. It is
probably better to merely treat Tree
as a general hierarchical
Object
container.
Table2 is an abstract class supporting
2-dimensional tables. A row/column index can be any arbitrary
Object
, as can the contents of each cell. The table is
accessed and updated using get()
and put()
methods
respectively. Since the table is expected to be sparse, the SparseTable2 implementation uses a hashtable
indexed by a auxiliary RowCol class. A blank
entry is represented using a null
Object
.
A grammar is represented using the abstract Grammar class. This class must be subclassed for a
concrete grammar. Grammar symbols (GramSym) and
rules are maintained by the Terminal, NonTerm and Rule classes.
Conceptually, these classes are inner classes (in the sense of Java 1.1) of
the Grammar
class. The Grammar
class exports
special grammar symbols like EOF
which every grammar should
have. It also has special protected interfaces for its subclasses to create
the grammar symbols and rules.
Rule RHS's are stored within an array within the Grammar
class. Hence an item (in the LR-item sense) is merely an index
into this array.
The example program contains two concrete grammars which are instantiations of the Grammar class: SimpCompSRGram and SimpCompLL1Gram (the latter is merely the former with left-recursion removed). Each concrete grammar exports its terminals and nonterminals to the rest of the package.
The general interface of a scanner is encapsulated via the abstract
class Scanner which is setup to deliver Token's. This Scanner
class defines
tokens which every scanner should have like EOF_TOK
The program
contains a single instantiation of this Scanner
: the SimpCompScanner class. This class exports
the token values for all its multi-character tokens (single-character fixed
tokens have as token values the value of the character).
Parsing is done using a separate thread from the user-interface thread.
The required synchronization and interface with the rest of the program are
defined by the abstract StepParser class.
Each subclass of StepParser
must implement a
parse()
method and a update()
method. The
parse()
method should perform a parse according to the parse
algorithm specified by the client. After each parse step, the concrete
parse()
method should call the waitToStep()
method
in it superclass. The waitToStep()
method in
StepParser
waits until the GUI thread has read the results of
the last step, before calling the update()
method to update the
results. The concrete update()
method provided by each parse
algorithm implementation should update the parser data structures. If a
parse-error is detected, a ParseException
is thrown; to stop a parser which is currently running, a ParseResetException is thrown. The
exception-handling approach was chosen to handle these conditions, because
it makes it easy to bale out of a recursive-descent parser.
This StepParser
class is further refined into RecDescent, LL1Parser and SRParser
classes: these classes implement the basic parsing algorithms. The parsers
use a parse stack (ParseStk) which has entries
of types derived from ParseStkEntry. At
minimum, each parse-stack entry will contain a parse tree having entries of
type ParseNode.
RecDescent is an abstract class in that it
is independent of any particular grammar. SimpCompRecDescent contains a concrete
recursive-descent parser derived from RecDescent
. To allow the
program to display the current line in this parser, it is generated from its
source (.jpp file by using the
C-preprocessor to expand macros which grab the line number (among other
things).
The LL1Parser implements the basic LL(1)
parsing algorithm by interpreting an abstract LL1Table. The example program contains a concrete
LL1Table
in the class SimpCompLL1Table. The parse-stack contains
LL1ParseStkEntry.'s which are derived
from ParseStkEntry
, to additionally contain the current rule
number and grammar symbol.
The SRParser implements the basic
shift-reduce parsing algorithm by interpreting an abstract SRTable with entries of type SRAct. The example program contains a concrete
SRTable
in the class SimpCompSRTable. The parse-stack contains
SRParseStkEntry.'s which are derived from
ParseStkEntry
, to additionally contain the state # of the
corresponding SRState.
Most of the widgets used in the GUI are standard awt
widgets, but three of them are custom-canvases. These correspond to the
parse trace display, the parse tree/forest display, and the parse
table/program display. The last two displays also have the capability of
having portions of selected, to highlight certain objects: this is achieved
using the Selectable interface.
The canvas which all other custom canvases build on is implemented in
the ScrollableCanvas class.
Conceptually, this class contains a Canvas
with two
Scrollbars
. The implementation uses a ScrollPanel derived from Panel
.
When any instance of ScrollableCanvas
is to be inserted into a
container, it is its ScrollPanel
member (obtained using
getComponent()
) which must be inserted instead. This canvas
also implements crude mouse hints by means of a image its subclasses are
free to provide.
A TextCanvas is derived from
ScrollableCanvas
to contain lines of text. It implements the
above SelectableCanvas
interface to allow a single line to be
selected. It uses auxiliary classes TextMark
and TextLine.
A FileCanvas is a trivial subclass of
TextClass
, which initializes itself from a
FileInputStream
. It is used to display the program in the
recursive descent parser.
A Table2Canvas is derived from
ScrollableCanvas
to contain a Table2.
It implements the above SelectableCanvas
interface to allow
rows/columns to be selected. Scrolling this canvas keeps the table headings
visible. It is used for displaying the parse tables for the LL(1) and SR
parsers.
A TraceCanvas is derived from
ScrollableCanvas
and uses an auxiliary Trace class to display the parse trace for all three
parsers. The stack string displayed as part of the trace is obtained by
applying the toString()
method to the ParseStk class: the toString()
method
prints out the stack entries appropriate to the class of
ParseStk
. The input string displayed as part of the trace is
obtained by using the restInput()
method on a scanner
concatenated to the current lookahead.
Laying out of trees is achieved by the OffsetTree class which is derived from a Tree (of all the classes in the program, this is
probably algorithmically the most complex). (I believe there is a newer
paper which may be relevant. I haven't yet had a chance to look at it:
S. Moen, "Drawing Dynamic Trees", IEEE Software, July 1990, pp. 21 - 28).
The TreeCanvas class actually displays a
forest of ParseDisplay contains within it
references to the widgets used to display the state of the parse. It is
created by the top-level and passed over to the parser. When each parse
step is completed, the parser updates the ParsDemo is the main top-level class. It
can be invoked either as an applet, as a full-frame applet or from the
command-line. Depending on its invocation context, it obtains its single
argument (the algorithm to run) from either its applet context or from the
command line. It creates and lays out all the GUI widgets. It creates a
scanner and a parser determined by the specified algorithm. It creates an
enabled AppletFrame provides the context to run
FullFrame is another applet which merely
displays a Browse the source directory
Feedback: Please email any feedback to
zdu@acm.org.
OffsetTree
's, where the class OffsetForest is simply implemented as the
subtrees of a Interface Classes
ParseDisplay
with its
latest data structures. ExtFrame is an
auxiliary class used to capture the external browser context: the parse
display updates an instance of this class to show the current rule/state in
an auxiliary brower frame.
Top-level Classes
ExtFrame
object iff it was created directly by a
browser as an applet. Finally, it creates a parser thread and starts it.
It interacts with the parser thread by calling the doStep()
function which corresponds to the step
button in the GUI. It
implements the runnable
interface to allow starting an
auxiliary thread to auto-step the parser by a delay controlled by a
scrollbar.
ParsDemo
either from the command-line or as a sub-applet of
another applet.
start
button. When the button is clicked, it starts
up the ParsDemo
applet in a full-frame.
Known Bugs and Deficiencies
(2) could be explained away as a feature!
run
button has
been clicked), clicking the stop
button does not stop the
parser immediately, but only after an additional step. This detracts from
the responsiveness of the user-interface.
run
button has
been clicked), hitting return in the input text-area, restarts the parser in
the auto-step mode. This is somewhat counter-intuitive.
TreeCanvas
draws the entire forest it is displaying,
rather than only the portion which is visible. This may be problematic on
slower machines.
Weaknesses
To Do
Using the Source Code
If you find the source code useful, feel free to reuse it in your projects
under the terms of the GPL. If you do so, please
drop me a note.
Up: Main Parsing Demo Page