Recognition of tokens:
We learn how to express pattern using regular
expressions. Now, we must study how to take the patterns for all the needed tokens
and build a piece of code that examins
the input string and finds a prefix that is a lexeme matching one of the
patterns.
Stmt à if
expr then stmt
|
If expr then else stmt
|
Ñ”
Expr à term
relop term
| term
Term à id
|number
For relop ,we use the comparison operations of
languages like Pascal or SQL where = is “equals” and < > is “not equals” because it
presents an interesting structure of lexemes.
The terminal of grammar, which are if, then , else, relop
,id and numbers are the names of tokens
as far as the lexical analyzer is concerned, the patterns for the tokens are described using
regular definitions.
digit -->[0,9]
digits -->digit+
number -->digit(.digit)?(e.[+-]?digits)?
letter -->[A-Z,a-z]
id -->letter(letter/digit)*
if --> if
then -->then
else -->else
relop --></>/<=/>=/==/< >
In addition, we assign the lexical analyzer the job
stripping out white space, by recognizing the “token” we defined by:
ws
à (blank/tab/newline)+
Here, blank, tab and
newline are abstract symbols that we use to express the ASCII characters of the
same names. Token ws is different from the other tokens in that ,when we
recognize it, we do not return it to parser ,but rather restart the lexical
analysis from the character that follows the white space . It is the following
token that gets returned to the parser.
Lexeme
|
Token Name
|
Attribute Value
|
Any ws
|
_
|
_
|
if
|
if
|
_
|
then
|
then
|
_
|
else
|
else
|
_
|
Any id
|
id
|
pointer to table entry
|
Any number
|
number
|
pointer to table entry
|
<
|
relop
|
LT
|
<=
|
relop
|
LE
|
=
|
relop
|
ET
|
<
>
|
relop
|
NE
|
Transition
Diagram:
Transition
Diagram has a collection of nodes or circles, called states. Each state
represents a condition that could occur during the process of scanning the
input looking for a lexeme that matches
one of several patterns .
Edges are
directed from one state of the transition diagram to another. each edge is
labeled by a symbol or set of symbols.
If we are in one state s, and the next input symbol
is a, we look for an edge out of state s
labeled by a. if we find such an edge ,we advance the forward pointer and enter the state of the transition diagram to which that edge leads.
Some
important conventions about transition diagrams are
1. Certain
states are said to be accepting or final .These states indicates that a lexeme
has been found, although the actual lexeme may not consist of all
positions b/w the lexeme Begin and
forward pointers we always indicate an accepting state by a double circle.
2. In
addition, if it is necessary to return the forward pointer one position, then
we shall additionally place a * near that accepting state.
3. One state is designed the state ,or initial state ., it is indicated by
an edge labeled “start” entering from nowhere .the transition diagram always
begins in the state before any input symbols have been used.