Recognition of tokens:
We learn how to express pattern using regular
expressions. Now, we must study how to take the patterns for all the needed tokens
and build a piece of code that examins
the input string and finds a prefix that is a lexeme matching one of the
patterns.
Stmt àif
expr then stmt
|
If expr then else stmt
|
є
Expr àterm
relop term
| term
Term àid
|number
For relop ,we use the comparison operations of
languages like Pascal or SQL where = is “equals” and < > is “not equals” because it
presents an interesting structure of lexemes.
The terminal of grammar, which are if, then , else, relop
,id and numbers are the names of tokens
as far as the lexical analyzer is concerned, the patterns for the tokens are described using
regular definitions.
digit -->[0,9]
digits -->digit+
number -->digit(.digit)?(e.[+-]?digits)?
letter -->[A-Z,a-z]
id -->letter(letter/digit)*
if --> if
then -->then
else -->else
relop --></>/<=/>=/==/< >
In addition, we assign the lexical analyzer the job
stripping out white space, by recognizing the “token” we defined by:
ws
à(blank/tab/newline)+
Here, blank, tab and
newline are abstract symbols that we use to express the ASCII characters of the
same names. Token ws is different from the other tokens in that ,when we
recognize it, we do not return it to parser ,but rather restart the lexical
analysis from the character that follows the white space . It is the following
token that gets returned to the parser.
Lexeme
|
Token Name
|
Attribute Value
|
Any ws
|
_
|
_
|
if
|
if
|
_
|
then
|
then
|
_
|
else
|
else
|
_
|
Any id
|
id
|
pointer to table entry
|
Any number
|
number
|
pointer to table entry
|
<
|
relop
|
LT
|
<=
|
relop
|
LE
|
=
|
relop
|
ET
|
<
>
|
relop
|
NE
|
Transition
Diagram:
Transition
Diagram has a collection of nodes or circles, called states. Each state
represents a condition that could occur during the process of scanning the
input looking for a lexeme that matches
one of several patterns .
Edges are
directed from one state of the transition diagram to another. each edge is
labeled by a symbol or set of symbols.
If we are in one state s, and the next input symbol
is a, we look for an edge out of state s
labeled by a. if we find such an edge ,we advance the forward pointer and enter the state of the transition diagram to which that edge leads.
Some
important conventions about transition diagrams are
1. Certain
states are said to be accepting or final .These states indicates that a lexeme
has been found, although the actual lexeme may not consist of all
positions b/w the lexeme Begin and
forward pointers we always indicate an accepting state by a double circle.
2. In
addition, if it is necessary to return the forward pointer one position, then
we shall additionally place a * near that accepting state.
3. One state is designed the state ,or initial state ., it is indicated by
an edge labeled “start” entering from nowhere .the transition diagram always
begins in the state before any input symbols have been used.
NiceExplanation Thank U
ReplyDeletewhat is the meaning of other in transition diagram and ?
Deleteother refers to any character that is not indicated
Delete2. In addition, if it is necessary to return the forward pointer one position, then we shall additionally place a * near that accepting state.
ReplyDeleteWhat do you mean by " forward pointer one position" that why we are using * near the accepting state
good!!!
ReplyDeleteforward pointer position represents the arrow direction present state to next state
ReplyDeleteKnapsack Problem using Backtracking
ReplyDeleteIssues Knowledge Representation
ARM Architecture in Detail
File Models Distributed File System
Memory Allocation
Basic Primitive Operations Queue
First Come First Serve
What is CDMA Technology
I think token name should be their class name not repetition of lexeme. For example, "if", "then" etc should be keyword!
ReplyDelete