Wednesday, June 27, 2012

Transition diagram for recognition of tokens

Recognition of tokens:                                                               
We learn how to express pattern using regular expressions. Now, we must study how to take the patterns for all the needed tokens and build a piece of code that examins  the input string and finds a prefix that is a lexeme matching one of the patterns.                                                                                                                             

 Stmt àif expr then  stmt                                                                                                        
             |  If expr then else stmt
             | є
Expr  àterm relop term
              | term
Term àid
For relop ,we use the comparison operations of languages like Pascal or SQL where = is “equals”  and < > is “not equals” because it presents  an interesting structure of  lexemes.
The terminal of grammar, which are if, then , else, relop ,id and numbers are the names of  tokens as far as the lexical analyzer is concerned, the  patterns for the tokens are described using regular definitions.

digit        -->[0,9]                                                                                                        
digits      -->digit+                                                                                                                      
number   -->digit(.digit)?(e.[+-]?digits)?                                                                           
letter       -->[A-Z,a-z]                                                                                                     
 id           -->letter(letter/digit)*                                                                                                   
 if            --> if                                                                                                                          
 then        -->then                                                                                                               
 else        -->else                                                                                                          
relop       --></>/<=/>=/==/< > 

In addition, we assign the lexical analyzer the job stripping out white space, by recognizing the “token” we defined by:
            ws à(blank/tab/newline)+
Here, blank, tab and newline are abstract symbols that we use to express the ASCII characters of the same names. Token ws is different from the other tokens in that ,when we recognize it, we do not return it to parser ,but rather restart the lexical analysis from the character that follows the white space . It is the following token that gets returned to the parser. 

Token Name 
      Attribute Value      
         Any ws
         Any id  
   pointer to table entry
      Any number 
   pointer to table entry               
          < >    

Transition Diagram:
            Transition Diagram has a collection of nodes or circles, called states. Each state represents a condition that could occur during the process of scanning the input  looking for a lexeme that matches one  of several patterns .
   Edges are directed from one state of the transition diagram to another. each edge is labeled by a symbol or set of symbols.
If we are in one state s, and the next input symbol is a, we look for  an edge out of state s labeled by a. if we find such an edge ,we advance the forward          pointer and enter the state of the  transition diagram to which that edge leads.
Some important conventions about transition diagrams are
      1. Certain states are said to be accepting or final .These states indicates that a lexeme has been found, although the actual lexeme may not consist of all positions  b/w the lexeme Begin and forward pointers we always indicate an accepting state by a double circle.
      2.   In addition, if it is necessary to return the forward pointer one position, then we shall additionally place a * near that accepting state.
          3. One  state is designed the  state ,or initial state ., it is indicated by an edge labeled “start” entering from nowhere .the transition diagram always begins in the state before any input symbols have been used. 



  1. 2. In addition, if it is necessary to return the forward pointer one position, then we shall additionally place a * near that accepting state.

    What do you mean by " forward pointer one position" that why we are using * near the accepting state

  2. forward pointer position represents the arrow direction present state to next state