June 2012 ~ COMPILER DESIGN

Recognition of reserved words and identifiers

Posted by Knowledegehub on 11:40 PM with 1 comment

Recognizing keywords and identifiers presents a problem. Usually, keywords like if or then are reserved, so they are not identifiers even though they look identifiers. Then although we typically use a transition diagram like that. fig to search for identifier lexemes this diagram will also recognize the keyword if then else of an running example.

return (get Token (), install des)

There are two ways that we can handle reserved words that look like identifiers.

1. Install the reserved word,in the symbol table initially .A field of the symbol-table entry indicates that these strings are never ordinary identifiers ,and tells which token they represent .when we find an identifier a call to install ID places it in the symbol table if it is not already there and returns a pointer t the symbol-table entry for the lexeme found. Of course, any identifier not in the symbol table during lexical analysis cannot be a reserved word, so its token is id.

2. Create separate transition diagram for each keyword: an example for the keyword then is shown fig .Note that such a transition diagram consist of states representing the situation after each successive letter of the keyword is seen .followed by a text for a “non letter-or-digit” .i.e., any character that cannot be the continuation of an identifier.

Transition diagram for recognition of tokens

Posted by Knowledegehub on 11:35 PM with 8 comments

Recognition of tokens:

We learn how to express pattern using regular expressions. Now, we must study how to take the patterns for all the needed tokens and build a piece of code that examins the input string and finds a prefix that is a lexeme matching one of the patterns.

Stmt àif expr then stmt

| If expr then else stmt

| є

Expr àterm relop term

| term

Term àid

|number

For relop ,we use the comparison operations of languages like Pascal or SQL where = is “equals” and < > is “not equals” because it presents an interesting structure of lexemes.

The terminal of grammar, which are if, then , else, relop ,id and numbers are the names of tokens as far as the lexical analyzer is concerned, the patterns for the tokens are described using regular definitions.

digit -->[0,9]

digits -->digit+

number -->digit(.digit)?(e.[+-]?digits)?

letter -->[A-Z,a-z]

id -->letter(letter/digit)*

if --> if

then -->then

else -->else

relop --></>/<=/>=/==/< >

In addition, we assign the lexical analyzer the job stripping out white space, by recognizing the “token” we defined by:

ws à(blank/tab/newline)⁺

Here, blank, tab and newline are abstract symbols that we use to express the ASCII characters of the same names. Token ws is different from the other tokens in that ,when we recognize it, we do not return it to parser ,but rather restart the lexical analysis from the character that follows the white space . It is the following token that gets returned to the parser.

Lexeme	Token Name	Attribute Value
Any ws	_	_
if	if	_
then	then	_
else	else	_
Any id	id	pointer to table entry
Any number	number	pointer to table entry
<	relop	LT
<=	relop	LE
=	relop	ET
< >	relop	NE

Transition Diagram:

Transition Diagram has a collection of nodes or circles, called states. Each state represents a condition that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns .

Edges are directed from one state of the transition diagram to another. each edge is labeled by a symbol or set of symbols.

If we are in one state s, and the next input symbol is a, we look for an edge out of state s labeled by a. if we find such an edge ,we advance the forward pointer and enter the state of the transition diagram to which that edge leads.

Some important conventions about transition diagrams are

1. Certain states are said to be accepting or final .These states indicates that a lexeme has been found, although the actual lexeme may not consist of all positions b/w the lexeme Begin and forward pointers we always indicate an accepting state by a double circle.

2. In addition, if it is necessary to return the forward pointer one position, then we shall additionally place a * near that accepting state.

3. One state is designed the state ,or initial state ., it is indicated by an edge labeled “start” entering from nowhere .the transition diagram always begins in the state before any input symbols have been used.

Regular Defination

Posted by Knowledegehub on 11:27 PM with No comments

We may wish to give to regular expression and to define regular expression using these names as if they were symbols.

If ∑ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form,

d₁--> r₁

d₂ --> ₂

…….

d_n --> r_n

Where each di is a distinct name, and each ri is a regular expression over the symbols in ∑ u{d1,d2,….dn}, ie. the basic symbols and the previously defined names. By restricting each ri to symbols of ∑ and the previously defined names, we can construct a regular expression over ∑ for any r_i by repeatedly replacing regular expression names by the expression they denote. If r_i used dj for some j>=i, then r_i might be recursively defined .and this substitution process would not terminate.

To distinguish names from symbols, we print the names in regular definitions in bold face.

COMPILER DESIGN

Wednesday, June 27, 2012

Recognition of reserved words and identifiers

Transition diagram for recognition of tokens

Regular Defination

Followers

Popular Posts

Blogger news

About

Categories

Blog Archive

Labels

Contact Form