Monday, April 14, 2014

preprocessor explanation with example

There are three basic phases that are important in programming e.g. C- programming. The phases are Pre-processing, compiling and Linking, only two phases are required: compiler and linking.

prog.càPREPROCESSORà
temp.càCOMPILERà
prog.objàLINKERàprog.exe

A preprocessor basically reads line by line from the input and replicates the line to the output. If the line is a preprocessing statement, then it performs the preprocessing directives. All preprocessing statements in C begin with the # symbol and are at the beginning of a line. The operations of taking the input file and generating the output file, pre-processed file, is called pre-processing.

PRE-PROCESSOR          à    COMPILER          à      LINKER
(reads program line by line)      (identify the uncoded       (search for libraries files)
                                            functions like printf,scanf)     

Example

#include<stdio.h> // pre-processor
#define Max 10  // pre-processor
int main(void)
{
int i;
for(i=0;i<Max;i++)
printf(“Hello World!”);
return 0;
}

From above example #include and #define defines header files. Pre-processor reads line by line. Its starts from main() reading line by line, generates the output file as i.e. temp.c is pre-processor output file.
Compiler takes pre-processor output file as input for compiler and generated object file i.e. prog.obj. This object file contains machine code generated from the program you wrote in your original C file. It does not as of yet contain code for functions such as printf. Since the code for printf is not in your code, the object file contains symbol printf.   Usually a special file extension(obj) is given to an object file.
Think of an object file as a partially complete program with missing blocks of code (ie the code for printf).  During the next phase called the Link Phase, it is the linker program that is responsible to find the missing code "printf" and link it(bind it) to the object file generated by the compiler. The compiler also removes all comments from the input file.
The linker is a process that accepts as input object files and libraries to produce the final executable program. Libraries contain object code. This object code in turn contains functions. In the link phase the object code from the compile phase is bound to the missing function code such as printf. The function printf is located in the standard input and output libraries. The final executable now contains all the necessary code for execution.



Wednesday, March 26, 2014

Why learn about compilers?

Why learn about compilers?

Few people will ever be required to write a compiler for a general-purpose language like C, Pascal or SML. So why do most computer science institutions offer compiler courses and often make these mandatory Some typical reasons are:

  • It is considered a topic that you should know in order to be “well-cultured” in computer science.
  • A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.
  • The techniques used for constructing a compiler are useful for other purposes as well.
  • There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.

Compiler Requirements

Compiler Requirements

Correctness.
Correctness is absolutely paramount. A buggy compiler is next to useless in practice. Since we cannot formally prove the correctness of your compilers, we use extensive testing. This testing is end-to-end, verifying the correctness of the generated code on sample inputs. We also verify that your compiler rejects programs as expected when the input is not well-formed (lexically, syntactically, or with respect to the static semantics), and that the generated code raises an exception as expected if the language specification prescribes this. We go so far as to test that your generated code fails to terminate (with a time-out) when the source program should diverge.
Emphasis on correctness means that we very carefully define the semantics of the source language. The semantics of the target language is given by the GNU assembler on the lab machines together with the semantics of the actually machine. Unlike C, we try to make sure that as little as possible about the source language remains undefined. This is not just for testability, but also good language design practice since an unambiguously defined language is portable. The only part we do not fully define are precise resource constraints regarding the generated code (for example, the amount of memory available).
Efficiency.
 In a production compiler, efficiency of the generated code and also efficiency of the compiler itself are important considerations. In this course, we set very lax targets for both, emphasizing correctness instead. In one of the later labs in the course, you will have the opportunity to optimize the generated code.
The early emphasis on correctness has consequences for your approach to the design of the implementation. Modularity and simplicity of the code are important for two reasons: first, your code is much more likely to be correct, and, second, you will be able to respond to changes in the source language specification from lab to lab much more easily.
Interoperability.
Programs do not run in isolation, but are linked with library code before they are executed, or will be called as a library from other code. This puts some additional requirements on the compiler, which must respect certain interface specifications.
Your generated code will be required to execute correctly in the environment on the lab machines. This means that you will have to respect calling conventions early on (for example, properly save callee-save registers) and data layout conventions later, when your code will be calling library functions. You will have to carefully study the ABI specification as it applies to C and our target architecture.
Usability.
A compiler interacts with the programmer primarily when there are errors in the program. As such, it should give helpful error messages. Also, compilers may be instructed to generate debug information together with executable code in order help users debug runtime errors in their pro- gram.
In this course, we will not formally evaluate the quality or detail of your error messages, although you should strive to achieve at least a minimum standard so that you can use your own compiler effectively.
Retargetability.
At the outset, we think of a compiler of going from one source language to one target language. In practice, compilers may be required to generate more than one target from a given source (for example,x86-64 and ARM code), sometimes at very different levels of abstraction (for example, x86-64 assembly or LLVM intermediate code).

In this course we will deemphasize retargetability, although if you structure your compiler following the general outline presented in the next section, it should not be too difficult to retrofit another code generator. One of the options for the last lab in this course is to retarget your compiler to pro- duce code in a low-level virtual machine (LLVM). Using LLVM tools this means you will be able to produce efficient binaries for a variety of concrete machine architectures.

Definations

Definitions:

Translator

A device that changes a sentence from one language to another without change of meaning.

 Compiler

A program that translates between programming languages.

 Interpreter

A processor that compiles and executes programming language statements one by one in an interleaved manner.

 Syntax

An alphabet and a set of rules defining spatial relationships between symbols and symbol sets in a language.

 Semantics

The meanings assigned to symbols and symbol sets in a language.

 Pragmatics

The meanings perceived to be associated with symbols and symbol sets in a language.

Friday, October 4, 2013

FIRST AND FOLLOWS

FIRST: 
If α is any string of grammar symbols, the FIRST (α) be the set of terminals that begins strings derived from α.
                 If αà€, then € is also in FIRST (α).
To compute FIRST (α) for all grammar symbols ’X’, apply the following rules until no more terminals or € can be added to any FIRST set.
     1)    If x is terminal ,the FIRST(x)={x}
     2)    If x is non terminal and xà aα is a production, then add a to FIRST(x), If xà€ is a production, then add € to FIRST(x).
     3)    If xày1,y2,y3,…….yk is a production ,then for all i such that of all y1……yi-1 are non terminals and FIRST(yi) contains € for j=1,2,3….i-1 (i.e.,yi,y2,…..yi-1à€) add every non-€ symbol in FIRST(yi) to FIRST(x).

FOLLOW:    For a non terminal A, FOLLOW (A) to de the set of terminal a, that can appear immediately to the right of A in some sentential form i.e., sàαAaβ for some α and β.
      To compute FOLLOW (A) for all non-terminals A, apply the following rules until nothing can be added to any follow set.
1)      $ is in FOLLOW(S), where S is the start symbol.
2)      If there is a production Aàαbβ, then everything in FIRST (β) except for € is in FOLLOW (B).

Example :

Construct the predictive parser for the following grammar
        Sà(L)/a
        LàL,S/S

Sol: We have to eliminate left recursion LàL,S/S.
After eliminating the left recursion we obtain
           Sà(L)/a
           LàSL1
           L1à, SL1/€

Then FIRST
FIRST (S) = { ( , a} by rule (2)
FIRST (L) = { ( ,a} by rule (3) and (1)
FIRST (L1) = {, ,€} by rule (2)

FOLLOW
FOLLOW(S) = { $ , ) , ,}    =FIRST(L1) U FOLLOW(L)
FOLLOW (L) = { )  ,}
FOLLOW (L1) = { ) }

The parsing table

   (
   )
      a
   ,
    $
S
Sà(L)

Sàa


L
LàSL1

LàSL1


L1

L1 à

L1 à,SL1
L1  à

Classification of parsing Techniques

There are two parsing techniques, these parsing techniques work on the following principle
1. The parser scans the input string from left to right and identifiers that the derivation is leftmost or rightmost.
2. The parser makes use of production rules for choosing the appropriate derivation. The different parsing techniques use different approaches in selecting the appropriate rules for derivation and finally a parse tree is constructed.
When the parse tree can be constructed from root and expanded to leaves, then such type of parse is called top-down parser. The name itself tells us that the parse tree can be built from top to bottom.

When the parse tree can be constructed from leaves to root, then such type of parse is called as bottom–up parse. Thus the parse tree is built in bottom up manner.


The Role of a Parser

 In this process of compilation the parser and lexical analyzer work together. That means, when parser required string of tokens it invokes lexical analyzer .In turn, the lexical analyzer supplies tokens to syntax analyzer (parser).

 The parser collects sufficient number of tokens and builds a parse tree. Then by building the parse tree, parse smartly finds the syntactical errors if any. It is also necessary that the parse should recover from commonly occurring errors so that remaining task of process the input can be continued.

Syntax Analysis

 The parser or syntax analysis checks for whether the string given can be generated from the given grammar.
 A parser for grammar G is a program that takes as input a string w and produces as output either a parser tree for w, if w is a sentences of G, or an end message indicating that w is not a sentence of G. 

Thursday, October 3, 2013

Aho Alfred V. Pearson Education India, 1999 - 796 pages

A.A.Puntambekar Technical Publications, 01-Jan-2009 - 540 pages

compiler design by A.A.Putambekar

A.A.Puntambekar Technical Publications, 01-Jan-2010 - Compilers (Computer programs) - 461 pages Overview of Compilation : Phases of compilation - Lexical analysis, Regular grammar and regular expression for common programming language features, Pass and phases of translation, Interpretation, Bootstrapping, Data structures in compilation - LEX lexical analyzer generator.Top Down Parsing : Context free grammars, Top down parsing, Backtracking, LL (1), Recursive descent parsing, Predictive parsing, Preprocessing steps required for predictive parsing.Bottom up Parsing : Shift reduce parsing, LR and LALR parsing, Error recovery in parsing, Handling ambiguous grammar, YACC - automatic parser generator.Semantic Analysis : Intermediate forms of source programs - abstract syntax tree, Polish notation and three address codes. Attributed grammars, Syntax directed translation, Conversion of popular programming languages language constructs into intermediate code forms, Type checker.Symbol Tables : Symbol table format, Organization for block structures languages, Hashing, Tree structures representation of scope information. Block structures and non block structure storage allocation : Static, Runtime stack and heap storage allocation, Storage allocation for arrays, strings and records.Code Optimization : Consideration for optimization, Scope of optimization, Local optimization, Loop optimization, Frequency reduction, Folding, DAG representation.Data Flow Analysis : Flow graph, Data flow equation, Global optimization, Redundant subexpression elimination, Induction variable elements, Live variable analysis, Copy propagation.Object Code Generation : Object code forms, Machine dependent code optimization, Register allocation and assignment generic code generation algorithms, DAG for register allocation.

Wednesday, July 11, 2012

LINUX and UNIX Vi Editor Commonds


LINUX AND UNIX COMMANDS

Syntax:
vi [ -| -s ] [-l] [-L] [-R] [ -r [ filename ] ] [-S] [-t tag] [-v] [-V] [-x] [-w] [-n ] [-C] [+command | -c command ] filename
- | -s
Suppress all interactive user feedback. This is useful when processing editor scripts.
-l
Set up for editing LISP programs.
-L
List the name of all files saved as the result of an editor or system crash.
-R
Readonly mode; the readonly flag is set, preventing accidental overwriting of the file.
-r filename
Edit filename after an editor or system crash. (Recovers the version of filename that was in the buffer when the crash occurred.)
-S
This option is used in conjunction with the -t tag option to tell vi that the tags file may not be sorted and that, if the binary search (which relies on a sorted tags file) for tag fails to
find it, the much slower linear search should also be done. Since the linear search is slow, users of large tags files should ensure that the tags files are sorted rather than use this flag. Creation of tags files normally produces sorted tags files. See ctags for more information on tags files.
-t tag
Edit the file containing the tag, tag, and position the editor at its definition.
-v
Start up in display editing state using vi . You can achieve the same effect by typing the vi command itself.
-V
Verbose. When ex commands are read by means of standard input, the input will be echoed to standard error. This may be useful when processing ex commands within shell scripts.
-x
Encryption option; when used, vi simulates the X command of ex and prompts the user for a key. This key is used to encrypt and decrypt text using the algorithm of the crypt command. The X command makes an educated guess to determine whether text read in is encrypted or not. The temporary buffer file is encrypted also, using a transformed version of the key typed in for the -x option. If an empty encryption key is entered (that is, if the return key is pressed right after the prompt), the file will not be encrypted. This is a good way to decrypt a file erroneously encrypted with a mistyped encryption key, such as a backspace or undo key.
-wn
Set the default window size to n. This is useful when using the editor over a slow speed line.
-C
Encryption option; same as the -x option, except that vi simulates the C command of ex . The C command is like the X command of ex , except that all text read in is assumed to have been encrypted.
+command | -c command
Begin editing by executing the specified editor
command (usually a search or positioning command).
filename
The file to be edited.
User command
Arrow keys
Move cursor
hjkl
Same as arrow keys
itextESC
Insert text
cwnewESC
Change word to new
easESC
pluralize word (end of word; append s; escape from input state)
x
delete a character
dw
delete a word
dd
delete a line
3dd
deletes 3 lines
u
undo previous change
ZZ
exit vi , saving changes
:q!CR
quit, discarding changes
/textCR
search for text
^U ^D
scroll up or down
:cmdCR
any ex or ed command
ESC
end insert or incomplete command
DEL
(delete or rubout) interrupts
:wCR
write back changes
:w!CR
forced write, if permission originally not valid
:qCR
quit
:q!CR
quit, discard changes
:e nameCR
edit file name
:e!CR
reedit, discard changes
:e + nameCR
edit, starting at end
:e +nCR
edit, starting at line n
:e #CR
edit alternate file
:e! #CR
edit alternate file, discard changes
:w nameCR
write file name
:w! nameCR
overwrite file name
:shCR
run shell, then return
:!cmdCR
run cmd, then return
:nCR
edit next file in arglist
:n argsCR
specify new arglist
^G
show current file and line
:ta tagCR
position cursor to tag
F
forward screen
^B
backward screen
^D
scroll down half screen
^U
scroll up half screen
nG
go to the beginning of the specified line (end default), where n is a line number
/pat
next line matching pat
?pat
previous line matching pat
n
repeat last / or ? command
N
reverse last / or ? command
/pat/+n
nth line after pat
?pat?-n
nth line before pat
]]
next section/function
[[
previous section/function
(
beginning of sentence
)
end of sentence
{
beginning of paragraph
}
end of paragraph
%
find matching ( ) or { }
^L
clear and redraw window
^R
clear and redraw window if ^L is -> key
zCR
redraw screen with current line at top of window
z-CR
redraw screen with current line at bottom of window
z.CR
redraw screen with current line at center of window
/pat/z-CR
move pat line to bottom of window
zn.CR
use n-line window
^E
scroll window down one line
^Y
scroll window up one line
``
move cursor to previous context
''
move cursor to first non-white space in line
mx
mark current position with the ASCII lower-case letter x
`x
move cursor to mark x
'x
move cursor to first non-white space in line marked by x
H
top line on screen
L
last line on screen
M
middle line on screen
+
next line, at first non-white space character
-
previous line, at first non-white space character
CR
return, same as +
down-arrow or j
next line, same column
up-arrow or k
previous line, same column
^
first non-white space character
0
beginning of line
$
end of line
l or ->
forward
h or <-
backward
^H
same as <- (backspace)
space
same as -> (space bar)
fx
find next x
Fx
find next x
tx
move to character following the next x
Tx
move to character following the previous x
;
repeat last f, F, t, or T
,
repeat inverse of last f, F, t, or T
n|
move to column n
%
find matching ( ) or { }
w
forward a word
b
back a word
e
end of word
)
to next sentence
}
to next paragraph
(
back a sentence
{
back a paragraph
W
forward a blank-delimited word
B
back a blank-delimited word
E
end of a blank-delimited word
^H
erase last character (backspace)
^W
erase last word
erase
your erase character, same as ^H (backspace)
kill
your kill character, erase this line of input
\
quotes your erase and kill characters
ESC
ends insertion, back to command mode
CTRL-C
interrupt, suspends insert mode
^D
backtab one character; reset left margin of autoindent
^^D
caret (^) followed by control-d (^D); backtab to beginning of line; do not reset left margin of autoindent
0^D
backtab to beginning of line; reset left margin of autoindent
^V
quote non-printable character
a
append after cursor
A
append at end of line
i
insert before cursor
I
insert before first non-blank
o
open line below
O
open line above
rx
replace single character with x
RtextESC
replace characters
d
delete
c
change
y
yank lines to buffer
left shift
right shift
!
filter through command
C
change rest of line (c$)
D
delete rest of line (d$)
s
substitute characters (cl)
S
substitute lines (cc)
J
join lines
x
delete characters (dl)
X
delete characters before cursor dh)
Y
yank lines (yy)
3yy
yank 3 lines
3yl
yank 3 characters
p
put back text after cursor
P
put back text before cursor " .nr )I xp"n
put from buffer x " .nr )I xy"n
yank to buffer x " .nr )I xd"n
delete into buffer x
u
undo last change
U
restore current line
.
repeat last change " .nr )I dp"n
retrieve d'th last delete