Monday, April 14, 2014

Preprocessor explanation with example

There are three basic phases in building a program, for example in C programming: preprocessing, compiling and linking. Of these, only two phases, compiling and linking, are strictly required.

prog.c → PREPROCESSOR → temp.c → COMPILER → prog.obj → LINKER → prog.exe

A preprocessor reads its input line by line and copies each line to the output, expanding any macros it contains. If a line is a preprocessing directive, the preprocessor carries out that directive instead of copying the line. All preprocessing directives in C begin with the # symbol, which must be the first non-whitespace character on the line. The operation of taking the input file and generating the output (preprocessed) file is called preprocessing.

PRE-PROCESSOR → COMPILER → LINKER
  • Preprocessor: reads the program line by line.
  • Compiler: identifies functions whose code is not in the program, such as printf and scanf.
  • Linker: searches the library files for those functions.

Example

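#include <stdio.h>  // preprocessor directive: insert the contents of stdio.h
#define Max 10      // preprocessor directive: define the macro Max as 10

int main(void)
{
    int i;
    for (i = 0; i < Max; i++)
        printf("Hello World!\n");  // printed 10 times
    return 0;
}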

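In the example above, #include and #define are preprocessor directives: #include inserts the contents of the header file stdio.h, and #define makes Max a macro that is replaced by 10 wherever it appears in the code. The preprocessor reads the file line by line, starting from the top (not from main()), carries out these directives, and generates an output file, here temp.c.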
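The substitution that #define performs can be sketched as a text replacement. The function expand_macro below is a hypothetical helper for illustration only: it handles one object-like macro on one line, and it ignores string literals, comments and rescanning of the result. A real C preprocessor works on tokens rather than raw text and handles all of those cases.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if c can be part of a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies `line` into `out` (capacity `cap`), replacing every whole-word
   occurrence of the macro `name` with `value`, so that `i<Max` becomes
   `i<10`. Returns 0 on success, -1 if `out` is too small. */
int expand_macro(const char *line, const char *name, const char *value,
                 char *out, size_t cap)
{
    size_t name_len = strlen(name);
    size_t value_len = strlen(value);
    size_t used = 0;
    const char *p = line;

    assert(name_len > 0 && cap > 0);
    while (*p != '\0') {
        /* only match at the start of a word, and only a whole word */
        int starts_word = (p == line) || !is_ident_char(p[-1]);
        if (starts_word && strncmp(p, name, name_len) == 0
                && !is_ident_char(p[name_len])) {
            if (used + value_len >= cap)
                return -1;
            memcpy(out + used, value, value_len);
            used += value_len;
            p += name_len;
        } else {
            if (used + 1 >= cap)
                return -1;
            out[used++] = *p++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

For example, with the loop header from the program above, expand_macro("for(i=0;i<Max;i++)", "Max", "10", buf, sizeof buf) leaves "for(i=0;i<10;i++)" in buf, while an identifier such as MaxValue is left alone because only whole words are replaced.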
The compiler takes the preprocessor's output file as its input and generates an object file, here prog.obj. This object file contains the machine code generated from the program you wrote in your original C file. It does not yet contain the code for library functions such as printf: since that code is not in your program, the object file only records the symbol printf as a reference to be filled in later. Object files are usually given a special file extension, such as .obj on Windows or .o on Unix-like systems.
Think of an object file as a partially complete program with missing blocks of code (i.e. the code for printf). During the next phase, called the link phase, the linker is responsible for finding the missing code for printf and linking (binding) it to the object file generated by the compiler. Note that comments never reach the compiler proper: in C, each comment is replaced by a single space during preprocessing.
The linker is a program that takes object files and libraries as input and produces the final executable program. Libraries contain object code, which in turn contains the compiled functions. In the link phase, the object code from the compile phase is bound to the code of the missing functions, such as printf. The function printf belongs to the standard input/output part of the C standard library (declared in stdio.h). With static linking, the final executable then contains all the code needed for execution; with dynamic linking, it instead records which shared library provides printf, and that code is loaded when the program starts.
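The bookkeeping in the link phase can be modeled in a few lines of C. This is a deliberately simplified sketch, not how a real linker or a real object-file format works: struct toy_object and count_unresolved are hypothetical names, each object is reduced to a list of symbols it defines and a list of symbols it needs, and the libraries' own unresolved symbols are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an object file or library: only the names of the symbols
   it defines and the names it uses without defining. Real object file
   formats carry much more (code, data, relocations, ...). */
struct toy_object {
    const char **defined;    /* NULL-terminated list, or NULL */
    const char **undefined;  /* NULL-terminated list, or NULL */
};

/* Returns 1 if `name` appears in the NULL-terminated list `names`. */
static int contains(const char **names, const char *name)
{
    for (; names != NULL && *names != NULL; names++)
        if (strcmp(*names, name) == 0)
            return 1;
    return 0;
}

/* Counts the undefined symbols of `obj` that none of the `nlibs`
   libraries defines. In this model 0 means the link succeeds; a real
   linker would report an error for each leftover symbol. */
int count_unresolved(const struct toy_object *obj,
                     const struct toy_object *libs, size_t nlibs)
{
    int unresolved = 0;
    assert(obj != NULL);
    for (const char **u = obj->undefined; u != NULL && *u != NULL; u++) {
        int found = 0;
        for (size_t i = 0; i < nlibs && !found; i++)
            found = contains(libs[i].defined, *u);
        if (!found)
            unresolved++;
    }
    return unresolved;
}
```

With a program object that defines main and needs printf, count_unresolved returns 0 when a library object defining printf is supplied and 1 when no library is given, the toy equivalent of an undefined-reference error at link time.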



Wednesday, March 26, 2014

Why learn about compilers?


Few people will ever be required to write a compiler for a general-purpose language like C, Pascal or SML. So why do most computer science institutions offer compiler courses, and often make them mandatory? Some typical reasons are:

  • It is considered a topic that you should know in order to be “well-cultured” in computer science.
  • A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.
  • The techniques used for constructing a compiler are useful for other purposes as well.
  • There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.

Compiler Requirements


Correctness.
Correctness is absolutely paramount. A buggy compiler is next to useless in practice. Since we cannot formally prove the correctness of your compilers, we use extensive testing. This testing is end-to-end, verifying the correctness of the generated code on sample inputs. We also verify that your compiler rejects programs as expected when the input is not well-formed (lexically, syntactically, or with respect to the static semantics), and that the generated code raises an exception as expected if the language specification prescribes this. We go so far as to test that your generated code fails to terminate (with a time-out) when the source program should diverge.
Emphasis on correctness means that we very carefully define the semantics of the source language. The semantics of the target language is given by the GNU assembler on the lab machines together with the semantics of the actual machine. Unlike C, we try to make sure that as little as possible about the source language remains undefined. This is not just for testability, but also good language design practice, since an unambiguously defined language is portable. The only parts we do not fully define are the precise resource constraints regarding the generated code (for example, the amount of memory available).
Efficiency.
In a production compiler, efficiency of the generated code and also efficiency of the compiler itself are important considerations. In this course, we set very lax targets for both, emphasizing correctness instead. In one of the later labs in the course, you will have the opportunity to optimize the generated code.
The early emphasis on correctness has consequences for your approach to the design of the implementation. Modularity and simplicity of the code are important for two reasons: first, your code is much more likely to be correct, and, second, you will be able to respond to changes in the source language specification from lab to lab much more easily.
Interoperability.
Programs do not run in isolation, but are linked with library code before they are executed, or will be called as a library from other code. This puts some additional requirements on the compiler, which must respect certain interface specifications.
Your generated code will be required to execute correctly in the environment on the lab machines. This means that you will have to respect calling conventions early on (for example, properly save callee-save registers) and data layout conventions later, when your code will be calling library functions. You will have to carefully study the ABI specification as it applies to C and our target architecture.
Usability.
A compiler interacts with the programmer primarily when there are errors in the program. As such, it should give helpful error messages. Also, compilers may be instructed to generate debug information together with executable code in order to help users debug runtime errors in their program.
In this course, we will not formally evaluate the quality or detail of your error messages, although you should strive to achieve at least a minimum standard so that you can use your own compiler effectively.
Retargetability.
At the outset, we think of a compiler as going from one source language to one target language. In practice, compilers may be required to generate more than one target from a given source (for example, x86-64 and ARM code), sometimes at very different levels of abstraction (for example, x86-64 assembly or LLVM intermediate code).

In this course we will deemphasize retargetability, although if you structure your compiler following the general outline presented in the next section, it should not be too difficult to retrofit another code generator. One of the options for the last lab in this course is to retarget your compiler to produce code in a low-level virtual machine (LLVM). Using the LLVM tools, this means you will be able to produce efficient binaries for a variety of concrete machine architectures.

Definitions


Translator

A device that changes a sentence from one language to another without change of meaning.

Compiler

A program that translates between programming languages.
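To illustrate "translates between programming languages", here is a toy compiler sketch in C. The mini-language is hypothetical and invented for this example: its statements "add N", "mul N" and "print" act on a single accumulator. The function toy_compile translates the whole program into C statements (meant to be placed inside a function body) and never runs it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends `text` to the NUL-terminated buffer `out` (capacity `cap`),
   whose current length is *used. Returns -1 if it would not fit. */
static int append(char *out, size_t cap, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (*used + len >= cap)
        return -1;
    memcpy(out + *used, text, len + 1);   /* also copies the final NUL */
    *used += len;
    return 0;
}

/* Toy compiler: translates all `n` statements into C source text in
   `out` before anything is executed. Returns 0 on success, -1 on an
   unknown statement or a full buffer. */
int toy_compile(const char *stmts[], int n, char *out, size_t cap)
{
    size_t used = 0;
    assert(cap > 0);
    out[0] = '\0';
    if (append(out, cap, &used, "long acc = 0;\n") != 0)
        return -1;
    for (int i = 0; i < n; i++) {
        char line[64];
        long value;
        if (sscanf(stmts[i], "add %ld", &value) == 1)
            snprintf(line, sizeof line, "acc += %ld;\n", value);
        else if (sscanf(stmts[i], "mul %ld", &value) == 1)
            snprintf(line, sizeof line, "acc *= %ld;\n", value);
        else if (strcmp(stmts[i], "print") == 0)
            snprintf(line, sizeof line, "printf(\"%%ld\\n\", acc);\n");
        else
            return -1;                    /* not a statement of the language */
        if (append(out, cap, &used, line) != 0)
            return -1;
    }
    return 0;
}
```

For the statements "add 5", "mul 3" and "print", toy_compile produces the text:

long acc = 0;
acc += 5;
acc *= 3;
printf("%ld\n", acc);

Nothing has been computed yet; the value 15 only appears when the generated C code is itself compiled and run.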

Interpreter

A processor that compiles and executes programming language statements one by one in an interleaved manner.
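The interleaving of translation and execution can be shown with a toy interpreter sketch in C. The mini-language is hypothetical, invented for this example: "add N", "mul N" and "print", all acting on a single accumulator. The function toy_interpret decodes each statement and executes it immediately before looking at the next one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy interpreter: each statement is decoded and executed at once,
   one by one. Stores the final accumulator in *result and returns 0,
   or returns -1 at the first unknown statement (the statements before
   it have already been executed by then). */
int toy_interpret(const char *stmts[], int n, long *result)
{
    long acc = 0;
    assert(result != NULL);
    for (int i = 0; i < n; i++) {
        long value;
        if (sscanf(stmts[i], "add %ld", &value) == 1)
            acc += value;
        else if (sscanf(stmts[i], "mul %ld", &value) == 1)
            acc *= value;
        else if (strcmp(stmts[i], "print") == 0)
            printf("%ld\n", acc);
        else
            return -1;                    /* not a statement of the language */
    }
    *result = acc;
    return 0;
}
```

Running toy_interpret on "add 5", "mul 3" and "print" prints 15 and stores 15 in *result. No translated program is produced: the effect of each statement happens as soon as it is reached, which is also why the statements before an invalid one have already run when the error is reported.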

Syntax

An alphabet and a set of rules defining spatial relationships between symbols and symbol sets in a language.

Semantics

The meanings assigned to symbols and symbol sets in a language.
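The difference between syntax and semantics can be illustrated with a deliberately tiny, hypothetical language whose alphabet is the digits 0-9 and the symbol +. In the C sketch below, is_well_formed checks the rules of the language (its syntax), and meaning assigns each valid sentence a value, the sum of its operands (its semantics). Both names are illustrative only.

```c
#include <assert.h>
#include <ctype.h>

/* Syntax: a sentence is one or more digits, optionally followed by any
   number of groups of '+' and one or more digits, e.g. "12+3" but not
   "12++3" or "+3". Returns 1 if `s` follows these rules, 0 otherwise. */
int is_well_formed(const char *s)
{
    for (;;) {
        if (!isdigit((unsigned char)*s))
            return 0;                 /* every operand needs a digit */
        while (isdigit((unsigned char)*s))
            s++;
        if (*s == '\0')
            return 1;
        if (*s != '+')
            return 0;                 /* symbol outside the alphabet */
        s++;                          /* skip '+', expect another operand */
    }
}

/* Semantics: the meaning assigned to a well-formed sentence, here the
   arithmetic sum of its operands. Only defined if is_well_formed(s). */
long meaning(const char *s)
{
    long total = 0, operand = 0;
    assert(is_well_formed(s));
    for (; *s != '\0'; s++) {
        if (*s == '+') {
            total += operand;
            operand = 0;
        } else {
            operand = operand * 10 + (*s - '0');
        }
    }
    return total + operand;
}
```

is_well_formed("12+3") returns 1 and meaning("12+3") returns 15, while is_well_formed("12++3") returns 0: that string breaks the rules, so no meaning is assigned to it. The sentences "12+3" and "3+12" are different strings with the same meaning.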

Pragmatics

The meanings perceived to be associated with symbols and symbol sets in a language.