Compiler Requirements
Correctness.
Correctness is absolutely paramount. A buggy compiler is next to useless in
practice. Since
we cannot formally prove the correctness of your compilers, we use extensive
testing. This testing is end-to-end, verifying the correctness of the generated
code on sample inputs. We also verify that your compiler rejects programs as
expected when the input is not well-formed (lexically, syntactically, or with
respect to the static semantics), and that the generated code raises an
exception as expected if the language specification prescribes this. We go so
far as to test that your generated code fails to terminate (with a time-out)
when the source program should diverge.
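The kinds of run-time outcomes described above (normal result, exception, or non-termination detected by a time-out) can be sketched as a small classifier. This is only an illustrative sketch, not the course's actual test harness: the names Outcome, classify, and run_and_classify are hypothetical, and it assumes that a normal result is reported through the process exit code and that a run-time exception shows up as the process being killed by a signal (a negative return code from Python's subprocess module on POSIX). Rejection of ill-formed programs at compile time would be checked separately and is not covered here.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Outcome:
    """Observed behavior of one run of the generated code (hypothetical type)."""
    kind: str                    # "return", "exception", or "timeout"
    value: Optional[int] = None  # exit code or signal number, if any


def classify(returncode: Optional[int], timed_out: bool) -> Outcome:
    """Map raw process results to an outcome.

    Assumption: a negative return code means the process was killed by
    a signal, which we treat as a run-time exception.
    """
    if timed_out:
        return Outcome("timeout")
    if returncode is not None and returncode < 0:
        return Outcome("exception", -returncode)
    return Outcome("return", returncode)


def run_and_classify(cmd: Sequence[str], timeout_s: float) -> Outcome:
    """Run an executable with a time limit and classify what happened."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return classify(None, True)
    return classify(proc.returncode, False)
```

A test case would then compare the observed Outcome with the expected one. A program that should diverge passes only if the observed kind is "timeout".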
Emphasis on correctness means that we very carefully define the semantics of
the source language. The semantics of the target language is given by the GNU
assembler on the lab machines together with the semantics of the actual
machine. Unlike C,
we try to make sure that as little as possible about the source language remains
undefined. This is not just for testability, but also good language design
practice since an unambiguously defined language is portable. The only part we
do not fully define is the set of precise resource constraints on the generated
code (for example, the amount of memory available).
Efficiency.
In a production compiler, efficiency of the
generated code and also efficiency of the compiler itself are important
considerations. In this course, we set very lax targets for both, emphasizing
correctness instead. In one of the later labs in the course, you will have the
opportunity to optimize the generated code.
The early emphasis on correctness has consequences for your approach to the
design
of the implementation. Modularity and simplicity of the code are important for
two reasons: first, your code is much more likely to be correct, and, second,
you will be able to respond to changes in the source language specification
from lab to lab much more easily.
Interoperability.
Programs do not run in isolation, but are linked with library code before they
are
executed, or will be called as a library from other code. This puts some
additional requirements on the compiler, which must respect certain interface
specifications.
Your generated code will be required to execute correctly in the environment on
the
lab machines. This means that you will have to respect calling conventions
early on (for example, properly save callee-save registers) and data layout conventions
later, when your code will be calling library functions. You will have to
carefully study the ABI specification as it applies to C and our target
architecture.
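As a rough illustration of what respecting callee-save registers involves, the sketch below computes the save and restore instructions for a function. It assumes the target is x86-64 under the System V AMD64 ABI, where rbx, rbp, r12, r13, r14, and r15 must be preserved across calls. The helper name save_restore is hypothetical, and the sketch ignores stack alignment and frame layout, which a real code generator also has to handle.

```python
# Registers a callee must preserve under the System V AMD64 ABI
# (assumed target; check the ABI document for your architecture).
CALLEE_SAVED = ("rbx", "rbp", "r12", "r13", "r14", "r15")


def save_restore(clobbered):
    """Return (prologue, epilogue) instruction lists in AT&T syntax.

    `clobbered` is the set of registers the function body writes to.
    Only callee-saved registers among them need to be pushed, and they
    are popped in the reverse order so each value returns to its register.
    """
    to_save = [reg for reg in CALLEE_SAVED if reg in clobbered]
    prologue = [f"pushq %{reg}" for reg in to_save]
    epilogue = [f"popq %{reg}" for reg in reversed(to_save)]
    return prologue, epilogue
```

For example, a function that writes rax, rbx, and r12 only needs to save rbx and r12, since rax is caller-saved under this ABI.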
Usability.
A compiler interacts with the programmer primarily when there are errors in the
program. As such, it should give helpful error messages. Also, compilers may be
instructed to generate debug information together with executable code in order
to help users debug runtime errors in their program.
In this course, we will not formally evaluate the quality or detail of your
error
messages, although you should strive to achieve at least a minimum standard so
that you can use your own compiler effectively.
Retargetability.
At the outset, we think of a compiler as going from one source language to one
target language. In practice, compilers may be required to generate more than
one target from a given source (for example, x86-64 and ARM code), sometimes at
very different levels of abstraction (for example, x86-64 assembly or LLVM
intermediate code).
In this course we will deemphasize retargetability, although if you structure
your compiler following the general outline presented in the next section, it
should not be too difficult to retrofit another code generator. One of the
options for the last lab in this course is to retarget your compiler to produce
code for a low-level virtual machine (LLVM). Using LLVM tools, this means you
will be able to produce efficient binaries for a variety of concrete machine
architectures.
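To make the idea of retrofitting another code generator concrete, here is a minimal sketch in which two back ends consume the same intermediate program. Everything in it is an assumption made for illustration: the toy accumulator-style instructions ("load", "add", "ret"), the class names, and the use of main as the entry point are hypothetical and are not the compiler structure prescribed by the course.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common interface: turn an intermediate program into target text."""

    @abstractmethod
    def emit(self, program):
        ...


class X86_64Backend(Backend):
    """Emits AT&T-syntax assembly, keeping the running value in %eax."""

    def emit(self, program):
        lines = ["\t.globl main", "main:"]
        for instr in program:
            op = instr[0]
            if op == "load":
                lines.append(f"\tmovl ${instr[1]}, %eax")
            elif op == "add":
                lines.append(f"\taddl ${instr[1]}, %eax")
            elif op == "ret":
                lines.append("\tret")
            else:
                raise ValueError(f"unknown instruction: {op}")
        return "\n".join(lines) + "\n"


class LLVMBackend(Backend):
    """Emits LLVM IR text, naming each new value %t0, %t1, ... (SSA form)."""

    def emit(self, program):
        lines = ["define i32 @main() {"]
        current = "0"  # operand holding the running value
        fresh = 0      # counter for fresh value names
        for instr in program:
            op = instr[0]
            if op == "load":
                current = str(instr[1])
            elif op == "add":
                name = f"%t{fresh}"
                fresh += 1
                lines.append(f"  {name} = add i32 {current}, {instr[1]}")
                current = name
            elif op == "ret":
                lines.append(f"  ret i32 {current}")
            else:
                raise ValueError(f"unknown instruction: {op}")
        lines.append("}")
        return "\n".join(lines) + "\n"


# The same intermediate program is handed to either back end.
PROGRAM = [("load", 3), ("add", 4), ("ret",)]
```

Because the front end and the intermediate program are shared, adding a target only means adding another Backend subclass. Calling X86_64Backend().emit(PROGRAM) and LLVMBackend().emit(PROGRAM) yields two different target texts for the same source.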