Compilers

The process of compilation can be split into four stages :

  1. Lexical Analysis
  2. Syntax Analysis
  3. Semantic Analysis
  4. Code Generation.

Lexical Analysis

  • Superfluous characters and spaces are removed.
  • All comments are removed.
  • All keywords, constants and identifiers used in the source code are replaced by 'tokens'.
  • Some simple error-checking is performed - for example
    • if the language imposes a limit on the length of identifiers or string constants, the lexical analyser will determine if the identifier is too long.
    • an illegal identifier would be flagged as an error.
    • the lexical analyser will detect an attempt to assign an illegal value to a constant, such as a value of the wrong type.
  • Error messages are output if appropriate

We will look at this process using the following example program:

program average;  
var n, c : integer;  
  x, y : real;  
begin    
  read(n);  
  x := 0;  
  for c:= 1 to n do  
  begin  
    read(y);
    if (y < 3.5) or (y > 49.25) then writeln(y, ' IS OUT OF RANGE')
    else x := x + y
  end;  
  writeln('AVERAGE = ',x/n)  
end.    

Each symbol and reserved word in the language has a code associated with it - we will use those shown in TABLE 1:
(In an exam question you may have to make one up for yourself)

TABLE 1: Keywords and Symbols
Symbol Code Symbol Code Symbol Code Symbol Code
+ 1 ) 9 begin 17 or 25
- 2 , 10 end 18 integer 26
* 3 ; 11 for 19 real 27
/ 4 : 12 to 20 writeln 28
= 5 . 13 do 21 sin 29
< 6 := 14 if 22 cos 30
> 7 program 15 then 23 sqrt 31
( 8 var 16 else 24 read 32

Three other tables are set up by the lexical analyser :

TABLE 2 : Numbers
Number Code
0 1
1 2
3.5 3
49.25 4
Table 3 : Strings
String Code
IS OUT OF RANGE 1
AVERAGE = 2
Table 4 : Labels
Label Code
average 1
n 2
c 3
x 4
y 5
 

Syntax Analysis

This is the process of determining whether the sequence of input characters, symbols, items or tokens obey the rules of the syntax of the language. These syntax rules would be expressed as either Syntax diagrams or Backus Naur Form.

Parsing is the task of systematically applying the rules to each statement in the program to determine whether it is valid.

The syntax analyser will report syntax errors if they occur.

 

A symbol table is constructed which tokenises the program.

Each component of the program is represented by a pair of numbers. The first gives the TABLE number, and the second number gives the code from that table.

The meaning of the original program has not been lost in the construction of the symbol table, - indeed, the original program could be reconstructed from this symbol table, and the four 'dictionary' tables.

(see Exercise 2)

The Symbol Table for the example program looks like:

Symbol Table : Example Program
Table Code Table Code Table Code Table Code
1 15            
4 1            
1 11            
1 16            
4 2            
1 10            
4 3            
1 12            
1 26            
4 4            
1 10            
               
               
               
               
               
               
               
               
               

 

Exercise 1

Complete the symbol table for the above example program.

(When finished, you should have 76 pairs of numbers)

 

Exercise 2

This is the symbol table for a program which uses Tables 1,2,3 and 4 of the example program above. 

Reconstruct the source program from which this symbol table was produced.

Symbol Table : Exercise 2
Table Code Table Code
1 15 1 11
4 4 4 3
1 11 1 14
1 16 2 4
4 2 1 11
1 10 1 28
4 3 1 8
1 12 4 2
1 26 1 1
1 11 4 3
1 17 1 9
4 2 1 18
1 14 1 13
2 3    

 

 

Semantic analysis is the process of checking the meaning of a program. The compiler decides how the program is to be executed. Eg - In what order to evaluate an arithmetic expression. 

It is at this stage that a 'dictionary' is generated. This is a list kept by the compiler of the variables used, their types, and the place in memory where their values can be found. All these details will be needed later when the program is run.

  • checks are made to see if all variables used in the program have been declared.
  • type checks are carried out - Eg. check that real values have not been assigned to integer variables.
  • checks that appropriate operations have been carried out on variables - Eg that integer division has not been carried out on real variables. 

 

 

Code Generation.

The final phase of the compilation process where the object code is generated. Most high level language statements will be translated into a number of machine code instructions.

Linking loaders would make sure that previously compiled code sections are loaded into the program and the required links created. For example, if a previously compiled procedure is included in a program, then an instruction which calls this procedure must know where to find it.

Code optimisation techniques attempt to reduce the execution time of the object program by, for example, spotting redundant instructions and producing object code which achieves the same net effect as that specified by the source program but not by the same means.

Run-time support routines would also need to be incorporated. These would perform such tasks as run-time error reporting, array element accessing, managing a run-time stack or storage of dynamic data structures.