Compiler Organization


3 Compiler Overview

The structure of the compiler may be broadly characterized by describing the compilation phases and the data structures that they manipulate. The steps in the compilation are called phases rather than passes since they don’t necessarily involve a full pass over the code. The data structure used to represent the code at some point is called an intermediate representation.

Two major intermediate representations are used in the compiler:

Each phase is briefly described here. The phases from “local call analysis” to “constraint propagation” all interact; for maximum optimization, they are generally repeated until nothing new is discovered. The source files which primarily contain each phase are listed after “Files: ”.

ICR conversion

Convert the source into ICR, doing macroexpansion and simple source-to-source transformation. All names are resolved at this time, so we don’t have to worry about name conflicts later on. Files: ir1tran, srctran, typetran

Local call analysis

Find calls to local functions and convert them to local calls to the correct entry point, doing keyword parsing, etc. Recognize once-called functions as lets. Create external entry points for entry-point functions. Files: locall

Find components

Find flow graph components and compute depth-first ordering. Separate top-level code from run-time code, and determine which components are top-level components. Files: dfo

ICR optimize

A grab-bag of all the non-flow ICR optimizations. Fold constant functions, propagate types and eliminate code that computes unused values. Special-case calls to some known global functions by replacing them with a computed function. Merge blocks and eliminate IF-IFs. Substitute let variables. Files: ir1opt, ir1tran, typetran, seqtran, vm/vm-tran

Type constraint propagation

Use global flow analysis to propagate information about lexical variable types. Eliminate unnecessary type checks and tests. Files: constraint

Type check generation

Emit explicit ICR code for any necessary type checks that are too complex to be easily generated on the fly by the back end. Files: checkgen

Event driven operations

Various parts of ICR are incrementally recomputed, either eagerly on modification of the ICR, or lazily, when the relevant information is needed.

  • Check that type assertions are satisfied, marking places where type checks need to be done.
  • Locate let calls.
  • Delete functions and variables with no references

Files: ir1util, ir1opt

ICR finalize

This phase is run after all components have been compiled. It scans the global variable references, looking for references to undefined variables and incompatible function redefinitions. Files: ir1final, main.

Environment analysis

Determine which distinct environments need to be allocated, and what context needed to be closed over by each environment. We detect non-local exits and set closure variables. We also emit cleanup code as funny function calls. This is the last pure ICR pass. Files: envanal

Global TN allocation (GTN)

Iterate over all defined functions, determining calling conventions and assigning TNs to local variables. Files: gtn

Local TN allocation (LTN)

Use type and policy information to determine which VMR translation to use for known functions, and then create TNs for expression evaluation temporaries. We also accumulate some random information needed by VMR conversion. Files: ltn

Control analysis

Linearize the flow graph in a way that minimizes the number of branches. The block-level structure of the flow graph is basically frozen at this point. Files: control

Stack analysis

Maintain stack discipline for unknown-values continuation in the presence of local exits. Files: stack

Entry analysis

Collect some back-end information for each externally callable function.

VMR conversion Convert ICR into VMR by translating nodes into VOPs.

Emit type checks. Files: ir2tran, vmdef

Copy propagation Use flow analysis to eliminate unnecessary copying of

TN values. Files: copyprop

Representation selection

Look at all references to each TN to determine which representation has the lowest cost. Emit appropriate move and coerce VOPS for that representation.

Lifetime analysis

Do flow analysis to find the set of TNs whose lifetimes overlap with the lifetimes of each TN being packed. Annotate call VOPs with the TNs that need to be saved. Files: life

Pack

Find a legal register allocation, attempting to minimize unnecessary moves. Files: pack

Code generation

Call the VOP generators to emit assembly code. Files: codegen

Pipeline reorganization On some machines, move memory references

backward in the code so that they can overlap with computation. On machines with delayed branch instructions, locate instructions that can be moved into delay slots. Files: assem-opt

Assembly

Resolve branches and convert into object code and fixup information. Files: assembler

Dumping

Convert the compiled code into an object file or in-core function. Files: debug-dump, dump, vm/core