In Python, all type checking is done via a general type assertion mechanism. Explicit declarations and implicit assertions (e.g. the arg to + is a number) are recorded in the front-end (implicit continuation) representation. Type assertions (and thus type-checking) are “unbundled” from the operations that are affected by the assertion. This has two major advantages:
See also restrict.
The back end is the part of the compiler that operates on the virtual machine intermediate representation. Also included are the compiler phases involved in the conversion from the front end representation (or ICR).
This is a node type the that marks the start of a lambda body in ICR. This serves as a placeholder for environment manipulation code.
The first intermediate representation, also known as ICR, or the Implicit Continuation Represenation.
The second intermediate representation, also known as VMR, or the Virtual Machine Representation.
A basic block (or simply “block”) has the pretty much the
usual meaning of representing a straight-line sequence of code. However, the
code sequence ultimately generated for a block might contain internal branches
that were hidden inside the implementation of a particular operation. The type
of a block is actually cblock
. The block-info
slot holds an
VMR-block
containing backend information.
Block compilation is a term commonly used to describe the compile-time resolution of function names. This enables many optimizations.
Each node in the call graph is a function (represented by a flow graph.) The arcs in the call graph represent a possible call from one function to another. See also tail set.
A cleanup is the part of the implicit continuation representation that retains information scoping relationships. For indefinite extent bindings (variables and functions), we can abandon scoping information after ICR conversion, recovering the lifetime information using flow analysis. But dynamic bindings (special values, catch, unwind protect, etc.) must be removed at a precise time (whenever the scope is exited.) Cleanup structures form a hierarchy that represents the static nesting of dynamic binding structures. When the compiler does a control transfer, it can use the cleanup information to determine what cleanup code needs to be emitted.
A closure variable is any lexical variable that has references outside of its home environment. See also indirect value cell.
A closed continuation represents a tagbody
tag
or block
name that is closed over. These two cases are mostly
indistinguishable in ICR.
Home is a term used to describe various back-pointers. A lambda variable’s “home” is the lambda that the variable belongs to. A lambda’s “home environment” is the environment in which that lambda’s variables are allocated.
Any closure variable that has assignments (setq
s) will be allocated in an
indirect value cell. This is necessary to ensure that all references to
the variable will see assigned values, since the compiler normally freely
copies values when creating a closure.
Any variable that is assigned to is called a “set variable”. Several optimizations must special-case set variables, and set closure variables must have an indirect value cell.
The code generator for a VOP is a potentially arbitrary list code fragment which is responsible for emitting assembly code to implement that VOP.
The part of a compiled code object that holds pointers to non-immediate constants.
A constant TN is the VMR of a compile-time constant value. A constant may be immediate, or may be allocated in the constant pool.
A constant leaf is the ICR of a compile-time constant value.
A combination node is the ICR of any fixed-argument function
call (not apply
or multiple-value-call
.)
A top-level component is any component whose only entry points are top-level lambdas.
A top-level lambda represents the execution of the outermost form on which
the compiler was invoked. In the case of compile-file
, this is often a
truly top-level form in the source file, but the compiler can recursively
descend into some forms (eval-when
, etc.) breaking them into separate
compilations.
A component is basically a sequence of blocks. Each component is compiled into a separate code object. With block compilation or local functions, a component will contain the code for more than one function. This is called a component because it represents a connected portion of the call graph. Normally the blocks are in depth-first order (DFO).
During ICR conversion, blocks are temporarily assigned to initial components. The “flow graph canonicalization” phase determines the true component structure.
The head and tail of a component are dummy blocks that mark the start and end of the DFO sequence. The component head and tail double as the root and finish node of the component’s flow graph.
A local function call is a call to a function known at compile time to be in the same component. Local call allows compile time resolution of the target address and calling conventions. See block compilation.
Register allocation terminology. Two TNs conflict if they could ever be live simultaneously. The conflict set of a TN is all TNs that it conflicts with.
The ICR data structure which represents both:
In the Implicit Continuation Representation, the environment is implicit in the continuation’s BLOCK (hence the name.) The ICR continuation is very similar to a CPS continuation in its use, but its representation doesn’t much resemble (is not interchangeable with) a lambda.
A slot in the node holding the continuation which receives the node’s value(s). Unless the node ends a block, this also implicitly indicates which node should be evaluated next.
Approximations of the run-time costs of operations are widely used in the back end. By convention, the unit is generally machine cycles, but the values are only used for comparison between alternatives. For example, the VOP cost is used to determine the preferred order in which to try possible implementations.
See control stack pointer and control frame pointer.
The main call stack, which holds function stack frames. All words on the control stack are tagged descriptors. In all ports done so far, the control stack grows from low memory to high memory. The most recent call frames are considered to be “on top” of earlier call frames.
The allocation pointer for the control stack. Generally this points to the first free word at the top of the stack.
The pointer to the base of the control stack frame for a particular function invocation. The CFP for the running function must be in a register.
The auxiliary stack used to hold any non-descriptor (untagged) objects. This is generally the same as the C call stack, and thus typically grows down.
The allocation pointer for the number stack. This is typically the C stack pointer, and is thus kept in a register.
See number stack pointer, number frame pointer.
The pointer to the base of the number stack frame for a particular function invocation. Functions that don’t use the number stack won’t have an NFP, but if an NFP is allocated, it is always allocated in a particular register. If there is no variable-size data on the number stack, then the NFP will generally be identical to the NSP.
The name of the descriptor encoding the “return pc” for a function call.
See lisp return address. Also, the name of the register where the LRA is passed.
A pointer to the header of a code object. The code pointer
for the currently running function is stored in the code
register.
A pointer into the inside of some heap-allocated object. Interior pointers confuse the garbage collector, so their use is highly constrained. Typically there is a single register dedicated to holding interior pointers.
A slot in the continuation which points the the node that receives this value. Null if this value is not received by anyone.
See Depth First Number, Depth First Order.
Blocks are numbered according to their appearance in
the depth-first ordering (the block-number
slot.) The numbering actually
increases from the component tail, so earlier blocks have larger numbers.
This is a linearization of the flow graph, obtained by a depth-first walk. Iterative flow analysis algorithms work better when blocks are processed in DFO (or reverse DFO.)
In low-level design discussions, an object is one of the following:
These are tagged with three low-tag bits as described in the section sec-tagging This is synonymous with descriptor. In other parts of the documentation, may be used more loosely to refer to a lisp object.
A Lisp object is a high-level object discussed as a data type in the Common Lisp definition.
A data-block is a dual-word aligned block of memory that either manifests a Lisp object (vectors, code, symbols, etc.) or helps manage a Lisp object on the heap (array header, function header, etc.).
A descriptor is a tagged, single-word object. It either contains immediate data or a pointer to data. This is synonymous with object. Storage locations that must contain descriptors are referred to as descriptor locations.
A descriptor that points to a data block in memory (i.e. not an immediate object.)
A descriptor that encodes the object value in the descriptor itself; used for characters, fixnums, etc.
A word is a 32-bit quantity.
Any chunk of bits that isn’t a valid tagged descriptor. For example, a double-float on the number stack. Storage locations that are not scanned by the garbage collector (and thus cannot contain pointer descriptors) are called non-descriptor locations. Immediate descriptors can be stored in non-descriptor locations.
An entry point is a function that may be subject to “unpredictable” control transfers. All entry points are linked to the root of the flow graph (the component head.) The only functions that aren’t entry points are let functions. When complex lambda-list syntax is used, multiple entry points may be created for a single lisp-level function. See external entry point.
A function that serves as a “trampoline” to intercept function calls coming in from outside of the component. The XEP does argument syntax and type checking, and may also translate the arguments and return values for a locally specialized calling calling convention.
An external entry point.
A lexical environment is a structure that is used
during VMR conversion to represent all lexically scoped bindings (variables,
functions, declarations, etc.) Each node
is annotated with its lexical
environment, primarily for use by the debugger and other user interfaces. This
structure is also the environment object passed to macroexpand
.
The environment is part of the ICR, created during
environment analysis. Environment analysis apportions code to disjoint
environments, with all code in the same environment sharing the same stack
frame. Each environment has a “real” function that allocates it, and
some collection let
functions. Although environment analysis is the
last ICR phase, in earlier phases, code is sometimes said to be “in the
same/different environment(s)”. This means that the code will definitely be
in the same environment (because it is in the same real function), or that is
might not be in the same environment, because it is not in the same function.
Some sort of back-patching annotation. The main sort encountered are load-time assembler fixups, which are a linkage annotation mechanism.
A flow graph is a directed graph of basic blocks, where each arc represents a possible control transfer. The flow graph is the basic data structure used to represent code, and provides direct support for data flow analysis. See component and ICR.
An attribute of known functions. A function is foldable if calls may be constant folded whenever the arguments are compile-time constant. Generally this means that it is a pure function with no side effects.
function “real” (allocates environment) meaning function-entry more vague (any lambda?) funny function GEN (kill and...) global TN, conflicts, preference GTN (number) IR ICR VMR ICR conversion, VMR conversion (translation) inline expansion, call kill (to make dead) known function LAMBDA leaf let call lifetime analysis, live (tn, variable) load tn LOCS (passing, return locations) local call local TN, conflicts, (or just used in one block) location (selection) LTN (number) main entry mess-up (for cleanup) more arg (entry) MV non-local exit non-packed SC, TN non-set variable operand (to vop) optimizer (in icr optimize) optional-dispatch pack, packing, packed pass (in a transform) passing locations (value) conventions (known, unknown) policy (safe, fast, small, ...) predecessor block primitive-type reaching definition REF representation selection for value result continuation (for function) result type assertion (for template) (or is it restriction) restrict a TN to finite SBs a template operand to a primitive type (boxed...) a tn-ref to particular SCs
return (node, vops) safe, safety saving (of registers, costs) SB SC (restriction) semi-inline side-effect in ICR in VMR sparse set splitting (of VMR blocks) SSET SUBPRIMITIVE successor block tail recursion tail recursive tail recursive loop user tail recursion
template TN TNBIND TN-REF transform (source, ICR) type assertion inference top-down, bottom-up assertion propagation derived, asserted descriptor, specifier, intersection, union, member type check type-check (in continuation) UNBOXED (boxed) descriptor unknown values continuation unset variable unwind-block, unwinding used value (dest) value passing VAR VM VOP