Code Data-Blocks (Design of CMU Common Lisp)

40.12 Code Data-Blocks

A code data-block is the run-time representation of a “component”. A component is a connected portion of a program’s flow graph that is compiled as a single unit, and it contains code for many functions. Some of these functions are callable from outside of the component, and these are termed “entry points”.

Each entry point has an associated user-visible function data-block (of type function). The full call convention provides for calling an entry point specified by a function object.

Although all of the function data-blocks for a component’s entry points appear to the user as distinct objects, the system keeps all of the code in a single code data-block. The user-visible function object is actually a pointer into the middle of a code data-block. This allows any control transfer within a component to be done using a relative branch.

Besides a function object, there are other kinds of references into the middle of a code data-block. Control transfer into a function also occurs at the return-PC for a call. The system represents a return-PC somewhat similarly to a function, so GC can also recognize a return-PC as a reference to a code data-block. This representation is known as a Lisp Return Address (LRA).

It is incorrect to think of a code data-block as a concatenation of “function data-blocks”. Code for a function is not emitted in any particular order with respect to that function’s function-header (if any). The code following a function-header may be nothing more than a branch to some other location where the function’s “real” definition is.

The following are the three kinds of pointers to code data-blocks:

Code pointer (labeled A below):

A code pointer is a descriptor, with other-pointer low-tag bits, pointing to the beginning of the code data-block. The code pointer for the currently running function is always kept in a register (CODE). In addition to allowing loading of non-immediate constants, this also serves to represent the currently running function to the debugger.

LRA (labeled B below):

The LRA is a descriptor, with other-pointer low-tag bits, pointing to the return location for a function call. Note that this location contains no descriptors other than the one word of immediate data, so GC can treat LRA locations the same as instructions.

Function (labeled C below):

A function is a descriptor, with function low-tag bits, that is user callable. When a function header is referenced from a closure or from the function header’s self-pointer, the pointer has other-pointer low-tag bits, instead of function low-tag bits. This ensures that the internal function data-block associated with a closure appears to be uncallable (although users should never see such an object anyway).
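The distinction between the two low-tags can be sketched in a few lines of Python. The concrete numbers here (three low-tag bits, function tag 1, other-pointer tag 7) are assumptions for illustration and are not stated in this section; the helper names are hypothetical.

```python
# Assumed values, not taken from this section: 3 low-tag bits,
# function-pointer tag 1, other-pointer tag 7.
LOWTAG_BITS = 3
LOWTAG_MASK = (1 << LOWTAG_BITS) - 1
FUNCTION_POINTER_LOWTAG = 1
OTHER_POINTER_LOWTAG = 7


def make_pointer(address, tag):
    """Build a tagged descriptor from a dual-word-aligned address."""
    if address & LOWTAG_MASK:
        raise ValueError("address must be dual-word aligned")
    return address | tag


def lowtag(descriptor):
    """Return the low-tag bits of a descriptor."""
    return descriptor & LOWTAG_MASK


def untagged_address(descriptor):
    """Strip the low-tag bits to get the raw address."""
    return descriptor & ~LOWTAG_MASK


def is_user_callable(descriptor):
    """Only descriptors with the function low-tag look callable."""
    return lowtag(descriptor) == FUNCTION_POINTER_LOWTAG
```

In this model a code pointer and an LRA both fail `is_user_callable`, as does a function header reached through a closure, because all three carry the other-pointer tag.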

Information about functions that is only useful for entry points is kept in some descriptors following the function’s self-pointer descriptor. All of these together with the function’s header-word are known as the “function header”. GC must be able to locate every function header. We provide for this by chaining together the function headers in a NIL-terminated list kept in a known slot in the code data-block.

A code data-block has the following format:

A -->
****************************************************************
|  Header-Word count (24 bits)    |   Code-Type (8 bits)       |
----------------------------------------------------------------
|  Number of code words (fixnum tag)                           |
----------------------------------------------------------------
|  Pointer to first function header (other-pointer tag)        |
----------------------------------------------------------------
|  Debug information (structure tag)                           |
----------------------------------------------------------------
|  First constant (a descriptor)                               |
----------------------------------------------------------------
|  ...                                                         |
----------------------------------------------------------------
|  Last constant (and last word of code header)                |
----------------------------------------------------------------
|  Some instructions (non-descriptor)                          |
----------------------------------------------------------------
|     (pad to dual-word boundary if necessary)                 |

B -->
****************************************************************
|  Word offset from code header (24)   |   Return-PC-Type (8)  |
----------------------------------------------------------------
|  First instruction after return                              |
----------------------------------------------------------------
|  ... more code and LRA header-words                          |
----------------------------------------------------------------
|     (pad to dual-word boundary if necessary)                 |

C -->
****************************************************************
|  Offset from code header (24)  |   Function-Header-Type (8)  |
----------------------------------------------------------------
|  x86/amd64/sparc: Address of start of instructions for       |
|  function (non-descriptor)                                   |
|  other architectures:                                        |
|  Self-pointer back to previous word (with other-pointer tag) |
----------------------------------------------------------------
|  Pointer to next function (other-pointer low-tag) or NIL     |
----------------------------------------------------------------
|  Function name (a string or a symbol)                        |
----------------------------------------------------------------
|  Function debug arglist (a string)                           |
----------------------------------------------------------------
|  Function type (a list-style function type specifier)        |
----------------------------------------------------------------
|  Start of instructions for function (non-descriptor)         |
----------------------------------------------------------------
|  More function headers and instructions and return PCs,      |
|  until we reach the total size of header-words + code        |
|  words.                                                      |
----------------------------------------------------------------

The following are detailed slot descriptions:

Code data-block header-word:

The immediate data in the code data-block’s header-word is the number of leading descriptors in the code data-block: the fixed overhead words plus the number of constants. The first non-descriptor word, which is the start of the code, appears at this word offset from the header.

Number of code words:

The total number of non-header-words in the code data-block. The total word size of the code data-block is the sum of this slot and the immediate data in the code data-block’s header-word.
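As a worked instance of this arithmetic, the sketch below splits a header-word into its 24-bit count and 8-bit type and adds the code-word count. It assumes the type occupies the low 8 bits of the word, and it takes the code-word count as an already-decoded integer rather than a tagged fixnum. The type code 0x2A is a placeholder, not a real CMUCL type code.

```python
TYPE_BITS = 8  # assumed: type code in the low 8 bits, data in the upper 24
TYPE_MASK = (1 << TYPE_BITS) - 1


def header_data(header_word):
    """Upper 24 bits: the count of leading descriptor words."""
    return header_word >> TYPE_BITS


def header_type(header_word):
    """Low 8 bits: the type code."""
    return header_word & TYPE_MASK


def first_code_word_offset(header_word):
    """Word offset from the header to the first non-descriptor word."""
    return header_data(header_word)


def total_words(header_word, code_words):
    """Total size of the code data-block in words."""
    return header_data(header_word) + code_words


# Four fixed words as drawn in the format diagram, plus three constants.
example_header = ((4 + 3) << TYPE_BITS) | 0x2A  # 0x2A is a placeholder type
```

With 20 code words, `total_words(example_header, 20)` gives 27, and the first instruction sits 7 words past the header.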

Pointer to first function header:

A NIL-terminated list of the function headers for all entry points to this component.

Debug information:

The DEBUG-INFO structure describing this component. All information that the debugger wants to get from a running function is kept in this structure. Since a component contains many functions, the current PC is used to locate the appropriate debug information. The system keeps the debug information separate from the function data-block, since the currently running function may not be an entry point. There is no way to recover the function object for the currently running function, since that function data-block may not exist.
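One way to picture a PC-based lookup is a table of function start offsets searched by binary search. This is only an illustration: the actual layout of DEBUG-INFO is not described in this section, and the table shape and the name `find_function_for_pc` are hypothetical.

```python
import bisect


def find_function_for_pc(function_starts, pc_offset):
    """Return the name of the function whose code contains pc_offset.

    function_starts is a list of (start_offset, name) pairs sorted by
    start_offset. This table shape is an assumption for this sketch.
    """
    starts = [start for start, _ in function_starts]
    # Index of the last start offset that is <= pc_offset.
    index = bisect.bisect_right(starts, pc_offset) - 1
    if index < 0:
        return None  # the PC lies before the first function
    return function_starts[index][1]


# Hypothetical component: two entry points and one internal function.
table = [(0, "entry-a"), (40, "internal-helper"), (96, "entry-b")]
```

Note that `internal-helper` is found by its PC even though, not being an entry point, it would have no function object of its own.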

First constant ... last constant:

These are the constants referenced by the component, if there are any.

LRA header word:

The immediate header-word data is the word offset from the enclosing code data-block’s header-word to this word. This allows GC and the debugger to easily recover the code data-block from an LRA. The code at the return point restores the current code pointer using a subtract immediate of the offset, which is known at compile time.
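The subtraction at the return point can be sketched as follows. The word size (4 bytes), the other-pointer tag value (7), the position of the type in the low 8 bits, and the type code 0x36 are all assumptions for illustration. Because the LRA and the code pointer carry the same low-tag, subtracting the byte offset from the tagged LRA yields the tagged code pointer directly.

```python
WORD_BYTES = 4            # assumed 32-bit words
TYPE_BITS = 8             # assumed: type code in the low 8 bits
OTHER_POINTER_LOWTAG = 7  # assumed tag value


def make_lra_header(word_offset, type_code):
    """Pack the word offset and a type code into an LRA header-word."""
    return (word_offset << TYPE_BITS) | type_code


def code_pointer_from_lra(lra_pointer, lra_header_word):
    """Recover the tagged code pointer from a tagged LRA."""
    word_offset = lra_header_word >> TYPE_BITS
    # Both pointers have other-pointer low-tag bits, so the tag survives.
    return lra_pointer - word_offset * WORD_BYTES


# Hypothetical block at 0x1000 with a return point 12 words in.
code_pointer = 0x1000 | OTHER_POINTER_LOWTAG
lra_pointer = code_pointer + 12 * WORD_BYTES
lra_header = make_lra_header(12, 0x36)  # 0x36 is a placeholder type code
```

GC and the debugger would read the offset from the header-word as above, while compiled code can use the same offset as an immediate operand since it is known at compile time.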

Function entry point header-word:

The immediate header-word data is the word offset from the enclosing code data-block’s header-word to this word. This is the same as for the return-PC header-word.
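Recovering the code data-block from a function object works the same way, except that the low-tag has to be swapped. The sketch below assumes 4-byte words, function tag 1, other-pointer tag 7, and the type code in the low 8 bits of the header-word; none of these values come from this section, and 0x5E is a placeholder type code.

```python
WORD_BYTES = 4               # assumed 32-bit words
TYPE_BITS = 8                # assumed: type code in the low 8 bits
FUNCTION_POINTER_LOWTAG = 1  # assumed tag value
OTHER_POINTER_LOWTAG = 7     # assumed tag value


def code_pointer_from_function(function_pointer, function_header_word):
    """Recover the tagged code pointer from a tagged function pointer."""
    word_offset = function_header_word >> TYPE_BITS
    header_address = function_pointer - FUNCTION_POINTER_LOWTAG
    code_address = header_address - word_offset * WORD_BYTES
    return code_address + OTHER_POINTER_LOWTAG


# Hypothetical block at 0x2000 with an entry point 10 words in.
function_pointer = (0x2000 + 10 * WORD_BYTES) | FUNCTION_POINTER_LOWTAG
function_header_word = (10 << TYPE_BITS) | 0x5E  # placeholder type code
```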

Address of start of instructions for function:

This is implemented on x86, amd64, and sparc only. In a non-closure function, this address allows the call sequence to always indirect through the second word in a user callable function. See section “Closure Format”. With a closure, indirecting through the second word also gets you the start of instructions of a function. This pointer is a raw address, not a descriptor.

Self-pointer back to header-word:

In a non-closure function, this self-pointer to the previous header-word allows the call sequence to always indirect through the second word in a user callable function. See section “Closure Format”. With a closure, indirecting through the second word gets you a function header-word. The system ignores this slot in the function header for a closure, since it has already indirected once, and this slot could be some random thing that causes an error if you jump to it. This pointer has an other-pointer tag instead of a function pointer tag, indicating it is not a user callable Lisp object.
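The uniform indirection can be modeled with Python lists standing in for words in memory. This is a toy model of the self-pointer variant only: the slot indices follow the format diagram, a Python callable stands in for the instructions, and the names `make_function`, `make_closure`, and `call` are hypothetical.

```python
SECOND_WORD = 1  # the word every call indirects through
CODE_SLOT = 6    # "start of instructions" position in the diagram


def make_function(name, code):
    """Build a function header whose second word points back at itself."""
    header = ["function-header-word", None, None, name, "", None, code]
    header[SECOND_WORD] = header  # self-pointer
    return header


def make_closure(function_header, *closed_values):
    """Build a closure whose second word is the real function header."""
    return ["closure-header-word", function_header, *closed_values]


def call(callee, *args):
    """Same load for functions and closures, with no type test."""
    target = callee[SECOND_WORD]
    # The callee object itself is passed along so a closure's code can
    # reach its closed-over values.
    return target[CODE_SLOT](callee, *args)


add = make_function("add", lambda self, a, b: a + b)
adder = make_function("adder", lambda self, x: x + self[2])
add5 = make_closure(adder, 5)
```

Here `call(add, 1, 2)` and `call(add5, 10)` go through the identical sequence. In the closure case the self-pointer slot of `adder` is never read, which matches the statement that the system ignores it.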

Pointer to next function:

This is the next link in the thread of entry point functions found in this component. This value is NIL when the current header is the last entry point in the component.
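Walking this chain from the code data-block's slot can be sketched as below. The classes are stand-ins for the real memory layout, and `None` plays the role of NIL; all names are hypothetical.

```python
class FunctionHeader:
    """Stand-in for a function header; only the fields needed here."""

    def __init__(self, name, next_header=None):
        self.name = name
        self.next = next_header  # None stands in for NIL


class CodeBlock:
    """Stand-in for a code data-block with its first-function slot."""

    def __init__(self, first_function_header=None):
        self.first_function_header = first_function_header


def entry_points(code_block):
    """Yield every function header in the component, in chain order."""
    header = code_block.first_function_header
    while header is not None:
        yield header
        header = header.next


# Hypothetical component with three entry points.
third = FunctionHeader("baz")
second = FunctionHeader("bar", third)
first = FunctionHeader("foo", second)
component = CodeBlock(first)
```

A component with no entry points would simply have NIL in the slot, and the loop would yield nothing.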

Function name:

This function’s name (for printing). If the user defined this function with DEFUN, then this is the defined symbol; otherwise it is a descriptive string.

Function debug arglist:

A printed string representing the function’s argument list, for human readability. If it is a macroexpansion function, then this is the original DEFMACRO arglist, not the actual expander function arglist.

Function type:

A list-style function type specifier representing the argument signature and return types for this function. For example,

(function (fixnum fixnum fixnum) fixnum)

(function (string &key (:start unsigned-byte)) string)

This information is intended for machine readability, such as by the compiler.