2.5 Pointer-Type Objects and Spaces

Each of the pointer-type lisp objects points into a different space in virtual memory. There are separate spaces for Bit-Vectors, Symbols, Lists, and so on. The 5-bit type-code provides the high-order virtual address bits for the object, followed by the 2-bit space code, followed by the 25-bit pointer address. This gives a 30-bit virtual address to a 32-bit word; since the IBM RT PC is a byte-addressed machine, the two low-order bits are 0. In effect we have carved a 30-bit space into a fixed set of 23-bit subspaces, not all of which are used.

The space code divides each of the type spaces into four sub-spaces, as shown in the table above. At any given time, one of the dynamic spaces is considered newspace, while the other is oldspace. During a stop and copy garbage collection, a “flip” can be done, turning the old newspace into the new oldspace. All type-spaces are flipped at once. Allocation of new dynamic objects always occurs in newspace.

Optionally, the user (or system functions) may allocate objects in static or read-only space. Such objects are never reclaimed once they are allocated – they occupy the space in which they were initially allocated for the lifetime of the Lisp process. The advantage of static allocation is that the GC never has to move these objects, thereby saving a significant amount of work, especially if the objects are large. Objects in read-only space are static, in that they are never moved or reclaimed; in addition, they cannot be altered once they are set up. Pointers in read-only space may only point to read-only or static space, never to dynamic space. This saves even more work, since read-only space does not need to be scavenged, and pages of read-only material do not need to be written back onto the disk during paging.

Objects in a particular type-space will contain either pointers to garbage-collectible objects or words of raw non-garbage-collectible bits, but not both. Similarly, a space will contain either fixed-length objects or variable-length objects, but not both. A variable-length object always contains a 24-bit length field right-justified in the first word, with the positive fixnum type-code in the high-order five bits. The remaining three bits can be used for sub-type information. The length field gives the size of the object in 32-bit words, including the header word. The garbage collector needs this information when the object is moved, and it is also useful for bounds checking.

The format of objects in each space are as follows:

Symbol

Each symbol is represented as a fixed-length block of boxed Lisp cells. The number of cells per symbol is 5, in the following order:

0  Value cell for shallow binding.
1  Definition cell: a function or list.
2  Property list: a list of attribute-value pairs.
3  Print name: a string.
4  Package: the obarray holding this symbol.
List

A fixed-length block of two boxed Lisp cells, the CAR and the CDR.

General-Vector

Vector of lisp objects, any length. The first word is a fixnum giving the number of words allocated for the vector (up to 24 bits). The highest legal index is this number minus 2. The second word is vector entry 0, and additional entries are allocated contiguously in virtual memory. General vectors are sometimes called G-Vectors. (See Vectors for further details.)

Integer-Vector

Vector of integers, any length. The 24 low bits of the first word give the allocated length in 32-bit words. The low-order 28 bits of the second word gives the length of the vector in entries, whatever the length of the individual entries may be. The high-order 4 bits of the second word contain access-type information that yields, among other things, the number of bits per entry. Entry 0 is left-justified in the third word of the vector. Bits per entry will normally be powers of 2, so they will fit neatly into 32-bit words, but if necessary some empty space may be left at the low-order end of each word. Integer vectors are sometimes called I-Vectors. (See Vectors for details.)

Bit-Vector

Vector of bits, any length. Bit-Vectors are represented in a form identical to I-Vectors, but live in a different space for efficiency reasons.

Bignum

Bignums are infinite-precision integers, represented in a format identical to G-Vectors. Each bignum is stored as a series of 32-bit words, with the low-order word stored first. The representation is two’s complement, but the sign of the number is redundantly encoded in the type field of the fixnum in the header word. If this fixnum is non-negative, then so is the bignum, if it is negative, so is the bignum.

Floats

Floats are stored as two or more consecutive words of bits, in the following format:

---------------------------------------------------------------
|  Header word, used only for GC forward pointers.	      |
---------------------------------------------------------------
|  Appropriate number of 32-bit words in machine format	      |
---------------------------------------------------------------

The number of words used to represent a floating point number is one plus the size of the floating point number being stored. The floating point numbers will be represented in whatever format the IBM RT PC expects. The extra header word is needed so that a valid floating point number is not mistaken for a gc-forward pointer during a garbage collection.

Ratio

Ratios are stored as two consecutive words of Lisp objects, which should both be integers.

Complex

Complex numbers are stored as two consecutive words of Lisp objects, which should both be numbers.

Array

This is actually a header which holds the accessing and other information about the array. The actual array contents are held in a vector (either an I-Vector or G-Vector) pointed to by an entry in the header. The header is identical in format to a G-Vector. For details on what the array header contains, see Arrays.

String

A vector of bytes. Identical in form to I-Vectors with the access type always 8-Bit. However, instead of accepting and returning fixnums, string accesses accept and return character objects. Only the 8-bit code field is actually stored, and the returned character object always has bit and font values of 0.

Function

A compiled CMU Common Lisp function consists of both lisp objects and raw bits for the code. The Lisp objects are stored in the Function space in a format identical to that used for general vectors, with a 24-bit length field in the first word. This object contains assorted parameters needed by the calling machinery, a pointer to an 8-bit I-Vector containing the compiled code, a number of pointers to symbols used as special variables within the function, and a number of lisp objects used as constants by the function.