Pointers to data-blocks have the following format:
----------------------------------------------------------------
|      Dual-word address of data-block (29 bits)       | 1 1 1 |
----------------------------------------------------------------
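As a rough illustration of this layout, the following Python sketch packs and unpacks such a pointer. The function and constant names are hypothetical and do not come from the system; the sketch assumes 32-bit words and data-blocks aligned on dual-word (8-byte) boundaries, so the three low bits of the byte address are free to hold the tag.

```python
WORD_MASK = 0xFFFFFFFF       # assumption: 32-bit descriptor words
LOWTAG_MASK = 0b111          # the low three bits hold the tag
DATA_BLOCK_LOWTAG = 0b111    # tag bits shown in the diagram


def make_data_block_pointer(byte_address):
    """Build a descriptor for a data-block at a dual-word-aligned byte address."""
    if byte_address & LOWTAG_MASK:
        raise ValueError("data-blocks must be dual-word (8-byte) aligned")
    return (byte_address | DATA_BLOCK_LOWTAG) & WORD_MASK


def is_data_block_pointer(word):
    """True when the low three bits carry the data-block tag."""
    return (word & LOWTAG_MASK) == DATA_BLOCK_LOWTAG


def data_block_address(word):
    """Recover the byte address by clearing the three tag bits."""
    return word & WORD_MASK & ~LOWTAG_MASK
```

For example, under these assumptions a data-block at byte address 0x10000008 would be described by the word 0x1000000F.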
The word pointed to by the above descriptor is a header-word, and it has the same format as an other-immediate:
----------------------------------------------------------------
|     Data (24 bits)      | Type (8 bits with low-tag) | 0 1 0 |
----------------------------------------------------------------
This is convenient for scanning the heap when GC’ing, but it does mean that whenever GC encounters an other-immediate word, it has to do a range check on the low byte to see if it is a header-word or just a character (for example). This is an easily acceptable performance hit for scanning.
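The range check can be sketched as follows. This is only an illustration: the actual type codes are not given here, so the bounds of the header-type range are passed in by the caller, and the values used in the usage note are placeholders rather than the system's real codes.

```python
OTHER_IMMEDIATE_LOWTAG = 0b010   # low three bits shown in the header diagram


def is_other_immediate(word):
    """True when the low three bits mark an other-immediate word."""
    return (word & 0b111) == OTHER_IMMEDIATE_LOWTAG


def is_header_word(word, first_header_type, last_header_type):
    """Range check on the low byte of an other-immediate word.

    first_header_type and last_header_type are hypothetical bounds; the
    real values depend on how the system assigns its type codes.
    """
    if not is_other_immediate(word):
        return False
    return first_header_type <= (word & 0xFF) <= last_header_type
```

With placeholder bounds 0x0A and 0x5A, a word whose low byte is 0x12 would be treated as a header-word, while one whose low byte is 0x82 (standing in for a character) would not.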
The system interprets the data portion of the header-word for non-vector data-blocks as the word length excluding the header-word. For example, the data field of the header for ratio and complex numbers is two, one word each for the numerator and denominator or for the real and imaginary parts.
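A minimal sketch of splitting a header-word into its two fields, assuming 32-bit words. The helper names and the type code used for a ratio are hypothetical; only the field widths and the "length excluding the header-word" convention come from the text.

```python
def make_header(type_byte, data):
    """Pack a 24-bit data field above the 8-bit type field."""
    return ((data & 0xFFFFFF) << 8) | (type_byte & 0xFF)


def header_type(word):
    """Low 8 bits: the type, including the low-tag."""
    return word & 0xFF


def header_data(word):
    """High 24 bits: for non-vector data-blocks, the word length
    of the block excluding the header-word itself."""
    return (word >> 8) & 0xFFFFFF


def block_words_including_header(word):
    """Header-word plus the payload words it describes."""
    return 1 + header_data(word)


RATIO_TYPE_PLACEHOLDER = 0x12   # hypothetical type code, for illustration only
ratio_header = make_header(RATIO_TYPE_PLACEHOLDER, 2)   # numerator + denominator
```

Here `header_data(ratio_header)` yields 2, so the ratio occupies three words counting its header-word.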
For vectors and data-blocks representing Lisp objects stored like vectors, the system (usually) ignores the data portion of the header-word:
----------------------------------------------------------------
|  Unused Data (24 bits)  | Type (8 bits with low-tag) | 0 1 0 |
----------------------------------------------------------------
|           Element Length of Vector (30 bits)           | 0 0 |
----------------------------------------------------------------
Using a separate word allows for much larger vectors, and it allows length
to simply access a single word without masking or shifting, since that word
is already a fixnum. Similarly, the header for complex arrays and vectors has
a second word, following the header-word, that the system uses for the fill
pointer, so computing the length of any array uses the same code sequence.
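The following sketch models a vector as a Python list of raw 32-bit words, with the header-word at index 0 and the length word at index 1. The names are hypothetical. Inside Lisp the length word can be returned as-is because it is already a valid fixnum descriptor; the shift below only shows how the raw element count would be read from outside the system.

```python
FIXNUM_TAG_MASK = 0b11   # the low two bits of the length word are 0 0


def fixnum_value(word):
    """Decode a non-negative fixnum descriptor into a plain integer."""
    if word & FIXNUM_TAG_MASK:
        raise ValueError("not a fixnum: low two bits must be zero")
    return word >> 2


def vector_length(words):
    """Element count of a vector: the word right after the header-word.

    The header's 24-bit data field is not consulted at all.
    """
    return fixnum_value(words[1])


# Hypothetical five-element vector: placeholder header, length word, elements.
example_vector = [0x12, 5 << 2, 0, 0, 0, 0, 0]
```

Because a complex array keeps its fill pointer in the same position, the same `vector_length` read would apply to it in this model.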
For normal Lisp vectors, the data portion MUST be zero. For hash tables, a vector is used to store information about the hash key and value, and the data portion is non-zero to indicate to GC that this is the key/value vector for the hash table. GENCGC uses this to determine how to scavenge the key/value pairs correctly. Cheney GC also uses this to determine if rehashing (for EQ hash tables) is needed.
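The test a collector would make on a vector's header-word can be sketched as below. The function name and the placeholder type byte are assumptions; the text only specifies that the data portion is zero for normal vectors and non-zero for a hash table's key/value vector.

```python
def is_hash_table_kv_vector(vector_header):
    """True when the 24-bit data field of a vector header is non-zero,
    which marks the key/value vector of a hash table."""
    return ((vector_header >> 8) & 0xFFFFFF) != 0


VECTOR_TYPE_PLACEHOLDER = 0x12                       # hypothetical type byte
normal_header = VECTOR_TYPE_PLACEHOLDER              # data portion is zero
kv_header = (1 << 8) | VECTOR_TYPE_PLACEHOLDER       # data portion is non-zero
```

The particular non-zero value used above is arbitrary; the text does not say which value the system stores.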