Dumping (Design of CMU Common Lisp)

Next: User Interface of the Compiler, Previous: Assembly, Up: Design of CMU Common Lisp [Contents]

24 Dumping ¶

So far as input to the dumper/loader, how about having a list of Entry-Info structures in the VMR-Component? These structures contain all information needed to dump the associated function objects, and are only implicitly associated with the functional/XEP data structures. Load-time constants that reference these function objects should specify the Entry-Info, rather than the functional (or something). We would then need to maintain some sort of association so VMR conversion can find the appropriate Entry-Info. Alternatively, we could initially reference the functional, and then later clobber the reference to the Entry-Info.

We have some kind of post-pass that runs after assembly, going through the functions and constants, annotating the VMR-Component for the benefit of the dumper: Resolve :Label load-time constants. Make the debug info. Make the entry-info structures.

Fasl dumper and in-core loader are implementation (but not instruction set) dependent, so we want to give them a clear interface.

open-fasl-file name => fasl-file
    Returns a ``fasl-file'' object representing all state needed by the dumper.
    We objectify the state, since the fasdumper should be reentrant.  (but
    could fail to be at first.)

close-fasl-file fasl-file abort-p
    Close the specified fasl-file.

fasl-dump-component component code-vector length fixups fasl-file
    Dump the code, constants, etc. for component.  Code-Vector is a vector
    holding the assembled code.  Length is the number of elements of Vector
    that are actually in use.  Fixups is a list of conses (offset . fixup)
    describing the locations and things that need to be fixed up at load time.
    If the component is a top-level component, then the top-level lambda will
    be called after the component is loaded.

load-component component code-vector length fixups
    Like Fasl-Dump-Component, but directly installs the code in core, running
    any top-level code immediately.  (???) but we need some way to glue
    together the componenents, since we don't have a fasl table.

Dumping:

Dump code for each component after compiling that component, but defer dumping of other stuff. We do the fixups on the code vectors, and accumulate them in the table.

We have to grovel the constants for each component after compiling that component so that we can fix up load-time constants. Load-time constants are values needed by the code that are computed after code generation/assembly time. Since the code is fixed at this point, load-time constants are always represented as non-immediate constants in the constant pool. A load-time constant is distinguished by being a cons (Kind . What), instead of a Constant leaf. Kind is a keyword indicating how the constant is computed, and What is some context.

Some interesting load-time constants:

    (:label . <label>)
        Is replaced with the byte offset of the label within the code-vector.

    (:code-vector . <component>)
        Is replaced by the component's code-vector.

    (:entry . <function>)
    (:closure-entry . <function>)
	Is replaced by the function-entry structure for the specified function.
	:Entry is how the top-level component gets a handle on the function
	definitions so that it can set them up.

We also need to remember the starting offset for each entry, although these don’t in general appear as explicit constants.

We then dump out all the :Entry and :Closure-Entry objects, leaving any constant-pool pointers uninitialized. After dumping each :Entry, we dump some stuff to let genesis know that this is a function definition. Then we dump all the constant pools, fixing up any constant-pool pointers in the already-dumped function entry structures.

The debug-info *is* a constant: the first constant in every constant pool. But the creation of this constant must be deferred until after the component is compiled, so we leave a (:debug-info) placeholder. [Or maybe this is implicitly added in by the dumper, being supplied in a VMR-component slot.]

Work out details of the interface between the back-end and the assembler/dumper.

Support for multiple assemblers concurrently loaded? (for byte code)

We need various mechanisms for getting information out of the assembler.

We can get entry PCs and similar things into function objects by making a Constant leaf, specifying that it goes in the closure, and then setting the value after assembly.

We have an operation Label-Value which can be used to get the value of a label after assembly and before the assembler data structures are deallocated.

The function map can be constructed without any special help from the assembler. Codegen just has to note the current label when the function changes from one block to the next, and then use the final value of these labels to make the function map.

Probably we want to do the source map this way too. Although this will make zillions of spurious labels, we would have to effectively do that anyway.

With both the function map and the source map, getting the locations right for uses of Elsewhere will be a bit tricky. Users of Elsewhere will need to know about how these maps are being built, since they must record the labels and corresponding information for the elsewhere range. It would be nice to have some cooperation from Elsewhere so that this isn’t necessary, otherwise some VOP writer will break the rules, resulting in code that is nowhere.

The Debug-Info and related structures are dumped by consing up the structure and making it be the value of a constant.

Getting the code vector and fixups dumped may be a bit more interesting. I guess we want a Dump-Code-Vector function which dumps the code and fixups accumulated by the current assembly, returning a magic object that will become the code vector when it is dumped as a constant. ]