39.1.2 Variable maps

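There are about five things that the debugger might want to know about a variable: its name, its ID (which distinguishes same-name variables), its storage class (SC), its location, and its type. The debugger also needs lifetime information, which says where the variable is live.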

The variable map should represent this information space-efficiently and with adequate computational efficiency.

The SC and ID can be represented as small integers. Although the ID can in principle be arbitrarily large, it should be less than 100 in practice. The location can be represented by just the offset (a moderately small integer), since the SB (storage base) is implicit in the SC.

The lifetime info can be represented either as a bit-vector indexed by block numbers, or by a list of block numbers. Which is more compact depends both on the size of the component and on the number of blocks the variable is live in. In the limit of large component size, the sparse representation will be more compact, but it isn’t clear where this crossover occurs. Of course, it would be possible to use both representations, choosing the more compact one on a per-variable basis. Another interesting special case is when the variable is live in only one block: this may be common enough to be worth picking off, although it is probably rarer for named variables than for TNs in general.
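The per-variable choice described above can be sketched as follows. This is only an illustration: the function name, the INDEX-BITS parameter (the assumed width of one entry in the sparse list), and the size estimate, which ignores vector headers, are assumptions and not part of the design.

```lisp
;; Hypothetical sketch: pick the smaller lifetime encoding for one variable.
;; INDEX-BITS is an assumed width for each entry of the sparse block list.
(defun choose-lifetime-representation (live-blocks n-blocks
                                       &key (index-bits 16))
  "LIVE-BLOCKS is a list of block numbers, each below N-BLOCKS.  Return a
fixnum, an integer vector, or a bit-vector, whichever is estimated smallest."
  (let ((count (length live-blocks)))
    (cond ((= count 1)
           ;; Sole live block: store the block number directly.
           (first live-blocks))
          ((< (* count index-bits) n-blocks)
           ;; Sparse: the list of block numbers needs fewer bits than
           ;; one bit per block would.
           (make-array count
                       :element-type `(unsigned-byte ,index-bits)
                       :initial-contents (sort (copy-list live-blocks) #'<)))
          (t
           ;; Dense: one bit per block in the component.
           (let ((bits (make-array n-blocks :element-type 'bit
                                            :initial-element 0)))
             (dolist (b live-blocks bits)
               (setf (sbit bits b) 1)))))))
```

With 16-bit entries, the sparse form wins in this estimate only when the variable is live in fewer than one sixteenth of the component's blocks, which gives a rough feel for where the crossover might lie.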

If we dump the type, then a normal list-style type descriptor is fine: the space overhead is small, since the shareability is high.

We could probably save some space by cleverly representing the var-info as parallel vectors of different types, but this would be more painful in use. It seems better to just use a structure, encoding the unboxed fields in a fixnum. This way, we can pass around the structure in the debugger, perhaps even exporting it from the low-level debugger interface.
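As a rough illustration of encoding the unboxed fields in a fixnum, the sketch below packs the SC, ID and offset with byte specifiers. The field widths (6, 7 and 16 bits) and all names are assumptions made for the example; the only figure taken from the text is that an ID below 100 fits in 7 bits. The 29-bit total fits in a fixnum on implementations with fixnums of 30 bits or more.

```lisp
;; Assumed layout, low bits first: 6 bits SC, 7 bits ID, 16 bits offset.
(defparameter *sc-byte* (byte 6 0))
(defparameter *id-byte* (byte 7 6))
(defparameter *offset-byte* (byte 16 13))

(defun encode-var-bits (sc id offset)
  "Pack SC, ID and OFFSET into one non-negative integer.
Values wider than their field are silently truncated by DPB."
  (dpb offset *offset-byte*
       (dpb id *id-byte*
            (dpb sc *sc-byte* 0))))

;; Accessors that extract each field again.
(defun var-bits-sc (bits) (ldb *sc-byte* bits))
(defun var-bits-id (bits) (ldb *id-byte* bits))
(defun var-bits-offset (bits) (ldb *offset-byte* bits))
```

Keeping the accessors as ordinary functions over a single integer means the structure can still be passed around in the debugger without exposing the layout.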

[### We need the save location too. This probably means that we need two slots of bits, since we need the save offset and save SC. Actually, we could let the save SC be implied by the normal SC, since at least currently, we always choose the same save SC for a given SC. But even so, we probably can’t fit all that stuff in one fixnum without squeezing a lot, so we might as well split and record both SCs.

In a localized packing scheme, we would have to dump a different var-info whenever either the main location or the save location changes. As a practical matter, the save location is less likely to change than the main location, and should never change without the main location changing.

One can conceive of localized packing schemes that do saving as a special case of localized packing. If we did this, then the concept of a save location might be eliminated, but this would require major changes in the IR2 representation for call and/or lifetime info. Probably we will want saving to continue to be somewhat magical.]

How about:

(defstruct var-info
  ;;
  ;; This variable's name (the symbol-name of the symbol).  There is no
  ;; sensible default, so the slot must be supplied.
  (name (error "Missing NAME.") :type simple-string)
  ;;
  ;; The SC, ID and offset, encoded as bit-fields.
  (bits (error "Missing BITS.") :type fixnum)
  ;;
  ;; The set of blocks this variable is live in.  If a bit-vector, then it has
  ;; a 1 when indexed by the number of a block that it is live in.  If an
  ;; I-vector, then it lists the live block numbers.  If a fixnum, then that is
  ;; the number of the sole live block.
  (lifetime (error "Missing LIFETIME.") :type (or vector fixnum))
  ;;
  ;; The variable's type, represented as a list-style type descriptor.
  type)
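A liveness test over the three lifetime representations might look like the sketch below. The function name is hypothetical, and the treatment of a block number beyond the end of the bit-vector as "not live" is an assumption.

```lisp
;; Hypothetical helper: is the variable live in block number BLOCK?
(defun live-in-block-p (lifetime block)
  (etypecase lifetime
    ;; A fixnum names the sole live block.
    (fixnum (= lifetime block))
    ;; A bit-vector has a 1 at the index of every live block.
    (bit-vector (and (< block (length lifetime))
                     (= 1 (bit lifetime block))))
    ;; Any other vector lists the live block numbers.
    (vector (and (find block lifetime) t))))
```

The BIT-VECTOR clause must come before the VECTOR clause, since every bit-vector is also a vector.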

Then the debug-info holds a simple-vector of all the var-info structures for that component. We might as well make it sorted alphabetically by name, so that we can binary-search to find the variable corresponding to a particular name.
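The lookup could be sketched as a lower-bound binary search. The cut-down structure and the function name are stand-ins invented for the example. Because several entries may share a name, the sketch returns the index of the first match so that the caller can scan forward through the rest.

```lisp
;; Cut-down stand-in for var-info, used only in this sketch.
(defstruct (var-info-sketch (:conc-name vis-))
  name
  id)

(defun find-var-info (name var-infos)
  "VAR-INFOS is a simple-vector sorted by name.  Return the index of the
first entry named NAME, or NIL if there is none."
  (let ((low 0)
        (high (length var-infos)))
    ;; Invariant: entries below LOW sort before NAME; entries at or
    ;; above HIGH do not.
    (loop while (< low high)
          do (let ((mid (floor (+ low high) 2)))
               (if (string< (vis-name (svref var-infos mid)) name)
                   (setf low (1+ mid))
                   (setf high mid))))
    (and (< low (length var-infos))
         (string= (vis-name (svref var-infos low)) name)
         low)))
```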

We need to be able to translate PCs to block numbers. This can be done by an I-vector in the component that contains the start location of each block. The block number is the index at which we find the correct PC range. This requires that we use an emit-order block numbering distinct from the IR2-Block-Number, but that isn’t any big deal. The vector seems space-expensive, but it isn’t too bad, since it would only be a fraction of the code size if the average block length is a few words or more.
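A minimal sketch of the PC translation, assuming the start locations are stored in ascending emit order; the function name is hypothetical. Since the vector holds only start locations, a PC past the end of the code maps to the last block in this sketch.

```lisp
(defun pc-block-number (pc block-starts)
  "BLOCK-STARTS holds each block's start PC in ascending emit order.
Return the index of the block containing PC, or NIL if PC precedes block 0."
  (let ((low 0)
        (high (length block-starts)))
    ;; Find the first block that starts after PC; the block containing
    ;; PC is the one just before it.
    (loop while (< low high)
          do (let ((mid (floor (+ low high) 2)))
               (if (<= (aref block-starts mid) pc)
                   (setf low (1+ mid))
                   (setf high mid))))
    (and (plusp low) (1- low))))
```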

An advantage of our per-block lifetime representation is that it directly supports keeping a variable in different locations when in different blocks, i.e. multi-location packing. We use a different var-info for each different packing, since the SC and offset are potentially different. The Name and ID are the same, representing the fact that it is the same variable. It is here that the ID is most significant, since the debugger could otherwise make same-name variables unique all by itself.
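To make the role of the ID concrete, here is a small sketch of resolving a name at a given block under multi-location packing. The structure is a simplified stand-in invented for the example: it keeps the SC, ID and offset in separate slots and the lifetime as a plain list of block numbers.

```lisp
;; Simplified stand-in for var-info, one instance per packing.
(defstruct (packed-var (:conc-name pv-))
  name
  id
  sc
  offset
  live-blocks)

(defun var-infos-live-at (name block var-infos)
  "Return the entries of the list VAR-INFOS that are named NAME and are
live in block number BLOCK."
  (remove-if-not (lambda (info)
                   (and (string= (pv-name info) name)
                        (member block (pv-live-blocks info))))
                 var-infos))
```

If one variable X (ID 0) sits in a register during blocks 0 and 1 and on the stack during block 2, a lookup at block 1 yields exactly one entry, so lifetime alone picks the location. If a second X (ID 1) is also live in block 2, the lookup there yields two entries, and only the ID tells them apart.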