Factor/Dispatch ideas

A generic approach to handle generic methods, aka polymorphic calls, aka dynamic binding.

First, we can conceptually divide this operation into three primitive operations:

Computing the signature
Lookup: signature->word
Executing the word

1. Computing the signature

Treating the dispatch key as a computed signature is a simple way to generalize single dispatch, subjective dispatch, multiple dispatch and predicate dispatch.

Examples:

Single dispatch signature: [ drop drop class ]
Multiple dispatch signature: [ [ class ] tri@ ]
Subjective dispatch signature: [ access get ]
Predicate dispatch signature: [ 0 > ]
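
For readers less familiar with Factor quotations, the four signatures above can be sketched as plain functions. This is only a rough Python analogue, not Factor code: the function names and the `context` dictionary (standing in for a dynamic variable such as `access`) are assumptions made for illustration.

```python
# Hypothetical Python analogues of the four signature quotations above.
# Each takes the call's arguments and returns a hashable signature.

def single_signature(a, b, c):
    # like [ drop drop class ]: only the class of the first argument matters
    return type(a)

def multiple_signature(a, b, c):
    # like [ [ class ] tri@ ]: the classes of all three arguments
    return (type(a), type(b), type(c))

# Assumed stand-in for a dynamic variable store.
context = {"access": "admin"}

def subjective_signature(*args):
    # like [ access get ]: depends on the caller's context, not the arguments
    return context["access"]

def predicate_signature(x):
    # like [ 0 > ]: the result of a predicate applied to the argument
    return x > 0
```

In every case the result is an ordinary value that can serve as a key in the lookup step.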

2. Lookup: signature->word

The heart of the dispatch. In essence, this can be conceptualized as a lookup in an associative map from signatures to words. In practice, the input domain of signatures can be very large (e.g. multiple dispatch with 1000 classes and 3 arguments gives 1000^3 = 1 billion signatures). However, we can take advantage of the fact that the mappings that matter for performance are a very small subset of the input domain.

note: dispatch is deterministic. As long as there are no reflective changes to the system, the same signature will always map to the same word.

0. Base dispatch

For correctness, we have to handle any input signature. For that purpose, we maintain a procedural dispatch function. It doesn't need to be fast: it might traverse deep hierarchies, do linear searches, even backtrack. In essence, all other forms of dispatch are more or less elaborate memoizations of this function.

1. Inline caching (IC)

arithmetic cost: O(n) (where n is the size of the signature)
branch cost: one branch (to back out if the speculated signature is wrong)
memory cost: none (the data is in Icache)
polymorphism supported: very rare changes in the signature at the call site

One of the fastest ways to dispatch calls that are not really polymorphic at runtime. The 'cache' in an inline cache is the caller's call instruction: By redirecting (using self-modifying code) the call to different stubs, we can effectively maintain state in the caller. A given stub ensures that the current signature is identical to the cached one, and then calls the word mapped by this signature. If there is a mismatch, it falls back on base dispatch.

IC stubs can be shared between all call sites. IC stubs can easily be produced from a template (thus there is no intrinsic need for an online compiler). There is no point in producing as many IC stubs as there are potential targets (waste of time and space).

optimization: The IC stub is a natural target for customization, by inlining the call.
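
The mechanism can be imitated in a high-level language by letting a call-site object hold a reference to its current stub. The following is a sketch only: real inline caches patch the machine-level call instruction, whereas here "patching" is an attribute assignment, and `CallSite`, `make_ic_stub` and the tiny `base_dispatch` table are hypothetical names introduced for the example.

```python
def base_dispatch(signature):
    # Hypothetical stand-in for the slow, complete lookup.
    table = {int: lambda x: f"int:{x}", str: lambda x: f"str:{x}"}
    return table[signature]

def make_ic_stub(cached_signature, target):
    """Template for a stub: one signature check, then a direct call."""
    def stub(site, x):
        if type(x) is cached_signature:  # the single guard branch
            return target(x)
        return site.miss(x)              # speculation failed: back out
    return stub

class CallSite:
    def __init__(self):
        self.stub = CallSite.miss_stub   # the site starts unlinked
        self.misses = 0

    @staticmethod
    def miss_stub(site, x):
        return site.miss(x)

    def miss(self, x):
        self.misses += 1
        target = base_dispatch(type(x))
        # "Patch the call instruction": point this site at a new stub.
        self.stub = make_ic_stub(type(x), target)
        return target(x)

    def call(self, x):
        return self.stub(self, x)
```

A short usage example: after the first call the site stays on the fast path until the signature changes.

```python
site = CallSite()
site.call(1)      # miss: links the site to an int stub
site.call(2)      # hit: one check, then a direct call
site.call("a")    # miss: the site is relinked to a str stub
```

Since each stub receives the site as a parameter, stubs for the same signature could be shared between call sites; this sketch creates a fresh one on each miss for brevity.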

2. Polymorphic Inline Cache (PIC)

3. Global cache

4. Global lookup

3. Executing the word

This is trivial. A few things worth mentioning:

At this point, we assume the signature has been fully checked (there is no need for further checks).
optimization: We have the opportunity to do customization, that is, to produce a version of the word that is optimized for the signature.
optimization: The next step is speculative inlining. This is how adaptive recompilation achieves high performance (see Self '93). However, speculative inlining can also be done heuristically (like Self '92) or based on runtime profiling information (like Cecil), as we can expect many optimizations to be stable. In these cases, it's possible to compile ahead of time. Using heuristics can achieve impressive performance (the 'half the speed of C' benchmarks were with Self '92, not Self '93), but it breaks down very easily (trivial changes in the code can result in drastic runtime differences) and might require delayed code generation in order to produce an acceptable amount of code (hence requiring online compilation, thus defeating the purpose of compiling ahead of time). I will not give more details about speculative inlining as it's out of the scope of this discussion.
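
Customization, as mentioned in the list above, can be pictured as keeping one specialized version of a word per signature. The Python sketch below is only an illustration: a real system would have a compiler generate the specialized versions, whereas here the `CUSTOMIZERS` table is written by hand, and all names are hypothetical.

```python
def generic_double(x):
    # Generic version: re-checks the type on every call.
    if isinstance(x, int):
        return x * 2
    if isinstance(x, str):
        return x + x
    raise TypeError("unsupported")

# Hypothetical "compiler": for a known signature, build a version of the
# word with the type checks folded away.
CUSTOMIZERS = {
    int: lambda: (lambda x: x * 2),
    str: lambda: (lambda x: x + x),
}

_customized = {}

def customized_for(signature):
    """Return the version of the word specialized for this signature."""
    word = _customized.get(signature)
    if word is None:
        make = CUSTOMIZERS.get(signature)
        # Fall back to the generic version when no specialization exists.
        word = make() if make else generic_double
        _customized[signature] = word
    return word

def double(x):
    # The signature was already checked by dispatch, so the customized
    # word does not need to check it again.
    return customized_for(type(x))(x)
```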

This revision created on Wed, 22 Oct 2008 04:22:46 by prunedtree
