Annotation-Directed Run-Time Specialization in C

Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, Susan Eggers

The first thing we have to ask ourselves is "what is our contribution?"
Possibilities:
-- Primitives for describing run-time specialization
   -- makeStatic
   -- makeDynamic
-- The concept of dynamic-to-static promotion (dropping?), including
   dynamic re-specialization, nested and overlapping dynamic regions,
   automatic dynamic-to-static promotion, and associated caching
-- The idea of associating specialization policies with static variables
-- Conditional specialization (though Consel & Danvy mentioned
   conditionally static values, it wasn't intended as a means to control
   specialization)
-- Lazy (or incremental) specialization (all run-time specialization is,
   in a sense, lazy, but we can use laziness to ensure safety, reduce
   specialization cost, etc.)
-- The BTA and run-time-specialization algorithm
-- Unit optimizations (similar to identification of specialization
   points or configuration analysis?)

I don't think that the intraprocedural/subfunction/direct-style nature
of our approach is a contribution, but rather a requirement.

-- Introduction
   -- Motivating example: stack-based byte-code interpreter
      -- Multi-way unrolling
      -- Automatic dynamic-to-static promotion
      -- Conditional specialization
-- The Run-Time Specializer
   -- Novel features
      -- Dynamic-to-static promotion
      -- Dynamic re-specialization
      -- Lazy/incremental specialization
      -- Unit-based
   -- Simplifications
      -- Intraprocedural
         -- Though "constant" functions can be invoked
         -- Can they contain dynamic regions?
      -- No generalize decisions made
   -- Algorithm (see specializer.txt)
-- The Annotations
   -- makeStatic
   -- makeDynamic
   -- Function annotations
   -- Specialization policy
      -- monoSpecialize
      -- polySpecialize
   -- Division policy
      -- monoDivide
      -- polyDivide
   -- Promotion policy
      -- manualPromote
      -- autoPromote
      -- Possible 3rd policy for promoting only at must defs
   -- Laziness policy
      -- eager
      -- loopLazy
      -- polyLazy
      -- lazy
   -- Cache policy
      -- CacheAll, Cache1, Replicate, Unchecked
      -- Could want different policies at different places
         -- promotions & demotions
         -- split discordant merges
      -- Other possibilities
         -- Cache(n)
         -- User-defined replacement policies
   -- Benefits
      -- Easy to use
         -- Turn them off for debugging
      -- Lots of control
         -- Good for experimentation
      -- Target for fully automated system
-- BTA
   -- Prepass
      -- Compute use-def chains
      -- Compute live ranges and sums of all live ranges
      -- Find apparent boundaries of the dynamic region
      -- Insert makeStatics and makeDynamics
   -- Polyvariant specialization
   -- Polyvariant division
   -- Automatic d->s promotion
   -- IDFA over CFG
      -- Flow functions
      -- Keep track of root variables of derived temps
   -- Eliminate redundant makeStatics and makeDynamics
   -- Output
      -- Static/dynamic variables w/ cache policies
      -- D->S promoted variables
      -- Discordant variables
      -- Context sets
      -- Eager/lazy edges
-- GEgen
   -- (Sub-)basic blocks as units
      -- Split merges with multiple divisions
      -- Partition basic blocks at specialization points
         -- Context changes
         -- Dynamic-to-static promotions
      -- Insert calls to the specializer at d->s promo. pts. outside the
         dynamic region
   -- Bigger units
      -- Unit identification and graph partitioning
         -- New units at specialization points
            -- Context changes
            -- Dynamic-to-static promotions
            -- Discordant merges
         -- New units at merges with any lazy incoming edges
         -- New units at merges with predecessors from different units
            (for single-entry)
         -- Clustering!
      -- DAG construction via splitting & merging
         -- Want to specialize all blocks of a unit that could
            potentially be reached => pursue all paths under dynamic
            control
         -- Want to know the order of traversal of the blocks at compile
            time
            -- Better optimization of static code
            -- Eliminate run-time work-list overhead
         -- So, build a DAG of the blocks reached
         -- Difficult because of
            -- Unstructured code
            -- Combinations of static and dynamic branches
            -- Loops / back edges
         -- Solution
            -- Pursue all static paths (static paths are determined by
               which static branches are taken)
            -- Copy blocks when they are reached in different orders on
               different static paths
         -- Algorithm (should we put the full algorithms in the
            appendix, given that they are untested? how much detail in
            the body?)
            -- Splitting
               -- Split all static paths
               -- Add nodes to a given static path as they are reached;
                  topologically order nodes on different dynamic paths
            -- Merging: merge the tails of branches of the tree created
               by splitting
               -- Create a dummy node and link it to all of the leaves
                  of the unit's tree
               -- Add the sink to a work list
               -- Until the work list is empty, remove the head of the
                  list. Compare the sources of all incoming edges into
                  the node. If two source nodes represent the same block
                  and they have the same successors in this graph, then
                  merge the two nodes and add the merged node to the
                  work list if it isn't there already.
   -- Eliminate unit boundaries for eager successors with single
      run-time predecessors, and for those with multiple predecessors
      but the Replicate policy, statically copying code for those with
      multiple compile-time predecessors
   -- Eliminate cache lookups for lazy successors with no
      dynamic-to-static promotions and either single run-time
      predecessors or multiple predecessors and the Replicate policy
   -- What about unchecked eager promotions for static pointer derefs?
-- Further optimizations to the run-time specializer
   -- In-line and specialize the run-time primitives
      -- Everything in the Unit structures is known at compile time
   -- Recursion is implemented with a stack and jump (tail recursion!)
      -- Eliminate stack manipulation for all first eager successors
   -- Saving/restoring of state and argument passing can be specialized
      in many cases as well (e.g., at lazy callbacks)
      -- We don't explicitly pass around a set of all live static
         variables; we load/store them from/to the cache
   -- Hybrid caching scheme?
      -- Use hierarchy when adding to the context
      -- Reset to the top of the hierarchy when removing from the
         context
-- Related Work
   -- Tempo (Consel)
   -- Fabius (Leone, Lee)
   -- `C (Engler, Kaashoek)
   -- Cmix (Andersen)
   -- Schism (Consel)
-- Conclusions
   -- Status
   -- Future work
      -- Alias analysis
      -- Interprocedural analysis
      -- Interprocedural specializer
         -- Overcome required laziness at function boundaries
      -- Better handling of side effects
      -- Different cache organizations (e.g., hierarchical)
         -- Cache interface
            -- Replacement policies?
            -- Information for making specialization decisions
      -- SPIN: must be able to reclaim space
         -- Don't know how important it will be
      -- Add a run-time optimizer to the run-time specializer
         -- Register allocation
         -- Instruction scheduling
         -- Won't always be profitable => optimization policy