Annotations for Run-Time Value-Specific Optimization Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, Susan Eggers -- Introduction -- Run-time VSO / Dynamic compilation -- Steps towards making DC mainstream -- Identify applications that benefit from DC -- Identify VSOs could apply to those programs -- Identify primitives to compactly/naturally describe the optimizations -- Design a system to perform the optimizations -- Design an analysis to invoke the primitives automatically -- We've identified the apps, the opts., and the primitives and are working on the system -- Automation is not yet possible, so annotations -- Imperative vs. declarative -- Declarative easier -- Imperative more expressive -- Our declarative annotation is as expressive, even more than -- PE -- `C -- makeStatic -- what to specialize by annotating variables -- how to specialize by specifying policies -- paper outline -- Capabilties Needed for Effective DC -- Concepts -- Static/dynamic division of values and operations -- Dynamic regions Note examples in this section are pre-annotated with greyed annotations -- Degree of specialization -- Polyvariant specialization -- simple discordant-merge (example) -- multi-way unrolling (byte-code interp.) -- can be used to generate an arbritrary graph -- view as interpretation on nodes of a graph -- traversing edges between nodes using variable to be kept constant (node = node->next) creates a branch to the code for that node -- Polyvariant division -- conditional specialization (example) -- Interprocedural specialization -- (example?) -- Laziness -- static promotion example w/ multi-way (refer to computed jump in interp. example) -- nested regions (compelling example???) -- Performance vs. safety tradeoffs -- Lazy branches and replication -- Exception behavior (example) -- Termination of unrolling (example) -- Cacheing (cache simulator) -- may not want unlimited versions -- may not want to check -- one-way unrolling -- real run-time constants -- Pointer transitivity (above cache simulator) -- derived values aren't in the context and so aren't checked -- The Annotations (short) -- Our annotations map onto the above capabilities -- Look at the examples above again -- What to specialize (table) -- makeStatic -- makeDynamic -- Policies (table with discussions of semantics) -- monovariant, polyvariant, promoteStatic -- interprocedural? -- separate compilation: how to get info -- function pointers: we must be conservative -- eager, loopLazy, polyLazy, lazy -- these only affect dynamic decisions -- cacheN, cache1, replicate, unchecked -- pointer transitivity? -- Pointwise policies overriders (table) -- Doesn't make sense for polyvariance or cacheing -- Safety/performance -- makeEager, makeLazy on decisons -- makeStatic, makeDynamic on dereferences? -- Interprocedural -- generalize -- specialize (on currently static args) -- memoize -- inline -- Applications (no change) -- discuss types of applications -- graphics -- database queries -- neural networks -- simulators: circuits, hardware, caches -- interpreters: java-vm -- pattern matching (computational biology?) -- features to show off: --Lazy stitching <---8086 simulator (loop with indirect jump) --Multi-way unrolling <--Java interpreter --Conditional specialization <--Read? --Run-time inlining <--('C cspec) Function composition --Caching --cache <--- --cache1 <--- matrix/scalar --replicate <--- cache example from paper --unchecked <--- cache example from paper -- Comparisons with Other Systems -- PE -- PE: static, source-to-source translation, functional languages, self application, fully automatic given root static/dynamic division -- Us: dynamic, compiler, C, rely on annotations to direct specialization -- => Different concerns -- Termination -- eager is no worse than PE (which is 100% eager) -- role of automatic replication decisions -- unrolling -- inlining -- loopLazy guarantees termination -- Division (D), specialization (S), execution (E) -- Off-line: (D)(S)(E) -- On-line: (DS)(E) -- Run-time: (D)(SE) (Fabius, Tempo, us) -- Compiler: (DSE) -- We do static polyvariant division and dynamic polyvariant specialization -- PE systems don't do intraprocedural polyvariance because they rely on function inlining and specialization instead -- functional languages, CPS -- example of multi-way with tail call -- Fabius in this boat -- Tempo is also function granularity -- unrolling? -- unnecessary code replication compared to on-line -- we have conditional specialization -- interesting new capabilities due to laziness -- Converting dynamic values to static ones: laziness -- PE can't do this at (static) specialization time -- we don't start with a single root set of static values, but create static values as needed -- static heap-allocated structures -- dynamic values don't stop specialization altogether, only eager specialization (static promotion) -- we don't need "the trick", which can't be applied at run-time anyway -- nested and overlapping dynamic regions -- Fabius, Tempo don't have this -- Exception behavior -- Fabius, Tempo don't have this -- `C -- Capabilities -- Static promotion -- Nested regions -- SESE `() expressions & forward jumps / early exits -- Can't generate arbitrary graphs -- Can't compile Java bytecodes!!! -- Respecialization -- `C has function granularity -- Usability -- Automatic generation of specialized code -- Automatic derviation of static values -- Automatic cacheing/versioning/checking -- Automatic respecialization -- Example to demonstrate it -- Optimizability -- `() expressions are broken up -- Can't use existing compiler framework -- Have to reason about code manipulating code -- Implementation: show that these annotations are implementable (no change) -- BTA -- Propagation of iconst, kConst, cache, lazy -- Reachability conditions: simpler than theorem proving -- Lazy branches -- Discordant merges -- Context -- Liveness analysis to adjust context -- we should do it Important: -- Conversion of high-level annotations to low-level ones that are propagated to every program point by an idfa. -- A BTA computes derived static values (with reachability analysis, as before) -- Cache & Lazy/replication policies are propagated with keepConst info. Precedence rules determine which policy is in effect for a piece of code. -- Context is used to find versions in the cache. -- Need the liveness analysis to ensure correctness of the context or kill all derived rtconsts at unkeepConst -- Building a specialized dynamic compiler -- unit partitioning -- why -- when -- static/dynamic separation -- unit -> DAG conversion -- why -- splitting & merging -- insertion of static ops, branch patching -- insertion of cache lookups -- for different policies -- eager vs. lazy units -- what is a cache lookup -- for different policies -- eager vs. lazy -- specialized context switches -- hierarchical cache -- caches code -- context diff. optimization -- specialized-value table -- Interprocedural -- not implemented yet -- extension of intraprocedural framework -- cite PE, procedure cloning, and interprocedural analysis for C -- Conclusion -- Right intermediate language for automatically annotating code to specialize -- Automatically selecting dynamic regions, choosing rtconsts, and inserting keepConsts