Annotations for Run-Time Value-Specific Optimization
Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, Susan Eggers

-- Introduction
	-- Run-time VSO / Dynamic compilation
	-- Steps towards making DC mainstream
		-- Identify applications that benefit from DC
		-- Identify VSOs could apply to those programs
		-- Identify primitives to compactly/naturally describe
		   the optimizations
		-- Design a system to perform the optimizations
		-- Design an analysis to invoke the primitives
		   automatically
	-- We've identified the apps, the opts., and the primitives
	   and are working on the system
	-- Automation is not yet possible, so annotations
	-- Imperative vs. declarative
		-- Declarative easier
		-- Imperative more expressive
	-- Our declarative annotation is as expressive, even more than
		-- PE
		-- `C
	-- makeStatic
		-- what to specialize by annotating variables
		-- how to specialize by specifying policies
	-- paper outline

-- Capabilties Needed for Effective DC
	-- Concepts
		-- Static/dynamic division of values and operations
		-- Dynamic regions
	   Note examples in this section are pre-annotated with greyed
	   annotations
	-- Degree of specialization
		-- Polyvariant specialization
			-- simple discordant-merge (example)
			-- multi-way unrolling (byte-code interp.)
				-- can be used to generate an
				   arbritrary graph
					-- view as interpretation on
					   nodes of a graph 
					-- traversing edges between nodes using
					   variable to be kept constant (node =
					   node->next) creates a branch to the 
					   code for that node
		-- Polyvariant division
			-- conditional specialization (example)
		-- Interprocedural specialization
			-- (example?)
		-- Laziness
			-- static promotion example w/ multi-way
			   (refer to computed jump in interp. example)
			-- nested regions (compelling example???)
	-- Performance vs. safety tradeoffs
		-- Lazy branches and replication
			-- Exception behavior (example)
			-- Termination of unrolling (example)
		-- Cacheing (cache simulator)
			-- may not want unlimited versions
			-- may not want to check
				-- one-way unrolling
				-- real run-time constants
		-- Pointer transitivity (above cache simulator)
			-- derived values aren't in the context and so
			   aren't checked

-- The Annotations (short)
	-- Our annotations map onto the above capabilities
	-- Look at the examples above again
	-- What to specialize (table)
		-- makeStatic
		-- makeDynamic
	-- Policies (table with discussions of semantics)
		-- monovariant, polyvariant, promoteStatic
		-- interprocedural?
			-- separate compilation: how to get info
			-- function pointers: we must be conservative
		-- eager, loopLazy, polyLazy, lazy
			-- these only affect dynamic decisions
		-- cacheN, cache1, replicate, unchecked
		-- pointer transitivity?
	-- Pointwise policies overriders (table)
		-- Doesn't make sense for polyvariance or cacheing
		-- Safety/performance
			-- makeEager, makeLazy on decisons
			-- makeStatic, makeDynamic on dereferences?
		-- Interprocedural
			-- generalize
			-- specialize (on currently static args)
			-- memoize
			-- inline

-- Applications
	(no change)
	-- discuss types of applications
		-- graphics
		-- database queries
		-- neural networks
		-- simulators: circuits, hardware, caches
		-- interpreters: java-vm
		-- pattern matching (computational biology?)
	-- features to show off:
		--Lazy stitching   <---8086 simulator (loop with indirect jump)
		--Multi-way unrolling <--Java interpreter
		--Conditional specialization <--Read?
		--Run-time inlining  <--('C cspec) Function composition
		--Caching
			--cache     <--- <no example needed; should be common case>
			--cache1    <--- matrix/scalar
			--replicate <--- cache example from paper
			--unchecked <--- cache example from paper

-- Comparisons with Other Systems
	-- PE
		-- PE: static, source-to-source translation, functional
		   languages, self application, fully automatic given
		   root static/dynamic division
		-- Us: dynamic, compiler, C, rely on annotations to
		   direct specialization 
			-- => Different concerns
		-- Termination
			-- eager is no worse than PE
			   (which is 100% eager)
				-- role of automatic replication decisions
					-- unrolling
					-- inlining
			-- loopLazy guarantees termination
		-- Division (D), specialization (S), execution (E)
			-- Off-line: (D)(S)(E)
			-- On-line:  (DS)(E)
			-- Run-time: (D)(SE) (Fabius, Tempo, us)
			-- Compiler: (DSE)
		-- We do static polyvariant division and dynamic
		   polyvariant specialization
			-- PE systems don't do intraprocedural
			   polyvariance because they rely on function
			   inlining and specialization instead
				-- functional languages, CPS
				-- example of multi-way with tail call
				-- Fabius in this boat
				-- Tempo is also function granularity
					-- unrolling?
			-- unnecessary code replication compared to on-line
			-- we have conditional specialization
			-- interesting new capabilities due to laziness
		-- Converting dynamic values to static ones: laziness
			-- PE can't do this at (static) specialization time
			-- we don't start with a single root set of
			   static values, but create static values as needed
			-- static heap-allocated structures
			-- dynamic values don't stop specialization
			   altogether, only eager specialization
			   (static promotion)
			-- we don't need "the trick", which can't be
			   applied at run-time anyway
			-- nested and overlapping dynamic regions
				-- Fabius, Tempo don't have this
			-- Exception behavior
				-- Fabius, Tempo don't have this
	-- `C
		-- Capabilities
			-- Static promotion
			-- Nested regions
			-- SESE `() expressions & forward jumps /
			   early exits
				-- Can't generate arbitrary graphs
				-- Can't compile Java bytecodes!!!
			-- Respecialization
				-- `C has function granularity
		-- Usability
			-- Automatic generation of specialized code
			-- Automatic derviation of static values
			-- Automatic cacheing/versioning/checking
			-- Automatic respecialization
			-- Example to demonstrate it
		-- Optimizability
			-- `() expressions are broken up
			-- Can't use existing compiler framework
			-- Have to reason about code manipulating code

-- Implementation: show that these annotations are implementable
	(no change)
	-- BTA
		-- Propagation of iconst, kConst, cache, lazy
		-- Reachability conditions: simpler than theorem proving
		-- Lazy branches
		-- Discordant merges
		-- Context
		-- Liveness analysis to adjust context -- we should do it

		Important:
		-- Conversion of high-level annotations to low-level ones that
		   are propagated to every program point by an idfa.
		-- A BTA computes derived static values (with reachability analysis,
		   as before)
		-- Cache & Lazy/replication policies are propagated with keepConst
		   info.  Precedence rules determine which policy is in effect for
		   a piece of code.
		-- Context is used to find versions in the cache.
		-- Need the liveness analysis to ensure correctness of the context
		   or kill all derived rtconsts at unkeepConst

	-- Building a specialized dynamic compiler
		-- unit partitioning
			-- why
			-- when
		-- static/dynamic separation
		-- unit -> DAG conversion
			-- why
			-- splitting & merging
		-- insertion of static ops, branch patching
		-- insertion of cache lookups
			-- for different policies
			-- eager vs. lazy units
		-- what is a cache lookup
			-- for different policies
			-- eager vs. lazy
			-- specialized context switches
		-- hierarchical cache
			-- caches code
			-- context diff. optimization
			-- specialized-value table

	-- Interprocedural
		-- not implemented yet
		-- extension of intraprocedural framework
		-- cite PE, procedure cloning, and interprocedural
		   analysis for C

-- Conclusion
	-- Right intermediate language for automatically annotating
	   code to specialize
	-- Automatically selecting dynamic regions, choosing rtconsts,
	   and inserting keepConsts