Thoughts on composite constructors


Subject: Thoughts on composite constructors
From: Keunwoo Lee (klee@cs.washington.edu)
Date: Fri Jun 08 2001 - 20:47:23 PDT


======================================================================
THOUGHTS ON COMPOSITE CONSTRUCTORS

Some rough notes on how we can fix the current broken constructor
situation in Whirlwind/WIL. There's probably more to say, but I'm
getting hungry and I have to go home and pack the rest of my stuff.

----------------------------------------------------------------------
CURRENT STATE

Composite objects may be allocated either directly (in which case they
implicitly go in either the global or the stack scope and are passed
by value) or indirectly, with "new" (in which case they go on the heap
and have the expected semantics of pointers).

The problem with this is the same problem programmers have in C++,
i.e. the semantics of implicit copies are kind of painful. What
happens when we return a reference to a direct dynamic array from a
function? What happens when we assign a direct dynamic array to
another of a different size? None of it is very pretty.

The CFG/DFG generation of composite objects is also thoroughly wrong.
When you create a record in a record, e.g.:

   decl x :== { a :== { b :== c } };

the inner record constructor appears the CFG preceding the outer
record constructor, and a DFG edge flows from the inner record
constructor to the outer record constructor as an initial member
value. This is all wrong, because really we want there to be two
phases:

  + allocate entire record
  + initialize entire record, including possibly initializing members
of nested records

Also, there's the less central but rather annoying problem that C's
type checking cannot accommodate dynamically sized members at the end
of a record, as WIL can. This isn't a conceptual problem so much as
it's an annoyance, as we'll have to manually compute offsets and such
for anything that contains a dynamically sized component.

----------------------------------------------------------------------
PROPOSED SOLUTIONS

If I could start over, I would write constructors as follows:

+ All constructors, even integers, return values that are
references/pointers to the thing they construct. Hence, there is no
implicit copying.

+ Dereferencing is only defined on pointers to scalars.

+ Assignment is only defined on scalars.

+ Array indexing and record member lookup are defined to operate on
pointers, i.e. they are more like C's [] and -> operators.

+ Constructors are always allocated within a region. As syntactic
sugar, non-"new" allocation inside a function means allocation inside
the activation record of that function, and non-"new" allocation in
the global scope means allocation in the static area. Hence, every
constructor that comes out of the parser is explicitly annotated with
a region.
    The allocation region of a ConstructorNode is one of the node's
fields---it should probably be immutable, so that if you want to
change the allocation region you have to create a new constructor
node. We shouldn't have much call for changing these at first anyway.

+ Somewhere during compilation---probably during lowering---allocation
and initialization are explicitly split. From the constructors in a
function, we can naturally infer the allocations and initializations
that must happen when that function executes.
    So, for example, we would have a pass that traverses each
function, extracting all stack-allocated data for that function,
computing its size, and annotating the function entry point with
declarations for that stack allocated data. The constructor itself
would only do initialization of members.

+ The end of a function deletes its allocation region, so passing
references to "direct" arrays is not legal. We may want to have an
explicit representation in the IR for this.

+ Composite constructors should have only one node in the CFG, and all
incoming edges on the DFG should be flat. In other words, the
declaration above, which I repeat here:

   decl x :== { a :== { b :== c } };

the right-hand-side's profile in the CFG would be a single RecordNode.
Furthermore, the only incoming data flow edge should be c, flowing to
b. There should be some representation, within the RecordNode, of the
fact that this corresponds to the initialization of the b member of
the a member of the top-level record, but { b :== c } should not be a
separate data flow node with an incoming edge going to a---that would
mean "evaluate the inner record, then assign it to a", which is wrong.
    BTW, scalars-only assignment also means that, in the above:
    1. c must be a scalar, and
    2. The two :== in the above declaration are not the same thing.
       b :== c is an initializing assignment into a scalar component.
       a :== ... simply means that the a component is specified by the
record that follows---it does not imply a separate assignment step.
We may want, ideally, to make these two operators more distinct at the
source level as well.

----------------------------------------------------------------------
THE REGION FLOW GRAPH

Incidentally, I have been thinking a little about having a new slice
that represents flow of memory regions around the program. In this
representation, regions would actually be separate nodes, connected to
constructors and assignments by "region flow" edges.

An allocation statement would have, as one of its region flow
predecessors, a region (some regions could only go with certain kinds
of allocations---e.g., an activation record region can only go with a
stack allocation).

Also, right now we have lvalue predecessors in the data flow
graph---so, for example, in the following:

  (a[0])->x

we have a[0] as a DFG pred of (a[0])->x, which isn't really right
because no expression is getting evaluated. Really we want to say
that we're assigning into a memory location, and that the flow between
a[0] and a[0]->x is in the memory flow graph.

Anyway, this is something I have to think more about. It might be
more trouble than it's worth to implement, but it seems to represent
the domain more cleanly than just a DFG with two kinds of edges.

_______________________________________________
Cecil mailing list
Cecil@cs.washington.edu
http://majordomo.cs.washington.edu/mailman/listinfo/cecil



This archive was generated by hypermail 2b25 : Fri Jun 08 2001 - 20:48:04 PDT