Subject: Re: One more
From: Mark Seigle (seigle@cs.washington.edu)
Date: Sat Oct 06 2001 - 13:26:22 PDT
Ok, now I think that I understand what you're saying.
It seems that a lot of the interesting stuff with types comes about when
subtyping is introduced. One thing that you could do with regular
expression types is:
// using regexp shorthand
typedef Int (+|-)(0-9)(0-9)*
typedef Real (+|-)(0-9)(0-9)* | (+|-)(0-9)(0-9)*.(0-9)(0-9)*
Int addOne (Int i) { return I + 1; }
Real r = 3.1415936;
Int j = addOne (r);
In the above code, the the call to addOne should be legal because the
regular expression describing Real is a subtype of the regular expression
describing Int (clearly the set of objects described by type Real includes
the set of objects describing Ints).
In general, deciding the subtyping relation between two regular expression
types is EXPTIME-complete (definitely exponential as opposed to
NP-complete). In practice the complexity doesn't seem to be a problem for
some domains. This paper has a some of information on regular expression
types: http://www.cis.upenn.edu/~bcpierce/papers/regsub.ps
If you extend the type system to include context free grammar types, then
the subtyping relation is, in general, undecidable. It goes without
saying that for more complex grammars subtyping is still undecidable.
So what if we forget about subtyping? In a statically typed language,
without subtyping, then who cares what Int is typedefed to? It seems that
the typechecker could do it's job without interpreting the set of objects
that Int describes, i.e. subtyping is invariant and is based on the
lexical comparison of types.
The example above can probably be encoded without regular expression types
using classes and inheritance. Pierce's paper provides some examples that
don't seem to be encodable in other type systems.
Mark
On Sat, 6 Oct 2001, Andrei Alexandrescu wrote:
> The parameters themselves are data (characters). A type can be parameterized
> either by another type, or by data. In Literal's case, it is parameterized
> by data.
>
> Example: in C++ you can say:
>
> template <class T> class Something { ... };
>
> which means, Something is parameterized by a type, e.g. Something<int> makes
> sense. But you can also say:
>
> template <char c> class Literal { ... };
>
> In this case, c is a constant character. For example, Literal<'a'> makes
> sense. Moreover, Literal<'a'> and Literal<'b'> are two distinct types.
>
>
> Andrei
>
> ----- Original Message -----
> From: "Mark Seigle" <seigle@cs.washington.edu>
> To: "Andrei Alexandrescu" <andrei@cs.washington.edu>
> Sent: Saturday, October 06, 2001 12.31
> Subject: Re: One more
>
>
> >
> > Andrei,
> >
> > This is interesting, but I have a quick question:
> > Are the parameters to Literals[T] types themselves or are they data?
> >
> > Mark
> >
> > On Sat, 6 Oct 2001, Andrei Alexandrescu wrote:
> >
> > > Sorry for the spam, but something occured to me just now. I guess
> somehow
> > > I'm rehashing the obvious, but I'm not sure. So here goes.
> > >
> > > It occured to me that a regular expression is a type. That is, you can
> > > express any regular expression (and thus the associated grammar) as a
> type.
> > >
> > > Consider the following parameterized types. In parenthesis I specify the
> > > values or the types with which the types are parameterized (don't
> confuse
> > > those with value constructors, they parameterize the types, not the
> value of
> > > the type). They are all static subclasses of the class
> RegularExpression.
> > >
> > > Literal[char] (one class for each character)
> > > Range[char, char]
> > > Concat[RegularExpression, RegularExpression]
> > > Or[RegularExpression]
> > > Not[RegularExpression]
> > > Optional[RegularExpression]
> > > ZeroOrMore[RegularExpression]
> > > Optional
> > >
> > > Now, given these basic parameterized types, any regular expression is a
> type
> > > constructed with them. For example:
> > >
> > > * Simple real numbers: "[+\-][0-9].[0-9]" ->
> > > Concat[Concat[Concat[Or[Literal['+'], Literal['-']], Range['0', '9']],
> '.'],
> > > Range['0', '9']]
> > >
> > > The banana above is a type that contains all information about the
> regexp
> > > encoded inside of it. If it's a type, it means that type-specific
> anaylisis
> > > can be applied to it. In particular, building a lexer this way is
> different
> > > from the classic (automata-driven) solution. What's the advantage of
> doing
> > > that, I don't know. I'm sure the approach is sound, but what I don't
> know is
> > > whether it's interesting.
> > >
> > > Now context-free grammars are types as well. They can be built out of
> > > primitive types (the non-terminals). For example, "If" is a type
> > > parameterized with the test and two Statements (one of which can be the
> Nil
> > > type). The pl0 interpreter is built this way except that If is a class
> > > holding pointers to Condition, Statement and Statement, instead of being
> > > parameterized with them. In other words, the pl0 compiler is
> "dynamically
> > > typed" with respect to the grammar it implements.
> > >
> > > In fact, a program is a type. Think of the lexical tree as a big
> compound
> > > type built of all the types corresponding to various statements. The
> leaves
> > > are simple types; all others are parameterized types.
> > >
> > > Do I make any sense?
> > >
> > > I found it interesting that grammars and programs can be see as types,
> maybe
> > > there is something behind that. Let me know if I just made a fool of
> myself
> > > by restating something that's obvious and uninteresting.
> > >
> > >
> > > Andrei
> > >
> > > _______________________________________________
> > > Cecil mailing list
> > > Cecil@cs.washington.edu
> > > http://majordomo.cs.washington.edu/mailman/listinfo/cecil
> > >
> >
> >
>
>
_______________________________________________
Cecil mailing list
Cecil@cs.washington.edu
http://majordomo.cs.washington.edu/mailman/listinfo/cecil
This archive was generated by hypermail 2b25 : Sat Oct 06 2001 - 13:27:03 PDT