* * * * *

                           Musings on typechecking

I was thinking a bit more about yesterday's post on variable types and
character sets [1]. I mean, yes, one could conceivably make subtypes of
strings with specific character sets:

> class String { /* ... */ } ;
> class StringISO8859d1 : public String { /* ... */ } ;
> class StringISO8859d5 : public String { /* ... */ } ;
> class StringUTFd8     : public String { /* ... */ } ;
>
> String          foo = "Hello";        // perhaps a warning here
> StringISO8859d1 bar = "Hello";
> StringUTFd8     baz = "Wal★Mart";
>
> bar = baz;    // two things could happen here
>               // 1. compiler spits out a warning
>               // or error because you are trying
>               // to store a variable of one type
>               // into another without an explicit
>               // conversion step, or
>               // 2. the compiler does the conversion
>               // for you, much like an int → double
>               // conversion.  The problem here: what if
>               // the source string has characters not
>               // representable in the destination string?
>
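
The first option, an explicit conversion step that is allowed to fail, might
be sketched in C++17 roughly like this. Latin1String, Utf8String and
to_latin1 are made-up names for the illustration, not part of any library,
and the decoder only knows enough UTF-8 to tell whether a character fits in
ISO-8859-1:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical wrapper types: same bytes underneath, different static types.
struct Latin1String { std::string bytes; };  // ISO-8859-1, one byte per character
struct Utf8String   { std::string bytes; };  // UTF-8, one to four bytes per character

// Explicit conversion. Returns std::nullopt when the source contains a
// character that ISO-8859-1 cannot represent. Malformed UTF-8 is rejected too.
std::optional<Latin1String> to_latin1(const Utf8String& src) {
    Latin1String out;
    const std::string& s = src.bytes;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {                                   // ASCII: copy as is
            out.bytes.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < s.size()) {
            // Lead bytes C2 and C3 start the two-byte forms of U+0080..U+00FF.
            const unsigned char d = static_cast<unsigned char>(s[++i]);
            if ((d & 0xC0) != 0x80) return std::nullopt;  // bad continuation byte
            out.bytes.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
        } else {
            return std::nullopt;                          // above U+00FF, or malformed
        }
    }
    return out;
}

void demo_conversion() {
    const Utf8String baz{"Wal\xE2\x98\x85Mart"};  // "Wal★Mart" as UTF-8 bytes
    // Latin1String bar = baz;                    // does not compile: unrelated types
    assert(!to_latin1(baz).has_value());          // the star has no ISO-8859-1 form
    assert(to_latin1(Utf8String{"Hello"})->bytes == "Hello");
}
```

With this shape the assignment itself never compiles, and the caller has to
decide what to do when the conversion comes back empty.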

But that doesn't work that well when you want to determine if the string has
been converted to HTML (HyperText Markup Language) entities or not, as it
gets unwieldy fast:

> StringISO8859d1NoEntities foo;
> StringISO8859d1Entities   bar;
> StringISO8859d5NoEntities baz;
> StringUTFd8Entities       fubar; // Silliness!  Silliness I say!
>

And even more importantly, what if you don't know the character set of the
data until runtime? In that sense, the character set and the encoding are a
kind of attribute of the string, or a sub-type of the string.
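
A minimal sketch of the character set as a runtime attribute could look like
the following. TaggedString and the Charset enumeration are hypothetical
names, and throwing on a mismatch is just one possible policy:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical names; the list of character sets is only an illustration.
enum class Charset { Unknown, ISO8859_1, ISO8859_5, UTF8 };

class TaggedString {
public:
    TaggedString(std::string bytes, Charset cs)
        : bytes_(std::move(bytes)), charset_(cs) {}

    Charset charset() const { return charset_; }
    const std::string& bytes() const { return bytes_; }

    // Concatenation only makes sense when both sides share a character set,
    // and with a runtime tag that can only be checked while the program runs.
    TaggedString operator+(const TaggedString& rhs) const {
        if (charset_ != rhs.charset_)
            throw std::invalid_argument("character sets differ");
        return TaggedString(bytes_ + rhs.bytes_, charset_);
    }

private:
    std::string bytes_;
    Charset charset_;
};

void demo_runtime_tag() {
    const TaggedString a("Hello, ", Charset::UTF8);
    const TaggedString b("world", Charset::UTF8);
    assert((a + b).bytes() == "Hello, world");
}
```

The check works, but a mismatch only shows up when that line of code
actually executes.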

Yes, such information could be added to the base String class with
appropriate methods to check and set this sub-type (or attribute-like)
information, but I'd still like to get compile-time checks whenever and
wherever I can. For instance:

> StringEntity   foo = "Johnson &amp; Co.";
> StringNoEntity bar = "American Telegraph & Telephone";
> String         baz;
>
> baz = foo + bar;      // now what?  We're adding a string where
>                       // "&" is encoded as "&amp;" to a string
>                       // where the "&" appears as is.  Do we
>                       // 'de-entify' foo or 'entify' bar?  And
>                       // what is the programmer's expectation?
>                       // Probably not much, given this piece
>                       // of code.
>
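
One way to force the programmer to answer that question is to make the mixed
addition a compile error and provide an explicit conversion. The sketch below
uses empty tag types for this; Text, Raw, Escaped and entify are names
invented here, and entify only handles four characters:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical tag types: they hold no data and exist only for the type checker.
struct Raw {};      // "&" appears as is
struct Escaped {};  // "&" has been encoded as "&amp;"

template <typename Tag>
class Text {
public:
    explicit Text(std::string s) : value_(std::move(s)) {}
    const std::string& str() const { return value_; }
private:
    std::string value_;
};

// Concatenation exists only for two strings carrying the same tag, so mixing
// Raw and Escaped fails to compile (no matching operator+ is found).
template <typename Tag>
Text<Tag> operator+(const Text<Tag>& a, const Text<Tag>& b) {
    return Text<Tag>(a.str() + b.str());
}

// The only route from Raw to Escaped is this explicit step.
// Assumption of the sketch: just &, <, > and " are encoded.
Text<Escaped> entify(const Text<Raw>& in) {
    std::string out;
    for (char c : in.str()) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return Text<Escaped>(std::move(out));
}

void demo_entities() {
    Text<Escaped> foo("Johnson &amp; Co.");
    Text<Raw>     bar("American Telegraph & Telephone");
    // Text<Escaped> bad = foo + bar;       // does not compile: tags differ
    Text<Escaped> baz = foo + entify(bar);  // the intent is now spelled out
    assert(baz.str() == "Johnson &amp; Co.American Telegraph &amp; Telephone");
}
```

The tags cost nothing at runtime, but they only help when the encoding state
is known while the code is being written.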

I hope you can see where I'm trying to go with this: tracking a form of
intent throughout the code. Variable types are about more than annoying
muddle-headed programmers and producing fast code. They are also a statement
of what we (or at least I, as a programmer) can do to the variable and what
manipulations we intend, and they let the computer find violations at compile
time, where they are certainly cheaper and easier to fix than at the customer
site.

I think what I'm aiming for is a way of annotating a variable with more than
just type information and having those annotations checked and enforced by
the compiler. Another example might be unit tracking. Say, making sure that
you add a LENGTH to a LENGTH but that a LENGTH times a LENGTH is an AREA, and
you can't add LENGTH to an AREA. Or that this variable is in inches, that in
millimeters, and have the computer keep track of multiplying or dividing by
25.4 as the value moves from one variable to another (I think there are some
specialized computer languages that can do this, but I don't know of any
general-purpose language that supports it).
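
Short of compiler support, the unit rules can be approximated by hand with
distinct wrapper types. This is only a rough sketch: Length, Area and the
choice of millimeters as the internal unit are assumptions made for the
example, and every rule has to be written out as an operator:

```cpp
#include <cassert>

// Hypothetical unit-carrying types. Everything is stored in millimeters
// (or square millimeters) internally; that is an assumption of the sketch.
class Area {
public:
    constexpr explicit Area(double mm2) : mm2_(mm2) {}
    constexpr double square_mm() const { return mm2_; }
private:
    double mm2_;
};

class Length {
public:
    static constexpr Length millimeters(double v) { return Length(v); }
    static constexpr Length inches(double v)      { return Length(v * 25.4); }
    constexpr double in_millimeters() const { return mm_; }
    constexpr double in_inches() const      { return mm_ / 25.4; }
private:
    constexpr explicit Length(double mm) : mm_(mm) {}
    double mm_;
};

// LENGTH + LENGTH is a LENGTH.
constexpr Length operator+(Length a, Length b) {
    return Length::millimeters(a.in_millimeters() + b.in_millimeters());
}
// LENGTH * LENGTH is an AREA.
constexpr Area operator*(Length a, Length b) {
    return Area(a.in_millimeters() * b.in_millimeters());
}
// There is no operator+(Length, Area), so adding the two is a compile error.

void demo_units() {
    const Length a = Length::inches(1.0);
    const Length b = Length::millimeters(25.4);
    const Length sum = a + b;          // the 25.4 factor is applied for us
    assert(sum.in_inches() == 2.0);
    const Area sq = Length::millimeters(3.0) * Length::millimeters(4.0);
    assert(sq.square_mm() == 12.0);
    // Length bad = a + sq;            // does not compile: LENGTH + AREA
}
```

The unit a value was entered in is fixed at the point of construction, and
the conversion happens there rather than at each use.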

[1] gopher://gopher.conman.org/0Phlog:2006/08/08.1

Email author at [email protected]