[haskell-beginners] Lifting over record syntax

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[haskell-beginners] Lifting over record syntax

AntC
musing on this
to which my reply talked mainly about Trex.

Does it make sense to regard a Trex record (or any free-standing anonymous/extensible record) as a container, with values 'lifted' into the structure, and an interface given by the label names, _not_ by order of fields? That is, "container" in the Monad sense?

Does it further make sense if all the values are already lifted into some Functor (like Maybe or (Either e) ) to hoist the Functor out over the record structure?

I'm not sure I'm up with the program for Functor/Applicative/Monad everywhere. If a record structure is a Functor, what's the type constructor? `Rec`? If record structure is an Applicative, what's the constructor for `pure` that lifts into `Rec`? The `<*>` aka `ap` or bind for a Monad presumably is to take two records (with disjoint labels) and concatenate them to a single record.

Given

data Person = Person (Rec( name :: String, age :: Int))

we want, corresponding to the question:

Person <?$>  (age = 27) <*> (name = "Joe")

in which operator <?$> is map-like: it takes a function to its left that expects a record of n fields, then maps over n singleton records with corresponding labels appearing in any order.

I might be able to get there by generics/reflection over the field names embedded in data constructor `Person`s type, then overloading <?$> to look for each label. That is, if Hugs had any sort of generics/TypeRep.

Label names in Trex are literals; there's no such thing as a label variable. (Which is why it's a tad annoying that they start lower case.) Furthermore the same label name must appear in both terms and types -- in fact labels occupy a namespace separate vs terms or types. So Trex is a long way from from generic record handling like:

recAppend :: ( rho'\__x ) => Rec rho' -> Rec ( __x :: a) -> Rec ( __x :: a | rho')
recAppend rho ( __x = x ) = ( __x = x | rho )

in which I've used double-underscore prefix to signify a label variable. This is intended to extend a record `rho` with a singleton record. If we try appending a record with more than one field, beware that field order is arbitrary, so this

recAppend rho ( __x = x, __y = y) = ...

has no principal type (a familiar difficulty). The programmer doesn't care which way round labels `__x, __y` bind, providing they're distinct, but the typing does care.

Some sort of generic record extend/concatenate would be great. You might, looking at Trex syntax, think that `|` is it. This is valid:

( x = "x" | (y = 'y' | (z = 3.14)))
( x = "x", y = 'y', z = 3.14 )           -- equivalent

I could put any record value in place of `(z = 3.14)`. But not in place of the `x = ..` or `y = ...`: `|` is not an operator, not commutative, not associative. Furthermore this is a place where parentheses make a difference, unlike usually in Haskell. So the following are not equivalent, indeed they're all invalid syntax

( (x = "x") | (y = 'y' | (z = 3.14)))       -- x = ... has parens
( x = "x"  | (y = 'y' | z = 3.14 ))          -- z = ... doesn't have parens
( x = "x" | y = 'y' | (z = 3.14) )           -- z = ... in parens OK, but y = ... is not
( (x = "x") | (y = 'y') | (z = 3.14) )      -- no chance

I'd like to write term `( rho1 | rho2 )` to concatenate two records. That's currently unrecognised syntax, so I think could be added as such. What would be its type/is it principal?

( rho1 | rho2 ) :: (rho1' \\ rho2') => Rec( rho1' | rho2' )    -- inventing more syntax

in which constraint `(rho1' \\ rho2')` requires the two rows' labels be mutually disjoint -- read "lacks all". Ur/web has something like this. Note I'm not envisaging `|` as a genuine operator: this is still hard-wired syntax; pipe is a reserved symbol in H98 anyway.


AntC

_______________________________________________
Hugs-Users mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/hugs-users
Reply | Threaded
Open this post in threaded view
|

generic operations on Trex records [was: [haskell-beginners] Lifting over record syntax]

AntC
On Thu, 1 Nov 2018 at 12:31 AM, Anthony Clayden  wrote:


Label names in Trex are literals; there's no such thing as a label variable. (Which is why it's a tad annoying that they start lower case.) Furthermore the same label name must appear in both terms and types -- in fact labels occupy a namespace separate vs terms or types. So Trex is a long way from from generic record handling like:

recAppend :: ( rho'\__x ) => Rec rho' -> Rec ( __x :: a) -> Rec ( __x :: a | rho')
recAppend rho ( __x = x ) = ( __x = x | rho )

in which I've used double-underscore prefix to signify a label variable. This is intended to extend a record `rho` with a singleton record. If we try appending a record with more than one field, beware that field order is arbitrary, so this

recAppend rho ( __x = x, __y = y) = ...

has no principal type (a familiar difficulty). The programmer doesn't care which way round labels `__x, __y` bind, providing they're distinct, but the typing does care.

I've just come across Gaster's 1997 paper 'Polymorphic Extensible Records for Haskell', which gives a bit more detail on the internals of the implementation. No surprise: records are stored in a continuous block of memory, with fields left-to-right by alphabetic order of label. Then the lacks constraint for each label name is merely an index into the vector, and gets type-erased. So we could de-arbitrate field selection by taking them left-to-right. (And introduce the ordering on labels into the formal typing.)

I see in the Gaster+Jones 1996 paper, there is a possible generic approach: section 6.2 'First-class labels'. "This increases the expressiveness of our system quite dramatically, ..." There's to be a distinct kind for labels, and a type constructor Label :: label -> *. Presumably in 2018 we'd use kind Symbol for labels. So label `name` is a type variable(?) set to `Label "name"`. The paper doesn't seem to suggest any of the approach has been developed, nor can I see evidence of it inside the Hugs code. Specifically in the code, the parser puts all labels straight into a dedicated symbol table, where they're treated as constants. (They can only appear in very clearly identifiable syntactic contexts.) I'm not as sanguine as section 6.2: how to treat label variables as opposed to constants?

* How to treat pattern-matching on two distinct label variables? (example above)
* How to treat pattern-matching on one label variable appearing in distinct patterns? Such as

recProject (__x = x | rho1) (__x = _ | rho2) = (__x = x | (recProject rho1 rho2) )

This is currently a valid term, BTW, but is looking for label constant `__x` in both records.

This is trying to say: if the two records have some label in common (I don't care which), then take that label and its field; recurse on the leftovers of the two records. So this will project the LH record on labels in common with RH record. It's a great design pattern. But worse than the `recAppend` above, the compiler now has to pick arbitrarily a label that appears in both arguments; not arbitrarily left-to-right alphabetic ordering in just one record. (And if that's not possible, don't select that instance; there's presumably a base case instance for no labels in common.)

It's all fine and dandy using label variables in the formal semantics. It's quite different supporting them in surface syntax and typing.



I'd like to write term `( rho1 | rho2 )` to concatenate two records. That's currently unrecognised syntax, so I think could be added as such. What would be its type/is it principal?

( rho1 | rho2 ) :: (rho1' \\ rho2') => Rec( rho1' | rho2' )    -- inventing more syntax

in which constraint `(rho1' \\ rho2')` requires the two rows' labels be mutually disjoint -- read "lacks all". Ur/web has something like this.

Also Harper&Pierce 1990, 1991 referenced in the G+J paper. We can define "lacks all" in terms of lacks, by extending the G+J Figure 1 definition of `\` thusly

* base case: two EmptyRecs lack all labels in common
* recursive case: given two recs that lack all plus some label that both lack, adding that label at some type to one record, they still both lack all.

Symbolically:

P ||- {| |} \\ {| |} ;

P ||- r1 \\ r2, r1\l, r2\l
---------------------
P ||- {| l :: tau | r1 |} \\ r2

Note I'm not envisaging `|` as a genuine operator: this is still hard-wired syntax; pipe is a reserved symbol in H98 anyway.


Specifying its formal semantics does need distinguishing one of the arguments being empty vs not (I'll give the term-level definition as if it were possible, these need to be in distinct Instances):

(EmptyRec | rho2) = rho2
( (__x = x | rho1') | rho2) = (__x = x | (rho1' | rho2) )

Then to achieve full expressivity for a record system, I need just one more operation: label subtraction or "project away" also known in relational algebra as "remove" or "project ALL BUT", sometimes symbolised as pi-hat or pi-overbar. Semantics: given two records with possibly some labels in common, return the LH argument, with all in-common labels/fields removed. Note this cunningly avoids requiring the in-common labels to have same field type, because it ignores the RH argument's fields. The result "lacks all" attributes of the RH arg.

I think "project away"s semantics can be defined without requiring pattern-matching on the same label in two different records (and using `` to symbolise the operation -- closest I can get to pi-hat):

EmptyRec  rho2 = EmptyRec
(__x = x | rho1')  rho2 = (__x = x | (rho1'  rho2)), if rho2\__x
(__x = x | rho1')  rho2 =  (rho1'  rho2),  otherwise

Then other generic operations on whole-records can be defined in terms of the above two, treating records as sets of label-field pairs, and applying set operations:

rho1 `recIntersect` rho2 = (rho1  (rho1  rho2))     -- aka projection
rho1 `recUnion` rho2 = Just ((rho1  rho2) | (rho1  (rho1  rho2)) | (rho2  rho1))
-- providing (rho1  (rho1  rho2)) == (rho2  (rho2  rho1))
-- otherwise Nothing

`recUnion` is more familiar as record-level relational Natural Join. Returns a Maybe record, because the arguments might not be equal on the field values of the labels in common.

So we have set difference, intersection, union. That must be expressively complete.


AntC

_______________________________________________
Hugs-Users mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/hugs-users