Edison 1.2rc2

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Edison 1.2rc2

Bugzilla from robdockins@fastmail.fm
Greetings!


The second release candidate for Edison 1.2 is now ready for  
comments.  The notable changes from RC1 are:

   * add strict variants of all folds and reduces
   * reverse argument orders to 'rcons' and 'lookup*' in
     Sequence
   * Add symbolic operators for lcons, rcons, append, and lookup
     from the Sequence API
   * Add symbolic operators for subsetEq, difference, intersection
     and union from the set API
   * rename 'single' to 'singleton' in all APIs
   * rename 'intersect' to 'intersection' in Collection and
     Associated Collection APIs
   * add 'adjustOrInsert' to the Associated Collection API
      (see http://www.haskell.org//pipermail/libraries/2006-February/ 
004952.html for a use case)


And the following are the remaining major issues I'd like to get  
feedback about:

-- Edison defines 'subset' and 'subsetEq' in the set API.  Data.Set  
has operations with the
    same meanings named 'isProperSubsetOf' and 'isSubsetOf'.  I am  
considering adoping the
    Data.Set names, but they are quite a bit longer.  Also what about  
the meaning of "subset"
    vs "proper subset"?  What do you find most intuitive?
-- Data.Map defines 'isSubmapOfBy' and 'isProperSubmapBy'.  Edison  
has no corresponding
    API methods.  Are they important enough to add?
-- What do you think about the symbolic operators?  Is the fact that  
(++) clashes with the Prelude a problem?
    Should the operators be folded into the "Data.Edison" module  
instead?
     (see http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison- 
Sym.html)


In addition, I'm interested in any API related feedback you might have.



Code and docs can be found at the project website:
http://www.eecs.tufts.edu/~rdocki01/edison.html

Or you can go directly to the haddock docs:
http://www.eecs.tufts.edu/~rdocki01/docs/edison/index.html

Or the darcs repo:
http://www.eecs.tufts.edu/~rdocki01/edison/



Thanks,
Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Ben Franksen-2
> -- Edison defines 'subset' and 'subsetEq' in the set API.  Data.Set
> has operations with the
>     same meanings named 'isProperSubsetOf' and 'isSubsetOf'.  I am
> considering adoping the
>     Data.Set names, but they are quite a bit longer.  Also what about
> the meaning of "subset"
>     vs "proper subset"?  What do you find most intuitive?

I prefer the mathematical convention: subset includes equality, proper
subset excludes it. And proper subset can very easily defined by
library user (just check isSubset and sizes differ) so is not
essential.

The 'is' prefix is good because it makes the meaning clearer. The 'Of'
suffix is bad, because it doesn't and adds noise.

> -- Data.Map defines 'isSubmapOfBy' and 'isProperSubmapBy'.  Edison
> has no corresponding
>     API methods.  Are they important enough to add?

Can they be easily added by the user w/o loosing much efficiency? Then
no. Else yes.

Note that isSubmapOfBy and isProperSubmapOfBy are the only 'By'-suffix
functions exported from Data.Map.

Ben
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

David Menendez
In reply to this post by Bugzilla from robdockins@fastmail.fm
Robert Dockins writes:

> The second release candidate for Edison 1.2 is now ready for  
> comments.  The notable changes from RC1 are:
>
>    * add strict variants of all folds and reduces

You know, I was looking through the code last night and thought, "what
about foldl'?" Funny.

That being said, eight variants of fold* (plus six variants of reduce*)
feels excessive to me. I can understand the desire for generality, but
I'm not sure it's possible to decide between, say, foldr and foldr'
without knowing the sequence type.

As I understand it, with regular lists you almost always[*] want either
foldr or foldl', because foldl builds up a chain of unevaluated thunks
and foldr' has linear call stack growth. (When I was learning ML, we
were told to always prefer foldl over foldr for that reason.)

With reverse lists (i.e., Rev []), the situation is reversed: foldr' is
tail-recursive, foldl is fully lazy, and foldr and foldl' have linear
growth.

For the other sequences, there may not be clear winners. With
SimpleQueue, for example, the choice between foldr and foldr' might
depend on how the queue is arranged.

Maybe it would be better to have each implementation default to the
strictness that's most likely best, at least in the generic Seq
interface. That way, [] would get strict foldl and non-strict foldr, but
Rev [] would get strict foldr and non-strict foldl. Individual modules
could provide extra variants as desired outside the class interface.


[*] I guess you might want a non-strict foldl with lists if the foldee
is expensive enough that building up a pile of thunks is worth it if
they can potentially be skipped.

> -- Edison defines 'subset' and 'subsetEq' in the set API.  Data.Set  
> has operations with the
>     same meanings named 'isProperSubsetOf' and 'isSubsetOf'.  I am  
> considering adoping the
>     Data.Set names, but they are quite a bit longer.  Also what about  
> the meaning of "subset"
>     vs "proper subset"?  What do you find most intuitive?

I would go with subset and properSubset. The "isSubsetOf" form only
makes sense if used in-line, but we have (|=) for that.

> In addition, I'm interested in any API related feedback you might
> have.

Is there a reason why the TernaryTrie and the PatriciaLoMap don't
implement OrdAssoc?

For the Seq class, I'd rather have the O(n) stuff listed with the
implementations, rather than having a default in the interface.

Finally, thanks for reviving Edison. I had always been disappointed by
its obscurity.
--
David Menendez <[hidden email]> | "In this house, we obey the laws
<http://www.eyrie.org/~zednenem>      |        of thermodynamics!"
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Bugzilla from robdockins@fastmail.fm
On Friday 03 March 2006 08:28 pm, you wrote:

> Robert Dockins writes:
> > The second release candidate for Edison 1.2 is now ready for
> > comments.  The notable changes from RC1 are:
> >
> >    * add strict variants of all folds and reduces
>
> You know, I was looking through the code last night and thought, "what
> about foldl'?" Funny.
>
> That being said, eight variants of fold* (plus six variants of reduce*)
> feels excessive to me. I can understand the desire for generality, but
> I'm not sure it's possible to decide between, say, foldr and foldr'
> without knowing the sequence type.

Well, that's certainly an issue I considered when adding them to the API.

> As I understand it, with regular lists you almost always[*] want either
> foldr or foldl', because foldl builds up a chain of unevaluated thunks
> and foldr' has linear call stack growth. (When I was learning ML, we
> were told to always prefer foldl over foldr for that reason.)
>
> With reverse lists (i.e., Rev []), the situation is reversed: foldr' is
> tail-recursive, foldl is fully lazy, and foldr and foldl' have linear
> growth.

I toyed for a good while with only defining some of the strict folds, or doing
some other halfway measure.  The thing that eventually made be decide to
throw in the whole kitchen sink (all the strict folds) was the realization
that the API is specified at a purely semantic level; at that level, the only
thing that matters about strict folds is that they have exactly the same
semantics as the non-strict fold given a combining function which is strict
in the correct argument(s) (otherwise they are guaranteed to be the same
except that the strict fold may diverge in cases where the non-strict fold
does not).  When it comes time to implement, the strict folds give you more
leway to force closures early BUT YOU DON'T HAVE TO!  For the regular list
example, I could define foldr' = foldr and it would be correct according to
the specification.  For now I have defined all the implementations to be as
strict as possible in all cases, but I can make them lazier later if it turns
out to have better space behavior; as you mention, foldr' on regular lists is
problematic and it will probably be the first thing to be reverted to a lazy
implementation.

> For the other sequences, there may not be clear winners. With
> SimpleQueue, for example, the choice between foldr and foldr' might
> depend on how the queue is arranged.
>
> Maybe it would be better to have each implementation default to the
> strictness that's most likely best, at least in the generic Seq
> interface. That way, [] would get strict foldl and non-strict foldr, but
> Rev [] would get strict foldr and non-strict foldl. Individual modules
> could provide extra variants as desired outside the class interface.

Both the collection and the associated collection have fold (and fold') which
folds over elements in an unspecified order.  Perhaps I could add that to the
sequence API to give the "best behavior" choice to the sequence implementer.  
Then for lists you would have (lazy) fold = foldr and (strict) fold' =
foldl'.

> [*] I guess you might want a non-strict foldl with lists if the foldee
> is expensive enough that building up a pile of thunks is worth it if
> they can potentially be skipped.

> > -- Edison defines 'subset' and 'subsetEq' in the set API.  Data.Set
> > has operations with the
> >     same meanings named 'isProperSubsetOf' and 'isSubsetOf'.  I am
> > considering adoping the
> >     Data.Set names, but they are quite a bit longer.  Also what about
> > the meaning of "subset"
> >     vs "proper subset"?  What do you find most intuitive?
>
> I would go with subset and properSubset. The "isSubsetOf" form only
> makes sense if used in-line, but we have (|=) for that.

I'm leaning in this direction as well; I think I've pretty well decided the
names as they stand now have to go (they contradict standard usage), and I
don't really like the Data.Map names.

> > In addition, I'm interested in any API related feedback you might
> > have.
>
> Is there a reason why the TernaryTrie and the PatriciaLoMap don't
> implement OrdAssoc?

Not really.  When I inherited Edison, none of the associated collections
implemented the Ord* classes (I don't know why).  I added AssocList and
StandardMap because they were easy.  The others are significantly more
complicated and I thought it was more important to work out the API issues
and kick 1.2 out the door before tacking this (same reason the 'strict' folds
for TernaryTrie are really just the non-strict folds).

> For the Seq class, I'd rather have the O(n) stuff listed with the
> implementations, rather than having a default in the interface.

Noted.  Its a dreadfully tedious cut-n-paste job which is why I haven't done
it; patches and clever scripts to do this are welcome ;-)  Of course, it
would be nice to have time complexity for the Collection and Associated
Collections as well...

> Finally, thanks for reviving Edison. I had always been disappointed by
> its obscurity.

Indeed; it has always seemed to me like a great project that never got the
attention it deserved.


Thanks for your comments!
Rob Dockins
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

sethk
I just noticed this email thread.  I've wanted to use some things in Edison for a while so I'm glad to see that it is still alive.

As far as the fold variations go, I can't see that it can possibly hurt anything to include more variations, especially if the work is already done.  I would like to see the documentation constructed so that the ones that are most likely to be "correct" (or desirable) are listed first.  That way I don't have to figure out exactly what the issues are with them in order to grab the "right" one.  Especially since I usually look at the more subtle performance issues after I've built a working program.

Seth


On Fri, 3 Mar 2006 21:34:39 -0500
Robert Dockins <[hidden email]> wrote:

> On Friday 03 March 2006 08:28 pm, you wrote:
> > Robert Dockins writes:
> > > The second release candidate for Edison 1.2 is now ready for
> > > comments.  The notable changes from RC1 are:
> > >
> > >    * add strict variants of all folds and reduces
> >
> > You know, I was looking through the code last night and thought, "what
> > about foldl'?" Funny.
> >
> > That being said, eight variants of fold* (plus six variants of reduce*)
> > feels excessive to me. I can understand the desire for generality, but
> > I'm not sure it's possible to decide between, say, foldr and foldr'
> > without knowing the sequence type.
>
> Well, that's certainly an issue I considered when adding them to the API.
>
> > As I understand it, with regular lists you almost always[*] want either
> > foldr or foldl', because foldl builds up a chain of unevaluated thunks
> > and foldr' has linear call stack growth. (When I was learning ML, we
> > were told to always prefer foldl over foldr for that reason.)
> >
> > With reverse lists (i.e., Rev []), the situation is reversed: foldr' is
> > tail-recursive, foldl is fully lazy, and foldr and foldl' have linear
> > growth.
>
> I toyed for a good while with only defining some of the strict folds, or doing
> some other halfway measure.  The thing that eventually made be decide to
> throw in the whole kitchen sink (all the strict folds) was the realization
> that the API is specified at a purely semantic level; at that level, the only
> thing that matters about strict folds is that they have exactly the same
> semantics as the non-strict fold given a combining function which is strict
> in the correct argument(s) (otherwise they are guaranteed to be the same
> except that the strict fold may diverge in cases where the non-strict fold
> does not).  When it comes time to implement, the strict folds give you more
> leway to force closures early BUT YOU DON'T HAVE TO!  For the regular list
> example, I could define foldr' = foldr and it would be correct according to
> the specification.  For now I have defined all the implementations to be as
> strict as possible in all cases, but I can make them lazier later if it turns
> out to have better space behavior; as you mention, foldr' on regular lists is
> problematic and it will probably be the first thing to be reverted to a lazy
> implementation.
>
> > For the other sequences, there may not be clear winners. With
> > SimpleQueue, for example, the choice between foldr and foldr' might
> > depend on how the queue is arranged.
> >
> > Maybe it would be better to have each implementation default to the
> > strictness that's most likely best, at least in the generic Seq
> > interface. That way, [] would get strict foldl and non-strict foldr, but
> > Rev [] would get strict foldr and non-strict foldl. Individual modules
> > could provide extra variants as desired outside the class interface.
>
> Both the collection and the associated collection have fold (and fold') which
> folds over elements in an unspecified order.  Perhaps I could add that to the
> sequence API to give the "best behavior" choice to the sequence implementer.  
> Then for lists you would have (lazy) fold = foldr and (strict) fold' =
> foldl'.
>
> > [*] I guess you might want a non-strict foldl with lists if the foldee
> > is expensive enough that building up a pile of thunks is worth it if
> > they can potentially be skipped.
>
> > > -- Edison defines 'subset' and 'subsetEq' in the set API.  Data.Set
> > > has operations with the
> > >     same meanings named 'isProperSubsetOf' and 'isSubsetOf'.  I am
> > > considering adoping the
> > >     Data.Set names, but they are quite a bit longer.  Also what about
> > > the meaning of "subset"
> > >     vs "proper subset"?  What do you find most intuitive?
> >
> > I would go with subset and properSubset. The "isSubsetOf" form only
> > makes sense if used in-line, but we have (|=) for that.
>
> I'm leaning in this direction as well; I think I've pretty well decided the
> names as they stand now have to go (they contradict standard usage), and I
> don't really like the Data.Map names.
>
> > > In addition, I'm interested in any API related feedback you might
> > > have.
> >
> > Is there a reason why the TernaryTrie and the PatriciaLoMap don't
> > implement OrdAssoc?
>
> Not really.  When I inherited Edison, none of the associated collections
> implemented the Ord* classes (I don't know why).  I added AssocList and
> StandardMap because they were easy.  The others are significantly more
> complicated and I thought it was more important to work out the API issues
> and kick 1.2 out the door before tacking this (same reason the 'strict' folds
> for TernaryTrie are really just the non-strict folds).
>
> > For the Seq class, I'd rather have the O(n) stuff listed with the
> > implementations, rather than having a default in the interface.
>
> Noted.  Its a dreadfully tedious cut-n-paste job which is why I haven't done
> it; patches and clever scripts to do this are welcome ;-)  Of course, it
> would be nice to have time complexity for the Collection and Associated
> Collections as well...
>
> > Finally, thanks for reviving Edison. I had always been disappointed by
> > its obscurity.
>
> Indeed; it has always seemed to me like a great project that never got the
> attention it deserved.
>
>
> Thanks for your comments!
> Rob Dockins
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Christian Maeder
In reply to this post by Bugzilla from robdockins@fastmail.fm
Robert Dockins wrote:
> The second release candidate for Edison 1.2 is now ready for  comments.  

Will this make Jean-Philippe's work obsolete?

http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework

Being (currently) no enthusiast anyway, I would not even know which
classes to choose now, since Edison has been improved.

> -- What do you think about the symbolic operators?

I prefer alphanumeric identifiers.

> In addition, I'm interested in any API related feedback you might have.

Data.Edison.Assoc.fold

"Combine all the elements in the associative collection, given a
combining function and an initial value. The elements are processed in
an unspecified order. Note that fold ignores the keys."

This fold function looks pretty unsafe to me, because (the common) "fold
(:) []" might reveal implementation details that one wants to hide. Such
a function should be called "structuralFold" (or something like that).

Cheers Christian
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Jean-Philippe Bernardy
On 3/6/06, Christian Maeder <[hidden email]> wrote:
> Robert Dockins wrote:
> > The second release candidate for Edison 1.2 is now ready for  comments.
>
> Will this make Jean-Philippe's work obsolete?
> http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework

I intend to finish and publish that work... and of course make it the
superior choice. :) In time I hope to retrofit edison concrete modules
into my framework.

Cheers,
JP.
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Bugzilla from robdockins@fastmail.fm
In reply to this post by Christian Maeder

On Mar 6, 2006, at 8:27 AM, Christian Maeder wrote:

> Robert Dockins wrote:
>> The second release candidate for Edison 1.2 is now ready for  
>> comments.
>
> Will this make Jean-Philippe's work obsolete?
>
> http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework
>
> Being (currently) no enthusiast anyway, I would not even know which  
> classes to choose now, since Edison has been improved.


I personally have no particular stance on this.  Jean-Philippe's  
design is quite different from Edison's and in the absence of focus  
groups and user studies we don't have any real information about what  
design appeals best to programmers.  I suspect that, in the end, only  
one type-class data structure abstraction will gain widespread use,  
but I'm not going to make any predictions at this point.  Obviously,  
standardization decisions can have a strong impact on this process.

As for myself, I'll probably be using Edison, because I'm pretty  
familiar with it ;-)


>> -- What do you think about the symbolic operators?
>
> I prefer alphanumeric identifiers.


So do I, as it happens :-)  Which is why the symbols are only aliases.


>> In addition, I'm interested in any API related feedback you might  
>> have.
>
> Data.Edison.Assoc.fold
>
> "Combine all the elements in the associative collection, given a  
> combining function and an initial value. The elements are processed  
> in an unspecified order. Note that fold ignores the keys."
>
> This fold function looks pretty unsafe to me, because (the common)  
> "fold (:) []" might reveal implementation details that one wants to  
> hide. Such a function should be called "structuralFold" (or  
> something like that).


I'm not sure I understand; why do you think that revealing structure  
is unsafe?  It's not like you can subvert the program with this  
additional knowledge.  Obviously, if you rely on the elements being  
provided in some particular implementation-dependent order than your  
code is broken (perhaps this should be more explicit in the docs?).

I'm hesitant to give "fold" a longer name because I think it should  
be the first fold a programmer reaches for.  In the fairly common  
case where you have a commutative, associative function (eg (+)), you  
don't care in what order the function is applied to the elements, and  
you have fold f z === foldl f z === foldr f z.  Why should you prefer  
"fold"?  Because it gives data structure implementers the freedom to  
define a fold that is more efficient than foldr or foldl or whatever  
else (usually, as you surmised, by following the internal structure).

In light of this and previous discussions about folds, I think I will  
be adding a section to the docs about how to chose an appropriate  
fold for your application.


> Cheers Christian

Thanks for your comments!


Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Christian Maeder
Robert Dockins wrote:
> I'm not sure I understand; why do you think that revealing structure  is
> unsafe?  

I don't want that i.e. equal maps yield different results, just because
they are internally differently balanced.

> I'm hesitant to give "fold" a longer name because I think it should  be
> the first fold a programmer reaches for.

I don't aggree. I immediately made it wrong by my own "toList = fold (:)
[]" and did not even notice foldr or foldl. The fold for efficiency
should be mainly used by library writers. But users I tell: correctness
first, efficiency (when needed) later!

> In the fairly common  case
> where you have a commutative, associative function (eg (+)), you  don't
> care in what order the function is applied to the elements

These properties are not garuanteed and difficult to track down in the
few (but severe) cases it's made wrong.

> , and  you
> have fold f z === foldl f z === foldr f z.

I see, foldr and foldl are your safe versions! (I think Jean-Philippe
only has a safe "fold" and misses an unsafe one.)

> In light of this and previous discussions about folds, I think I will  
> be adding a section to the docs about how to chose an appropriate  fold
> for your application.

At least that plus a warning!

Cheers Christian
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Bugzilla from robdockins@fastmail.fm

On Mar 6, 2006, at 10:56 AM, Christian Maeder wrote:

> Robert Dockins wrote:
>> I'm not sure I understand; why do you think that revealing  
>> structure  is unsafe?
>
> I don't want that i.e. equal maps yield different results, just  
> because they are internally differently balanced.

Well, this obviously depends on what you mean by "equal".  That can  
get slippery pretty fast.  I assume from your comments that you mean  
extensional equality (ie, viewing the map as a partial function with  
a known finite domain).

>> I'm hesitant to give "fold" a longer name because I think it  
>> should  be the first fold a programmer reaches for.
>
> I don't aggree. I immediately made it wrong by my own "toList =  
> fold (:) []" and did not even notice foldr or foldl. The fold for  
> efficiency should be mainly used by library writers. But users I  
> tell: correctness first, efficiency (when needed) later!

I tend to agree with that philosophy.  On the other hand, if you are  
bothering to use a dedicated data structure library, its probably  
because you are at least somewhat cognizant of performance issues;  
otherwise you'd just use lists and tuples for everything (a  
phenomenon one one can actually observe in a lot of projects).

>> In the fairly common  case where you have a commutative,  
>> associative function (eg (+)), you  don't care in what order the  
>> function is applied to the elements
>
> These properties are not garuanteed and difficult to track down in  
> the few (but severe) cases it's made wrong.

These properties _can't_ be guaranteed (at least not by Haskell  
compilers).  It is true that you can shoot yourself in the foot with  
the Edison API (this is one of the stronger criticisms one can level  
at it, IMO).  Some operations even allow you to violate internal data  
structure consistency if used incorrectly.  Edison has already made  
the design choice to present the programmer with abuse-able  
operations.  I think an arbitrary-order fold is pretty tame (on the  
other hand, it is a significantly more "visible" operation than the  
actually unsafe ones).

Humm... I wonder if there is a case for extracting a subset of the  
API which only presents "safe" (ie "difficult/impossible to abuse")  
operations?  Perhaps with a simplified class hierarchy as well....    
What do you think?  That way you could start by using the simplified  
API and move to the full API when/if you discover opportunities for  
performance enhancements from the less-safe or more esoteric operations.


>> , and  you have fold f z === foldl f z === foldr f z.
>
> I see, foldr and foldl are your safe versions! (I think Jean-
> Philippe only has a safe "fold" and misses an unsafe one.)

Well, its actually more complicated than safe vs unsafe.  The base  
"AssocX" class (wherein fold appears) doesn't assume an ordering  
relation on keys (no "Ord k" context). What we do in the absence of a  
total ordering on keys is to present the elements in an "arbitrary"  
order.  Because the order is "arbitrary" we can chose it to be  
advantageous (ie, follow the internal structure), but that's really  
just a side benefit of the relaxed semantics.  The subclass  
"OrdAssocX" further assumes an ordering on keys, so it can provide  
foldr and friends (because now key ordering is defined).  However,  
even with foldr, foldl, etc, there is a degree undefinedness if the  
data structure is a finite relation rather than a finite map (in what  
order should you present elements bound to equal keys?).  The folds  
for bags have similar properties.

>> In light of this and previous discussions about folds, I think I  
>> will  be adding a section to the docs about how to chose an  
>> appropriate  fold for your application.
>
> At least that plus a warning!

Indeed.  Its seems folds are sufficiently complicated to warrant some  
special attention in the docs.  In addition, I suppose I should track  
down each and every operations which can be observed to behave  
differently based on hidden state and mark them as "non-deteministic"  
or some such.  What is a good term for an operation whose result is  
only partially defined?  I really want to reserve the term "unsafe"  
for operations that can violate internal consistency.


Thanks again!  This is exactly the kind of discussion that helps  
produce good APIs.

Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Christian Maeder
Robert Dockins wrote:
> Well, this obviously depends on what you mean by "equal".  That can  get
> slippery pretty fast.  I assume from your comments that you mean  
> extensional equality (ie, viewing the map as a partial function with  a
> known finite domain).

Right, this corresponds to the Eq instance for the current Data.Map
(with a safe fold) as an _abstract_ data type.

I can hardly imagine a (finite) map implementation without Ord instance
for the keys. With equality only you have no chance to yield the keys of
equal maps in a fixed order (by a toList function). Thus an association
list is not a (mathematical) map (and your fold function is fine for
such lists).

> These properties _can't_ be guaranteed (at least not by Haskell  
> compilers).

I know, also the total-order-property of Ord instances can not be
guaranteed (but "deriving" helps, of course).

> However,  even with
> foldr, foldl, etc, there is a degree undefinedness if the  data
> structure is a finite relation rather than a finite map (in what  order
> should you present elements bound to equal keys?).  The folds  for bags
> have similar properties.

Relations are (conceptually) sets of pairs and bags are multisets (or
maps from elements to their occurrence counts). In both cases I'd expect
to see an ascending ordered list when shown.

Also for Tupels there are several possibilities for total orderings, but
the type class Ord forces you to pick one.

Christian
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Bugzilla from robdockins@fastmail.fm

On Mar 6, 2006, at 1:33 PM, Christian Maeder wrote:

> Robert Dockins wrote:
>> Well, this obviously depends on what you mean by "equal".  That  
>> can  get slippery pretty fast.  I assume from your comments that  
>> you mean  extensional equality (ie, viewing the map as a partial  
>> function with  a known finite domain).
>
> Right, this corresponds to the Eq instance for the current Data.Map  
> (with a safe fold) as an _abstract_ data type.
>
> I can hardly imagine a (finite) map implementation without Ord  
> instance for the keys.

http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-Mem- 
StableName.html#t%3AStableName

No Ord instance.  Would be very useful to be able to put it in a map;  
it hashes, so its not a total waste to do it.  I recall seeing  
conversations on some haskell list in the past about how people  
wanted to put stable names in Data.Map but couldn't because of the  
need for an Ord instance.

Really any data type which isn't totally ordered but can be hashed  
could be used in a map this way.  (Not implemented in Edison  
currently, but it should be).

> With equality only you have no chance to yield the keys of equal  
> maps in a fixed order (by a toList function).

Right.

> Thus an association list is not a (mathematical) map (and your fold  
> function is fine for such lists).

I don't think I agree here.  What definition are you using for map?  
Maps don't require a total ordering on their domain.  http://
mathworld.wolfram.com/Map.html

Or are you getting at something else?

>> These properties _can't_ be guaranteed (at least not by Haskell  
>> compilers).
>
> I know, also the total-order-property of Ord instances can not be  
> guaranteed (but "deriving" helps, of course).
>
>> However,  even with foldr, foldl, etc, there is a degree  
>> undefinedness if the  data structure is a finite relation rather  
>> than a finite map (in what  order should you present elements  
>> bound to equal keys?).  The folds  for bags have similar properties.
>
> Relations are (conceptually) sets of pairs and bags are multisets  
> (or maps from elements to their occurrence counts). In both cases  
> I'd expect to see an ascending ordered list when shown.

You'd need a total ordering on the elements for that -- Edison  
doesn't assume anything about the elements of an associated  
collection (and neither does Data.Map).  What if you wanted a  
relation/map between, say, integers and functions :: String -> Bool  
or between Strings and IO actions or something like that?

Re: bags as maps to counts; this can be problematic, for reasons  
discussed here (http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data- 
Edison-Coll.html)  See the section titled "Notes on Observability".  
If you assume a world where everything has an Eq instance  
representing decidable leibniz equality and Ord instances  
representing total orderings, a lot of things become easier.  
Unfortunately, we don't live in that world, which is why we have to  
worry about all this other stuff.  The only thing we assume is that  
an Eq instance defines some equivalence relation and that an Ord  
instance (if it exists) defines a total ordering (and if those  
assumptions are violated, all is lost and we give up).

> Also for Tupels there are several possibilities for total  
> orderings, but the type class Ord forces you to pick one.

This is true, but orthogonal.  I actually considered adding an API  
where you could specify orderings with arbitrary functions :: a -> a -
 > Ordering, but I decided it would be too hard to use (and a pain to  
write besides).


Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Adrian Hey
In reply to this post by Jean-Philippe Bernardy
On Monday 06 Mar 2006 2:04 pm, Jean-Philippe Bernardy wrote:

> On 3/6/06, Christian Maeder <[hidden email]> wrote:
> > Robert Dockins wrote:
> > > The second release candidate for Edison 1.2 is now ready for  comments.
> >
> > Will this make Jean-Philippe's work obsolete?
> > http://hackage.haskell.org/trac/ghc/wiki/CollectionClassFramework
>
> I intend to finish and publish that work... and of course make it the
> superior choice. :) In time I hope to retrofit edison concrete modules
> into my framework.

Hello Jean-Philippe,

I haven't had much time to study this, but in the light of recent
discussions about generalised tries I played about a little last night
and did have a go at producing n-Tuple maps from instances of your classes.
e.g. for pairs I ended up trying to define something like this..

instance (MapLike c1 k1 c2, MapLike c2 k2 a) => MapLike (Map2 c1) (k1,k2) a
 where..

I must admit I found it hard going, but maybe I'm just using it wrong and
I didn't have much time to play about with alternatives. Perhaps you'd care
to give it a shot.

I'm not sure about the proper use of fundeps, but having the associated
values dependent on the map type seemed a little awkward to me and also
seems to make it hard to properly define intersectionWith for maps with
the same key type but different associated value types (one of the
infelicities you mention).

Would something like this be possible (using "Map" instead of "MapLike")?

class Ord k => Map map k | map -> k where

(I think I'd like an Ord constraint on keys)
then given..

newtype (Map2 map1 map2           a) = Map2 (     map1 (map2             a))
newtype (Map3 map1 map2 map3      a) = Map3 (Map2 map1  map2 (map3       a))
newtype (Map4 map1 map2 map3 map4 a) = Map4 (Map3 map1  map2  map3 (map4 a))

or perhaps..

newtype (Map2 map1 map2           a) = Map2 (map1 (     map2           a))
newtype (Map3 map1 map2 map3      a) = Map3 (map1 (Map2 map2 map3      a))
newtype (Map4 map1 map2 map3 map4 a) = Map4 (map1 (Map3 map2 map3 map4 a))

we could define..

instance (Map map1 k1, Map map2 k2)
      => Map (Map2 map1 map2) (k1,k2) where

instance (Map map1 k1, Map map2 k2, Map map3 k3)
      => Map (Map3 map1 map2 map3) (k1,k2,k3) where

instance (Map map1 k1, Map map2 k2, Map map3 k3, Map map4 k4)
      => Map (Map4 map1 map2 map3 map4) (k1,k2,k3,k4) where

Implementing these instances efficiently would be a good test of the
completeness of the class methods too (I haven't tried it yet :-).

But maybe there's a better way. Anybody got any alternative suggestions?

Regards
--
Adrian Hey


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Christian Maeder
In reply to this post by Bugzilla from robdockins@fastmail.fm
Robert Dockins wrote:
>
>> Thus an association list is not a (mathematical) map (and your fold  
>> function is fine for such lists).
>
> I don't think I agree here.  What definition are you using for map?  
> Maps don't require a total ordering on their domain.  http://
> mathworld.wolfram.com/Map.html

Well, "A map is a way of associating unique objects to every element in
a given _set_"! The (strong or given) equality of [(a,b)] does not
correspond to the (possibly uncomputable) equality of maps (where order
does not matter). So association lists are no maps unless you hide (or
ignore) all functions that allow you to observe differences (i.e, your
fold and elements function).

Also for hash maps you have no chance to order elements with equal hash
keys. Maybe all computable and observable maps (and sets) require a
total order?

Of course all mentioned data structures are useful for certain purposes
(and it's maybe only a matter of documentation).

Cheers Christian
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Edison 1.2rc2

Bugzilla from robdockins@fastmail.fm

I confess I've become a bit lost.  I'm not longer sure I understand  
the points you're making; I'll make my best effort to answer, but if  
I reply to points other than the ones you've made, please forgive me.

On Mar 7, 2006, at 6:51 AM, Christian Maeder wrote:

> Robert Dockins wrote:
>>> Thus an association list is not a (mathematical) map (and your  
>>> fold  function is fine for such lists).
>> I don't think I agree here.  What definition are you using for  
>> map?   Maps don't require a total ordering on their domain.  
>> http:// mathworld.wolfram.com/Map.html
>
> Well, "A map is a way of associating unique objects to every  
> element in a given _set_"!

Clarification: "finite set" (for our purposes).  I believe you are  
highlighting that sets have no intrinsic notion of order?

> The (strong or given) equality of [(a,b)] does not correspond to  
> the (possibly uncomputable) equality of maps (where order does not  
> matter).

What I think you are saying:

Let al :: [(a,b)] be a finite association list, where type 'a' has a  
decidable equality relation, and if (x,y) and (w,z) both appear in al  
such that x == w (equality), then y === z (identity, not necessarily  
decidable).

Then let toMap1 and toMap2 be two different implementations of  
coercion from association lists to maps and:

m1 = toMap1 al
m2 = toMap2 al

We must have that m1 is extensional equal to m2.

However, if we have fromMap1 and fromMap2 that extract an association  
list from the different maps we might have:

fromMap1 m1 <>  fromMap2 (where tuple equality is defined using ==  
for a and === for b)

The problem is that there is an entire equivalence class of  
association lists each of which represents the map in question; each  
"fromMap*" has to make a particular concrete choice from this  
equivalence class, but they might make different choices.

I think I have decided to call such operations "ambiguous" because  
they (conceptually) generate a set of possible results and  
arbitrarily choose one.  Clearly, such behavior might be confusing to  
a programmer, so it should be carefully marked.  The "safe"  
operations you prefer collapse the ambiguity as part of the  
function's contract by, eg, specifying that keys appear in order.  I  
also suspect this is the primary source of our disagreement; I'm  
interested in allowing programmers access to ambiguous operations,  
and you think that the exposed API should only consist of well-
defined (non-ambiguous) operations.  Is that a fair characterization?

> So association lists are no maps unless you hide (or ignore) all  
> functions that allow you to observe differences (i.e, your fold and  
> elements function).

If I were being particularly pedantic, I would instead say that  
association lists are not maps (at all).  Rather, they represent and  
can be transformed into (finite) maps, in much the same way that, eg,  
source code represents and can be transformed into a particular  
abstract syntax tree.  Parsing and pretty printing a source code file  
will not (in general) result in identical concrete syntax.  Would you  
therefore say that concrete source code is not a program?  You might  
(if you were being particularly pedantic), but I think most people  
would simply equate source code "up to parsing", at least when  
talking about the "program".  I think we have the same situation  
here, were association lists can be informally identified with the  
maps they represent, but have to be thought about up to reordering.

> Also for hash maps you have no chance to order elements with equal  
> hash keys. Maybe all computable and observable maps (and sets)  
> require a total order

What do you mean by computable in this context?

> Of course all mentioned data structures are useful for certain  
> purposes (and it's maybe only a matter of documentation).
>
> Cheers Christian



Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries