Proposal #2717: Add nubWith, nubOrd

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Proposal #2717: Add nubWith, nubOrd

Barton C Massey
Ok, I've killed the overly ambitious proposal #2629, and replaced
it with the more modest proposal #2717
  http://hackage.haskell.org/trac/ghc/ticket/2717
which just adds Data.List.nubWith and Data.Set.nubOrd as previously
discussed ad nauseam.

I know at least one person thought that adding these was a bad idea, but some
others thought it was a fine thing to do.  Hence the discussion on this list.

    Bart Massey
    [hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

RE: Proposal #2717: Add nubWith, nubOrd

Mitchell, Neil
Hi,

nubOrd: Seems good. Useful function to have. Shame its not in Data.List,
but I understand the reasons for that, and think this is a perfectly
sensible choice.

nubWith: Seems bad. A not particularly useful function (other than to
write nubOrd), with a fairly confusing name. I have previously used
nubWith to mean nubOn (nubOn f = nubBy ((==) `on` f)) with cacheing of
the function f. As a name, "with" only means "with additional stuff" -
it gives no hint what the additional stuff is. The concept of stop-lists
is a little confusing, and then you tell the user they almost certainly
want to pass [] every single time - in that case why do they get an
option? Also stop-lists have type "b", so
stop-unconstrainted-type-variable seems a more appropriate name :-) The
type signature is fairly complex.

I'm not even convinced that nubWith really is a nub function, and not
just some generalised filter - filterState perhaps. In which case you'd
want to generalise Maybe b to (Bool,b). Can you ever imagine anyone
other than nubOrd using nubWith? Is it a genuine utility function people
have been crying out for? It seems perfectly good to include as a local
function in Data.Set to be used to implement nubOrd, but I don't see its
value as a standalone export from a very commonly used library full of
very useful stuff.

So, in summary I think nubOrd is great, and nubWith isn't nub, and isn't
great.

Thanks

Neil
 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Bart Massey
> Sent: 21 October 2008 10:04 am
> To: [hidden email]
> Subject: Proposal #2717: Add nubWith, nubOrd
>
> Ok, I've killed the overly ambitious proposal #2629, and
> replaced it with the more modest proposal #2717
>   http://hackage.haskell.org/trac/ghc/ticket/2717
> which just adds Data.List.nubWith and Data.Set.nubOrd as
> previously discussed ad nauseam.
>
> I know at least one person thought that adding these was a
> bad idea, but some others thought it was a fine thing to do.  
> Hence the discussion on this list.
>
>     Bart Massey
>     [hidden email]
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/libraries
>
>

==============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:

http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
==============================================================================

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal #2717: Add nubWith, nubOrd

David Roundy-5
On Tue, Oct 21, 2008 at 5:16 AM, Mitchell, Neil
<[hidden email]> wrote:
> So, in summary I think nubOrd is great, and nubWith isn't nub, and isn't
> great.

Good  points.

David
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal #2717: Add nubWith, nubOrd

Barton C Massey
In reply to this post by Mitchell, Neil
Mitchell, Neil <neil.mitchell.2 <at> credit-suisse.com> writes:
> nubOrd: Seems good. Useful function to have. Shame its not in Data.List,
> but I understand the reasons for that, and think this is a perfectly
> sensible choice.

Thanks.

> nubWith: Seems bad. A not particularly useful function (other than to
> write nubOrd), with a fairly confusing name. I have previously used
> nubWith to mean nubOn (nubOn f = nubBy ((==) `on` f)) with cacheing of
> the function f. As a name, "with" only means "with additional stuff" -
> it gives no hint what the additional stuff is.

Were we to keep it, do you have a better naming suggestion?  I agree that
nubWith is not optimal, but I don't have a great alternative.  As you sort
of suggest below, maybe nubFilter could work?  [The problem starts with
the belief of everyone I've ever talked to about it that "nub" itself
should have been called "uniq". :-)]

> The concept of stop-lists is a little confusing, and then you tell
> the user they almost certainly want to pass [] every single time - in that
> case why do they get an option?

One can imagine situations in which you want to accumulate a "stop list"
over several runs of the function, but of course nubWith as designed won't
let you do that, so that's fail.

If they already have a list of things they want pre-stopped, they could
pass that in.  Imagine a variant of nondescendingSubsequence below in
which the accumulator started with 0 instead of the first element, for
example.

Of course the technical reason the user needs to pass an empty stop list is that
we killed the type class that goes with nubWith, so there's no way to know what
the empty accumulator of their stop list type actually is.

> Also stop-lists have type "b", so stop-unconstrainted-type-variable
> seems a more appropriate name.  

The name "stop list" is a term of art in the algorithm and AI fields, which
is why I chose it.  I could stick some bib reference in the docs if
you thought that would help.  In general, the stop list is an accumulator
of values that you want to stop after the first one.

> The type signature is fairly complex.

I agree.  I'd written this off initially for this reason, but other
folks on the list seem to be fine with it.

> I'm not even convinced that nubWith really is a nub function, and not
> just some generalised filter - filterState perhaps. In which case you'd
> want to generalise Maybe b to (Bool,b).

Or to Either b b?  Which is preferred in this case?

> Can you ever imagine anyone other than nubOrd using nubWith?

Yes. As discussed previously you can get a significant constant factor
speedup with nubInt on IntSets if you need it.  Also, as discussed previously,
you can get a speedup for nubChar by using a mutable array as the stop list.

One can also imagine using this in less traditional ways; for example

  nondescendingSubsequence [] = []
  nondescendingSubsequence (e : es) =
    e : nubWith (\a b-> if (a >= b) then Just a else Nothing) e es

  > nondescendingSubsequence [1,2,3,2,3,3,4,1,1]
  [1,2,3,3,3,4]

> Is it a genuine utility function people
> have been crying out for?

IMHO it is a "genuine utility function", whatever that means.
It certainly isn't something "people have been crying out for",
so if that's the criterion we should omit it.

It just seems churlish to hide it, given that it's sitting in
there being a perfectly useful building block for things people
do cry out for.  I'd guess it would be used about as often
as "break", which is in there for much the same reason.

> It seems perfectly good to include as a local
> function in Data.Set to be used to implement nubOrd, but I don't see its
> value as a standalone export from a very commonly used library full of
> very useful stuff.

We could move it to live in Data.Set with nubOrd if folks think that it
doesn't belong in Data.List.  Or we could move them both into their own module,
although that seems silly to me given that I can't really think what else
goes in there.

> So, in summary I think nubOrd is great, and nubWith isn't nub, and isn't
> great.

Thanks hugely for the detailed commentary!

Please understand that I'm in no way wedded to the idea of getting any of this
in; I want to do what works best for everyone, and am grateful for your and
everyone's help in figuring out what that is.

    Bart Massey
    [hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

RE: Proposal #2717: Add nubWith, nubOrd

Mitchell, Neil
Hi

> Were we to keep it, do you have a better naming suggestion?

[Warning: untested code]

filterAccum :: (acc -> x -> (acc,Bool)) -> acc -> [x] -> (acc, [x])
filterAccum f a [] = (a, [])
filterAccum f a (x:xs) = (an, [x|b]++rest)
   where (a2,b) = f a x
         (an,rest) = filterAccum f a2 xs

This follows the type of mapAccumL, and is more general than your
function. You could change the utility function to be (acc -> x ->
(acc,Maybe y)) to get a variant that is more general than both mapAccum
and filterAccum.


> could work?  [The problem starts with the belief of everyone
> I've ever talked to about it that "nub" itself should have
> been called "uniq". :-)]

Your function has nothing to do with uniqueness or nubness. It is
filtering with a state.

> Of course the technical reason the user needs to pass an
> empty stop list is that we killed the type class that goes
> with nubWith, so there's no way to know what the empty
> accumulator of their stop list type actually is.

This all seems like complexity that shouldn't be there. The base library
should provide simple things (folds, maps, filters) and simple concepts
with very slightly more involved implementations (sort, reverse, nub).
Anything that isn't a simple concept should go in a separate library.

> general, the stop list is an accumulator of values that you
> want to stop after the first one.

That's how you use it from nubOrd, not anything to do with the function.

> > I'm not even convinced that nubWith really is a nub
> function, and not
> > just some generalised filter - filterState perhaps. In which case
> > you'd want to generalise Maybe b to (Bool,b).
>
> Or to Either b b?  Which is preferred in this case?

Either b b is a horrible type, its semantically equivalent to (Bool,b)
but with added confusion (in most cases, there are some exceptions). See
my above filterState for a more general version.


> > Can you ever imagine anyone other than nubOrd using nubWith?
>
> Yes. As discussed previously you can get a significant
> constant factor speedup with nubInt on IntSets if you need
> it.  Also, as discussed previously, you can get a speedup for
> nubChar by using a mutable array as the stop list.

Sounds like great reasons for adding nubInt (or just nubOrd in the
IntMap module). Perhaps there should also be a CharMap module which
provides similar values for Char. I can't imagine your nubChar with a
mutable array handles all Unicode points?

These ideas are starting to be more radical, and sound like the thing a
type class or type families would be suited for. Perhaps prototype these
ideas in a "fastnub" library on Hackage?

> One can also imagine using this in less traditional ways; for example
>
>   nondescendingSubsequence [] = []
>   nondescendingSubsequence (e : es) =
>     e : nubWith (\a b-> if (a >= b) then Just a else Nothing) e es
>
>   > nondescendingSubsequence [1,2,3,2,3,3,4,1,1]
>   [1,2,3,3,3,4]
>
> > Is it a genuine utility function people have been crying out for?
>
> IMHO it is a "genuine utility function", whatever that means.
> It certainly isn't something "people have been crying out
> for", so if that's the criterion we should omit it.

I think for the base library that "crying out for" is the minimum
criteria. We also need it to have an obvious name, a well explored
design space (or an obviously trivial design space), and the desire to
support it for all eternity.

> It just seems churlish to hide it, given that it's sitting in
> there being a perfectly useful building block for things
> people do cry out for.  I'd guess it would be used about as
> often as "break", which is in there for much the same reason.

Hoogle uses break 33 times. It used to use it a lot more, but then I
moved to using Parsec. I've never had a desire to use filterState or
nubWith. I'm only one person, so this may not be typical.

>
> > It seems perfectly good to include as a local function in
> Data.Set to
> > be used to implement nubOrd, but I don't see its value as a
> standalone
> > export from a very commonly used library full of very useful stuff.
>
> We could move it to live in Data.Set with nubOrd if folks
> think that it doesn't belong in Data.List.  Or we could move
> them both into their own module, although that seems silly to
> me given that I can't really think what else goes in there.

It's too little for its own module - if you are going to do that just
make it a separate package. I don't want nubWith moved to Data.Set, I
want it hidden. Once its hidden, then it can go wherever (although
because of package scoping rules etc inside Data.Set is the best place)

> Please understand that I'm in no way wedded to the idea of
> getting any of this in; I want to do what works best for
> everyone, and am grateful for your and everyone's help in
> figuring out what that is.

I realise. I really am very enthusiastic about getting nubOrd in (I've
even proposed it previously myself), and appreciate the effort you've
put into this.

Thanks

Neil

==============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:

http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
==============================================================================

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal #2717: Add nubWith, nubOrd

Isaac Dupree
Mitchell, Neil wrote:

> Hi
>
>> Were we to keep it, do you have a better naming suggestion?
>
> [Warning: untested code]
>
> filterAccum :: (acc -> x -> (acc,Bool)) -> acc -> [x] -> (acc, [x])
> filterAccum f a [] = (a, [])
> filterAccum f a (x:xs) = (an, [x|b]++rest)
>    where (a2,b) = f a x
>          (an,rest) = filterAccum f a2 xs
>
> This follows the type of mapAccumL, and is more general than your
> function.

can we call it filterAccumL then?  It took me a while to
figure out from the code whether it was truly an "L" or "R"
version, and in principle there could be filterAccumR
(although I don't know why we'd want it)

> You could change the utility function to be (acc -> x ->
> (acc,Maybe y)) to get a variant that is more general than both mapAccum
> and filterAccum.

in which case the result type would be (acc, [y]), and it
would have a similar type and semantics to
Data.Maybe.mapMaybe :: (a -> Maybe b) -> [a] -> [b]

.. maybeAccumL? :-)

-Isaac
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

RE: Proposal #2717: Add nubWith, nubOrd

Mitchell, Neil
Hi

> >> Were we to keep it, do you have a better naming suggestion?
> >
> > [Warning: untested code]
> >
> > filterAccum :: (acc -> x -> (acc,Bool)) -> acc -> [x] -> (acc, [x])
> > filterAccum f a [] = (a, []) filterAccum f a (x:xs) = (an,
> > [x|b]++rest)
> >    where (a2,b) = f a x
> >          (an,rest) = filterAccum f a2 xs
> >
> > This follows the type of mapAccumL, and is more general than your
> > function.
>
> can we call it filterAccumL then?  It took me a while to
> figure out from the code whether it was truly an "L" or "R"
> version, and in principle there could be filterAccumR
> (although I don't know why we'd want it)

Yes, that would be the sensible name. However, I don't think it deserves
to be in the standard libraries :-)

If anyone feels it should be, that sounds like it should be a separate
proposal from nubOrd.

> > You could change the utility function to be (acc -> x -> (acc,Maybe
> > y)) to get a variant that is more general than both mapAccum and
> > filterAccum.
>
> in which case the result type would be (acc, [y]), and it
> would have a similar type and semantics to
> Data.Maybe.mapMaybe :: (a -> Maybe b) -> [a] -> [b]
>
> .. maybeAccumL? :-)

Yes, that does seem like the correct name for it. Again, I don't think
it's a massively useful function - the Accum functions just aren't as
useful anymore now Monads are better understood.

Thanks

Neil

==============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:

http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
==============================================================================

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal #2717: Add nubWith, nubOrd

Heinrich Apfelmus
In reply to this post by Mitchell, Neil
Mitchell, Neil wrote:

> Hi
>
>> Were we to keep it, do you have a better naming suggestion?
>
> [Warning: untested code]
>
> filterAccum :: (acc -> x -> (acc,Bool)) -> acc -> [x] -> (acc, [x])
> filterAccum f a [] = (a, [])
> filterAccum f a (x:xs) = (an, [x|b]++rest)
>    where (a2,b) = f a x
>          (an,rest) = filterAccum f a2 xs
>
> This follows the type of mapAccumL, and is more general than your
> function. You could change the utility function to be (acc -> x ->
> (acc,Maybe y)) to get a variant that is more general than both mapAccum
> and filterAccum.

In other words,

   mapAccumL    = mapM
   filterAccumL = filterM

when used with the state monad.


Concerning the proposal, I agree with Neil, +1 for  nubOrd  and -1 for
nubWith  . Adding  filterAccum  sounds reasonable but should be a
separate proposal.


The patch introduces two documentation errors concerning the asymptotic
complexity:  m  is *not* the number of duplicate elements but the number
of *unique* elements, i.e.  n  minus the number of duplicates.
Furthermore, the proposed documentation for  nubOrd  should mention what
 nubOrd  actually does. I suggest changing it to:

  Like 'nub', the 'nubOrd' function removes duplicate elements
  from a list.

  But while 'nub' only requires that two elements
  can be tested for equality ('Eq a'), 'nubOrd' requires that
  the elements can be ordered ('Ord a'). This allows the 'nubOrd'
  implementation to be significantly faster; it runs in
  /O(n log m)/ time where /n/ is the length of the list and
  /m/ is the number of unique elements in that list.


Regards,
apfelmus

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal #2717: Add nubWith, nubOrd

Yitzchak Gale
In reply to this post by Barton C Massey
Bart Massey wrote:
>  http://hackage.haskell.org/trac/ghc/ticket/2717

Hmm, this reminds us that Data.Set really needs

insertMember :: Ord a => a -> Set a -> (Bool, Set a)

so that we don't need to traverse the tree twice each
time. And the same for Data.IntSet, of course.
Analogous to existing things like

Data.Set.deleteFindMin
Data.Map.insertLookupWithKey

Hasn't someone made this obvious proposal before?
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries