ordNub

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
67 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

ordNub

Niklas Hambüchen
tldr: nub is abnormally slow, we shouldn't use it, but we do.


As you might know, Data.List.nub is O(n²). (*)

As you might not know, almost *all* practical Haskell projects use it,
and that in places where an Ord instance is given, e.g. happy, Xmonad,
ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
more (see https://github.com/nh2/haskell-ordnub).

I've taken the Ord-based O(n * log n) implementation from yi using a Set:

  ordNub :: (Ord a) => [a] -> [a]
  ordNub l = go empty l
    where
      go _ []     = []
      go s (x:xs) = if x `member` s then go s xs
                                    else x : go (insert x s) xs


and put benchmarks on
http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
(compare `nub` vs `ordNub`).

`ordNub` is not only in a different complexity class, but even seems to
perform better than nub for very small numbers of actually different
list elements (that's the numbers before the benchmark names).

(The benchmark also shows some other potential problem: Using a state
monad to keep the set instead of a function argument can be up to 20
times slower. Should that happen?)

What do you think about ordNub?

I've seen a proposal from 5 years ago about adding a *sort*Nub function
started by Neil, but it just died.


(*) The mentioned complexity is for the (very common) worst case, in
which the number of different elements in the list grows with the list
(alias you don't have an N element list with always only 5 different
things inside).

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Clark Gaebel-2

Similarly, I've always used:

import qualified Data.HashSet as S

nub :: Hashable a => [a] -> [a]
nub = S.toList . S.fromList

And i can't think of any type which i can't write a Hashable instance, so this is extremely practical.

On Jul 14, 2013 7:24 AM, "Niklas Hambüchen" <[hidden email]> wrote:
tldr: nub is abnormally slow, we shouldn't use it, but we do.


As you might know, Data.List.nub is O(n²). (*)

As you might not know, almost *all* practical Haskell projects use it,
and that in places where an Ord instance is given, e.g. happy, Xmonad,
ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
more (see https://github.com/nh2/haskell-ordnub).

I've taken the Ord-based O(n * log n) implementation from yi using a Set:

  ordNub :: (Ord a) => [a] -> [a]
  ordNub l = go empty l
    where
      go _ []     = []
      go s (x:xs) = if x `member` s then go s xs
                                    else x : go (insert x s) xs


and put benchmarks on
http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
(compare `nub` vs `ordNub`).

`ordNub` is not only in a different complexity class, but even seems to
perform better than nub for very small numbers of actually different
list elements (that's the numbers before the benchmark names).

(The benchmark also shows some other potential problem: Using a state
monad to keep the set instead of a function argument can be up to 20
times slower. Should that happen?)

What do you think about ordNub?

I've seen a proposal from 5 years ago about adding a *sort*Nub function
started by Neil, but it just died.


(*) The mentioned complexity is for the (very common) worst case, in
which the number of different elements in the list grows with the list
(alias you don't have an N element list with always only 5 different
things inside).

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Niklas Hambüchen
One of my main points is:

Should we not add such a function (ord-based, same output as nub,
stable, no sorting) to base?

As the package counting shows, if we don't offer an alternative, people
obviously use it, and not to our benefit.

(Not to say it this way:
We could make the Haskell world fast with smarter fusion, strictness
analysis and LLVM backends.
Or we could stop using quadratic algorithms.)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Francesco Mazzoli
In reply to this post by Clark Gaebel-2
At Sun, 14 Jul 2013 07:31:05 -0400,
Clark Gaebel wrote:
> Similarly, I've always used:
>
> import qualified Data.HashSet as S
>
> nub :: Hashable a => [a] -> [a]
> nub = S.toList . S.fromList
>
> And i can't think of any type which i can't write a Hashable instance, so
> this is extremely practical.

Well, the above is not stable while Niklas’ is.  But I guess that’s not
the point of your message :).

I’ve always avoided “nub” too, and FWIW I’d like a constrained version
too—maybe avoiding Data.Set so that it could live in Data.List.  I think
Ord would be much better than Hashable, since it is 1. in “base” 2. much
more established and understood.

Although if you find yourself using “nub” too much you’re probably doing
something wrong...

Francesco

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Clark Gaebel-2
In reply to this post by Niklas Hambüchen

Oops sorry I guess my point wasn't clear.

Why ord based when hashable is faster? Then there's no reason this has to be in base, it can just be a free function in Data.HashSet. If stability is a concern then there's a way to easily account for that using HashMap.

  - Clark

On Jul 14, 2013 7:48 AM, "Niklas Hambüchen" <[hidden email]> wrote:
One of my main points is:

Should we not add such a function (ord-based, same output as nub,
stable, no sorting) to base?

As the package counting shows, if we don't offer an alternative, people
obviously use it, and not to our benefit.

(Not to say it this way:
We could make the Haskell world fast with smarter fusion, strictness
analysis and LLVM backends.
Or we could stop using quadratic algorithms.)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Roman Cheplyaka-2
In reply to this post by Niklas Hambüchen
Something like that should definitely be included in Data.List.
Thanks for working on it.

Roman

* Niklas Hambüchen <[hidden email]> [2013-07-14 19:20:52+0800]

> tldr: nub is abnormally slow, we shouldn't use it, but we do.
>
>
> As you might know, Data.List.nub is O(n²). (*)
>
> As you might not know, almost *all* practical Haskell projects use it,
> and that in places where an Ord instance is given, e.g. happy, Xmonad,
> ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
> more (see https://github.com/nh2/haskell-ordnub).
>
> I've taken the Ord-based O(n * log n) implementation from yi using a Set:
>
>   ordNub :: (Ord a) => [a] -> [a]
>   ordNub l = go empty l
>     where
>       go _ []     = []
>       go s (x:xs) = if x `member` s then go s xs
>                                     else x : go (insert x s) xs
>
>
> and put benchmarks on
> http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
> (compare `nub` vs `ordNub`).
>
> `ordNub` is not only in a different complexity class, but even seems to
> perform better than nub for very small numbers of actually different
> list elements (that's the numbers before the benchmark names).
>
> (The benchmark also shows some other potential problem: Using a state
> monad to keep the set instead of a function argument can be up to 20
> times slower. Should that happen?)
>
> What do you think about ordNub?
>
> I've seen a proposal from 5 years ago about adding a *sort*Nub function
> started by Neil, but it just died.
>
>
> (*) The mentioned complexity is for the (very common) worst case, in
> which the number of different elements in the list grows with the list
> (alias you don't have an N element list with always only 5 different
> things inside).
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Joey Adams
In reply to this post by Clark Gaebel-2
On Sun, Jul 14, 2013 at 7:31 AM, Clark Gaebel <[hidden email]> wrote:

Similarly, I've always used:

import qualified Data.HashSet as S

nub :: Hashable a => [a] -> [a]
nub = S.toList . S.fromList

And i can't think of any type which i can't write a Hashable instance, so this is extremely practical.

This won't yield results lazily (e.g. nub (repeat 'x') = _|_ instead of 'x' : _|_), but Niklas' ordNub will.  His ordNub can be translated directly to HashSet and still have the stability and laziness properties.

A difficulty with putting ordNub in Data.List is that it depends on containers, which is outside of the base package.  Some options:

 * Move the implementation of Set to base.

 * Implement a lean version of Set in base that only provides 'insert' and 'member'.

 * Define ordNub in Data.Set instead.

Adding a Hashable-based nub to base would be even more problematic, since you'd need Hashable in base.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Conrad Parker
On 15 July 2013 09:54, Joey Adams <[hidden email]> wrote:

> On Sun, Jul 14, 2013 at 7:31 AM, Clark Gaebel <[hidden email]> wrote:
>>
>> Similarly, I've always used:
>>
>> import qualified Data.HashSet as S
>>
>> nub :: Hashable a => [a] -> [a]
>> nub = S.toList . S.fromList
>>
>> And i can't think of any type which i can't write a Hashable instance, so
>> this is extremely practical.
>
> This won't yield results lazily (e.g. nub (repeat 'x') = _|_ instead of 'x'
> : _|_), but Niklas' ordNub will.  His ordNub can be translated directly to
> HashSet and still have the stability and laziness properties.
>
> A difficulty with putting ordNub in Data.List is that it depends on
> containers, which is outside of the base package.  Some options:
>
>  * Move the implementation of Set to base.
>
>  * Implement a lean version of Set in base that only provides 'insert' and
> 'member'.
>
>  * Define ordNub in Data.Set instead.
>
> Adding a Hashable-based nub to base would be even more problematic, since
> you'd need Hashable in base.

Right, I suggest the following community course of action:

1a) add ordNub to Data.Set
1b) add ordNub to Data.Hashable
(1 day)

2) make a libraries@ proposal to include a stripped-down Data.Set-like
balanced binary tree implementation to base.
(2 weeks)

3) bikeshed about the name, eg.:
  * is "nub" really intuitive? how about "uniq", like in
perl/ruby/underscore.js?
  * but "uniq" in unix only removes _adjacent_ duplicates, confusing!
  * how about "distinct"? "sole"? "unique"? "azygous"?
(7 weeks)

4) Failing consensus on technical grounds (that the stripped-down
Data.Set implementation is overkill for one library function), agree
that anyone who really cares should just use the version from
containers or hashable. Only newbs and textbook authors actually use
base anyway, and it's impossible to change the language definition.
Prelude will continue to fulfil its role of avoiding success at all
costs, quadratic or otherwise.

(Please, let's have both 1a and 1b :)

Conrad.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Thomas DuBuisson
In reply to this post by Niklas Hambüchen
Just so people are aware - five years ago the notion of nubOrd and
nubWith was discussed and a consensus reached on including nubOrd.  I
think Bart got too busy, didn't submit a final patch, and no one with
commit access actually commited any code.

http://haskell.1045720.n5.nabble.com/GHC-2717-Add-nubWith-nubOrd-td3159919.html

I fully support an efficient nub implementation making its way into
base - it's far past time.  Using Set seems sensible.

Cheers,
Thomas



On Sun, Jul 14, 2013 at 4:20 AM, Niklas Hambüchen <[hidden email]> wrote:

> tldr: nub is abnormally slow, we shouldn't use it, but we do.
>
>
> As you might know, Data.List.nub is O(n²). (*)
>
> As you might not know, almost *all* practical Haskell projects use it,
> and that in places where an Ord instance is given, e.g. happy, Xmonad,
> ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
> more (see https://github.com/nh2/haskell-ordnub).
>
> I've taken the Ord-based O(n * log n) implementation from yi using a Set:
>
>   ordNub :: (Ord a) => [a] -> [a]
>   ordNub l = go empty l
>     where
>       go _ []     = []
>       go s (x:xs) = if x `member` s then go s xs
>                                     else x : go (insert x s) xs
>
>
> and put benchmarks on
> http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
> (compare `nub` vs `ordNub`).
>
> `ordNub` is not only in a different complexity class, but even seems to
> perform better than nub for very small numbers of actually different
> list elements (that's the numbers before the benchmark names).
>
> (The benchmark also shows some other potential problem: Using a state
> monad to keep the set instead of a function argument can be up to 20
> times slower. Should that happen?)
>
> What do you think about ordNub?
>
> I've seen a proposal from 5 years ago about adding a *sort*Nub function
> started by Neil, but it just died.
>
>
> (*) The mentioned complexity is for the (very common) worst case, in
> which the number of different elements in the list grows with the list
> (alias you don't have an N element list with always only 5 different
> things inside).
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Brandon Allbery
In reply to this post by Clark Gaebel-2
On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel <[hidden email]> wrote:

Oops sorry I guess my point wasn't clear.

Why ord based when hashable is faster? Then there's no reason this has to be in base, it can just be a 

Did the point about "stable" fly overhead? 

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Jason Dagit-3
In reply to this post by Niklas Hambüchen
On Sun, Jul 14, 2013 at 4:20 AM, Niklas Hambüchen <[hidden email]> wrote:

> tldr: nub is abnormally slow, we shouldn't use it, but we do.
>
>
> As you might know, Data.List.nub is O(n²). (*)
>
> As you might not know, almost *all* practical Haskell projects use it,
> and that in places where an Ord instance is given, e.g. happy, Xmonad,
> ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
> more (see https://github.com/nh2/haskell-ordnub).
>
> I've taken the Ord-based O(n * log n) implementation from yi using a Set:
>
>   ordNub :: (Ord a) => [a] -> [a]
>   ordNub l = go empty l
>     where
>       go _ []     = []
>       go s (x:xs) = if x `member` s then go s xs
>                                     else x : go (insert x s) xs
>
>
> and put benchmarks on
> http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
> (compare `nub` vs `ordNub`).

Richard Bird has a book, "Pearls of Functional Algorithm Design" that
is meant to teach a form of deriving algorithms from the properties we
ask of them. In this book, he gives a possible derivation of ordNub,
simply called nub in the book, following the methodology he is
teaching. He notes in the text that this derivation feels more
complicated than it ought.

Here is his version: http://lpaste.net/87625

I just sent you a pull request to add that one and S.toList .
S.fromList that was suggested in this thread. I don't think those two
implementations are faster than the others but it's nice to have them
for completeness.

Jason

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Niklas Hambüchen
Hey Jason,

would you mind giving a short idea of what the point of Bird's
implementation is / from what properties it is derived?

Also, running the QuickCheck tests you added, it doesn't give the same
output (order) as nub.

On 15/07/13 13:26, Jason Dagit wrote:

> Richard Bird has a book, "Pearls of Functional Algorithm Design" that
> is meant to teach a form of deriving algorithms from the properties we
> ask of them. In this book, he gives a possible derivation of ordNub,
> simply called nub in the book, following the methodology he is
> teaching. He notes in the text that this derivation feels more
> complicated than it ought.
>
> Here is his version: http://lpaste.net/87625
>
> I just sent you a pull request to add that one and S.toList .
> S.fromList that was suggested in this thread. I don't think those two
> implementations are faster than the others but it's nice to have them
> for completeness.
>
> Jason
>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Clark Gaebel-2
In reply to this post by Brandon Allbery
Apologies. I was being lazy. Here's a stable version:

  import qualified Data.HashSet as S

  hashNub :: (Ord a) => [a] -> [a]
  hashNub l = go S.empty l
    where
      go _ []     = []
      go s (x:xs) = if x `S.member` s then go s xs
                                    else x : go (S.insert x s) xs

Which, again, will probably be faster than the one using Ord, and I
can't think of any cases where I'd want the one using Ord instead. I
may just not be creative enough, though.


  - Clark

On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery <[hidden email]> wrote:

> On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel <[hidden email]> wrote:
>>
>> Oops sorry I guess my point wasn't clear.
>>
>> Why ord based when hashable is faster? Then there's no reason this has to
>> be in base, it can just be a
>
> Did the point about "stable" fly overhead?
>
> --
> brandon s allbery kf8nh                               sine nomine associates
> [hidden email]                                  [hidden email]
> unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

John Lato-2
In my tests, using unordered-containers was slightly slower than using Ord, although as the number of repeated elements grows unordered-containers appears to have an advantage.  I'm sure the relative costs of comparison vs hashing would affect this also.  But both are dramatically better than the current nub.

Has anyone looked at Bart's patches to see how difficult it would be to apply them (or re-write them)?


On Mon, Jul 15, 2013 at 8:43 PM, Clark Gaebel <[hidden email]> wrote:
Apologies. I was being lazy. Here's a stable version:

  import qualified Data.HashSet as S

  hashNub :: (Ord a) => [a] -> [a]
  hashNub l = go S.empty l
    where
      go _ []     = []
      go s (x:xs) = if x `S.member` s then go s xs
                                    else x : go (S.insert x s) xs

Which, again, will probably be faster than the one using Ord, and I
can't think of any cases where I'd want the one using Ord instead. I
may just not be creative enough, though.


  - Clark

On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery <[hidden email]> wrote:
> On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel <[hidden email]> wrote:
>>
>> Oops sorry I guess my point wasn't clear.
>>
>> Why ord based when hashable is faster? Then there's no reason this has to
>> be in base, it can just be a
>
> Did the point about "stable" fly overhead?
>
> --
> brandon s allbery kf8nh                               sine nomine associates
> [hidden email]                                  [hidden email]
> unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Ivan Lazar Miljenovic
On 16 July 2013 11:46, John Lato <[hidden email]> wrote:
> In my tests, using unordered-containers was slightly slower than using Ord,
> although as the number of repeated elements grows unordered-containers
> appears to have an advantage.  I'm sure the relative costs of comparison vs
> hashing would affect this also.  But both are dramatically better than the
> current nub.
>
> Has anyone looked at Bart's patches to see how difficult it would be to
> apply them (or re-write them)?

If I understand correctly, this function is proposed to be added to
Data.List which lives in base... but the proposals here are about
using either Sets from containers or HashSet from
unordered-containers; I thought base wasn't supposed to depend on any
other package :/

>
>
>
> On Mon, Jul 15, 2013 at 8:43 PM, Clark Gaebel <[hidden email]> wrote:
>>
>> Apologies. I was being lazy. Here's a stable version:
>>
>>   import qualified Data.HashSet as S
>>
>>   hashNub :: (Ord a) => [a] -> [a]
>>   hashNub l = go S.empty l
>>     where
>>       go _ []     = []
>>       go s (x:xs) = if x `S.member` s then go s xs
>>                                     else x : go (S.insert x s) xs
>>
>> Which, again, will probably be faster than the one using Ord, and I
>> can't think of any cases where I'd want the one using Ord instead. I
>> may just not be creative enough, though.
>>
>>
>>   - Clark
>>
>> On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery <[hidden email]>
>> wrote:
>> > On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel <[hidden email]>
>> > wrote:
>> >>
>> >> Oops sorry I guess my point wasn't clear.
>> >>
>> >> Why ord based when hashable is faster? Then there's no reason this has
>> >> to
>> >> be in base, it can just be a
>> >
>> > Did the point about "stable" fly overhead?
>> >
>> > --
>> > brandon s allbery kf8nh                               sine nomine
>> > associates
>> > [hidden email]
>> > [hidden email]
>> > unix, openafs, kerberos, infrastructure, xmonad
>> > http://sinenomine.net
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> [hidden email]
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



--
Ivan Lazar Miljenovic
[hidden email]
http://IvanMiljenovic.wordpress.com

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

John Lato-2
On Tue, Jul 16, 2013 at 10:31 AM, Ivan Lazar Miljenovic <[hidden email]> wrote:
On 16 July 2013 11:46, John Lato <[hidden email]> wrote:
> In my tests, using unordered-containers was slightly slower than using Ord,
> although as the number of repeated elements grows unordered-containers
> appears to have an advantage.  I'm sure the relative costs of comparison vs
> hashing would affect this also.  But both are dramatically better than the
> current nub.
>
> Has anyone looked at Bart's patches to see how difficult it would be to
> apply them (or re-write them)?

If I understand correctly, this function is proposed to be added to
Data.List which lives in base... but the proposals here are about
using either Sets from containers or HashSet from
unordered-containers; I thought base wasn't supposed to depend on any
other package :/

That was one of the points up for discussion: is it worth including a subset of Set functionality to enable a much better nub in base?  Is it even worth having Data.List.nub if it has quadratic complexity?

As an alternative, Bart's proposal was for both including ordNub in containers and an improved nub (with no dependencies outside base) in Data.List.  Unfortunately the patches are quite old (darcs format), and I don't know how they'd apply to the current situation.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Clark Gaebel-2
I'm procrastinating something else, so I wrote the patch to
unordered-containers. Feel free to comment on the github link:

https://github.com/tibbe/unordered-containers/pull/67

I'm still against having an Ord version, since my intuition tells me
that hash-based data structures are faster than ordered ones. Someone
else can write the patch, though!

As a tangent, can anyone think of a data structure for which you can
write an Ord instance but Hashable/Eq is impossible (or prove
otherwise)? How about the converse?

Regards,
  - Clark

On Mon, Jul 15, 2013 at 10:40 PM, John Lato <[hidden email]> wrote:

> On Tue, Jul 16, 2013 at 10:31 AM, Ivan Lazar Miljenovic
> <[hidden email]> wrote:
>>
>> On 16 July 2013 11:46, John Lato <[hidden email]> wrote:
>> > In my tests, using unordered-containers was slightly slower than using
>> > Ord,
>> > although as the number of repeated elements grows unordered-containers
>> > appears to have an advantage.  I'm sure the relative costs of comparison
>> > vs
>> > hashing would affect this also.  But both are dramatically better than
>> > the
>> > current nub.
>> >
>> > Has anyone looked at Bart's patches to see how difficult it would be to
>> > apply them (or re-write them)?
>>
>> If I understand correctly, this function is proposed to be added to
>> Data.List which lives in base... but the proposals here are about
>> using either Sets from containers or HashSet from
>> unordered-containers; I thought base wasn't supposed to depend on any
>> other package :/
>
>
> That was one of the points up for discussion: is it worth including a subset
> of Set functionality to enable a much better nub in base?  Is it even worth
> having Data.List.nub if it has quadratic complexity?
>
> As an alternative, Bart's proposal was for both including ordNub in
> containers and an improved nub (with no dependencies outside base) in
> Data.List.  Unfortunately the patches are quite old (darcs format), and I
> don't know how they'd apply to the current situation.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Brandon Allbery
In reply to this post by Ivan Lazar Miljenovic
On Mon, Jul 15, 2013 at 10:31 PM, Ivan Lazar Miljenovic <[hidden email]> wrote:
If I understand correctly, this function is proposed to be added to
Data.List which lives in base... but the proposals here are about
using either Sets from containers or HashSet from
unordered-containers; I thought base wasn't supposed to depend on any
other package :/

As I understand it, currently we have a double proposal: simple ordNub in Data.List without external dependencies, and the other one in containers and/or unordered-containers as appropriate.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Richard A. O'Keefe
In reply to this post by Clark Gaebel-2

On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:
>
> I'm still against having an Ord version, since my intuition tells me
> that hash-based data structures are faster than ordered ones.

There are at least four different things that "an Ord version" might
mean:

 - first sort a list, then eliminate duplicates
 - sort a list eliminating duplicates stably as you go
   (think 'merge sort', using 'union' instead of 'merge')
 - build a balanced tree set as you go
 - having a list that is already sorted, use that to
   eliminated duplicates cheaply.

These things have different costs.  For example, if there are N
elements of which U are unique, the first as O(N.log N) cost,
the third has O(N.log U) cost, and the fourth has O(N) cost.

What I want is more often ordNubBy than ordNub, though.

> Someone
> else can write the patch, though!
>
> As a tangent, can anyone think of a data structure for which you can
> write an Ord instance but Hashable/Eq is impossible (or prove
> otherwise)? How about the converse?

Since Ord has Eq as a superclass, and since 0 is a functionally
correct hash value for anything, if you can implement Ord you
can obviously implement Hashable/Eq.  Whether it is *useful* to
do so is another question.

It turns out that it _is_ possible to define good quality hash
functions on sets, but most code in the field to do so is pretty bad.
(Just a modular sum or exclusive or.)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: ordNub

Clark Gaebel-2
nubBy is a very good suggestion. Added!

Regarding good hash functions: if your data structure is algebraic,
you can derive generic and Hashable will give you a pretty good hash
function:

> data ADT a = C0 Int String | C1 [a]
>   deriving Generic
>
> instance Hashable a => Hashable (ADT a)

It's magic!

  - Clark

On Mon, Jul 15, 2013 at 11:35 PM, Richard A. O'Keefe <[hidden email]> wrote:

>
> On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:
>>
>> I'm still against having an Ord version, since my intuition tells me
>> that hash-based data structures are faster than ordered ones.
>
> There are at least four different things that "an Ord version" might
> mean:
>
>  - first sort a list, then eliminate duplicates
>  - sort a list eliminating duplicates stably as you go
>    (think 'merge sort', using 'union' instead of 'merge')
>  - build a balanced tree set as you go
>  - having a list that is already sorted, use that to
>    eliminated duplicates cheaply.
>
> These things have different costs.  For example, if there are N
> elements of which U are unique, the first as O(N.log N) cost,
> the third has O(N.log U) cost, and the fourth has O(N) cost.
>
> What I want is more often ordNubBy than ordNub, though.
>
>> Someone
>> else can write the patch, though!
>>
>> As a tangent, can anyone think of a data structure for which you can
>> write an Ord instance but Hashable/Eq is impossible (or prove
>> otherwise)? How about the converse?
>
> Since Ord has Eq as a superclass, and since 0 is a functionally
> correct hash value for anything, if you can implement Ord you
> can obviously implement Hashable/Eq.  Whether it is *useful* to
> do so is another question.
>
> It turns out that it _is_ possible to define good quality hash
> functions on sets, but most code in the field to do so is pretty bad.
> (Just a modular sum or exclusive or.)
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
1234