How to improve its performance ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

How to improve its performance ?

z_axis
import Data.List

combination :: [a] -> [[a]]
combination [] =  [[]]
combination (x:xs) =  (map (x:) (combination xs) )++ (combination xs)

samp = [1..100]
allTwoGroup = [(x, samp\\x) | x <- combination samp]

The above code is used to calculate all the two groups from sample data ? It is very slow !

Sincerely!
e^(π.i) + 1 = 0
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Daniel Fischer-4
Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:
> import Data.List
>
> combination :: [a] -> [[a]]
> combination [] =  [[]]
> combination (x:xs) =  (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as
representing a set), I think. And, apart from the order in which they are
generated, it's the same as Data.List.subsequences (only less efficient).

>
> samp = [1..100]
> allTwoGroup = [(x, samp\\x) | x <- combination samp]
>
> The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

> ? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual).
There are two points where you waste time.
First, in

combination (x:xs)

you calculate (combination xs) twice. If the order in which the sublists
come doesn't matter, it's better to do it only once:

combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]

Second, (\\) is slow, xs \\ ys is O(length xs * length ys).
Also, (\\) requires an Eq constraint. If you're willing to constrain the
type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered
lists, you can replace (\\) by the much faster difference of oredered lists
(implementation left as an exercise for the reader).

But you can work with unconstrained types, and faster, if you build the two
complementary sublists at the same time.
The idea is,
-- An empty list has one partition into two complementary sublists:
partitions2 [] = [([],[])]
-- For a nonempty list (x:xs), the partitions into two complementary
-- sublists each have x either in the first sublist or in the second.
-- Each partition induces a corresponding partition of the tail, xs,
-- by removing x from the group in which it appears.
-- Conversely, every partition ox xs gives rise to two partitions
-- of (x:xs), by adding x to either the first or the second sublist. So
partitions2 (x:xs)
    = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]

We can also write the second case as

partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)

where

choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]

Now it's very easy to recognise that partitions2 is a fold,

partitions2 xs = foldr (concatMap . choice) [([],[])] xs

>
> Sincerely!
>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Alexander Solla-2

On Mar 17, 2010, at 6:14 PM, Daniel Fischer wrote:

> I found it surprisingly not-slow (code compiled with -O2, as usual).
> There are two points where you waste time.


I found one big point where time is wasted:  in computing the powerset  
of a list.  He's making 2^n elements, and then iterating through them  
all and filtering, but only needs n^2 or n `choose` 2 of the  
(depending on the semantics for his "groups").

The answer is to do something like:

allPairs list = [(x,y) | x <- list, y <- list]

to get it done in n^2 time.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

z_axis
In reply to this post by Daniel Fischer-4
The time is wasted to run combination even if use `combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs] ' instead.
in ghci
>combination [1..20]
will wait for a long time .......

Daniel Fischer-4 wrote
Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:
> import Data.List
>
> combination :: [a] -> [[a]]
> combination [] =  [[]]
> combination (x:xs) =  (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as
representing a set), I think. And, apart from the order in which they are
generated, it's the same as Data.List.subsequences (only less efficient).

>
> samp = [1..100]
> allTwoGroup = [(x, samp\\x) | x <- combination samp]
>
> The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

> ? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual).
There are two points where you waste time.
First, in

combination (x:xs)

you calculate (combination xs) twice. If the order in which the sublists
come doesn't matter, it's better to do it only once:

combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]

Second, (\\) is slow, xs \\ ys is O(length xs * length ys).
Also, (\\) requires an Eq constraint. If you're willing to constrain the
type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered
lists, you can replace (\\) by the much faster difference of oredered lists
(implementation left as an exercise for the reader).

But you can work with unconstrained types, and faster, if you build the two
complementary sublists at the same time.
The idea is,
-- An empty list has one partition into two complementary sublists:
partitions2 [] = [([],[])]
-- For a nonempty list (x:xs), the partitions into two complementary
-- sublists each have x either in the first sublist or in the second.
-- Each partition induces a corresponding partition of the tail, xs,
-- by removing x from the group in which it appears.
-- Conversely, every partition ox xs gives rise to two partitions
-- of (x:xs), by adding x to either the first or the second sublist. So
partitions2 (x:xs)
    = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]

We can also write the second case as

partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)

where

choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]

Now it's very easy to recognise that partitions2 is a fold,

partitions2 xs = foldr (concatMap . choice) [([],[])] xs

>
> Sincerely!
>

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
e^(π.i) + 1 = 0
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

z_axis
In reply to this post by Alexander Solla-2
`allPairs list = [(x,y) | x <- list, y <- list]  ` is not what `combination` does !
>let allPairs list = [(x,y) | x <- list, y <- list]
>allPairs [1,2,3]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

>combination [1,2,3]
[[1,2,3],[2,3],[1,3],[3],[1,2],[2],[1],[]]

Alexander Solla-2 wrote
On Mar 17, 2010, at 6:14 PM, Daniel Fischer wrote:

> I found it surprisingly not-slow (code compiled with -O2, as usual).
> There are two points where you waste time.


I found one big point where time is wasted:  in computing the powerset  
of a list.  He's making 2^n elements, and then iterating through them  
all and filtering, but only needs n^2 or n `choose` 2 of the  
(depending on the semantics for his "groups").

The answer is to do something like:

allPairs list = [(x,y) | x <- list, y <- list]

to get it done in n^2 time.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
e^(π.i) + 1 = 0
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Alexander Solla-2

On Mar 17, 2010, at 8:33 PM, zaxis wrote:

>
> `allPairs list = [(x,y) | x <- list, y <- list]  ` is not what  
> `combination`
> does !
>> let allPairs list = [(x,y) | x <- list, y <- list]
>> allPairs [1,2,3]
> [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Yeah, I know that.  I said so specifically.  combination computes the  
power set of a list.  You said your goal was to compute a set of "two  
groups".  You don't need the power set in order to compute a set of  
pairs.  Moreover, computing the power set is a slow operation.  
Indeed, it is the source of your slowness.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Ozgur Akgun
As Daniel pointed out earlier, this may be of a little bit of help:

http://haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.0/src/Data-List.html#subsequences



On 18 March 2010 04:03, Alexander Solla <[hidden email]> wrote:

On Mar 17, 2010, at 8:33 PM, zaxis wrote:


`allPairs list = [(x,y) | x <- list, y <- list]  ` is not what `combination`
does !
let allPairs list = [(x,y) | x <- list, y <- list]
allPairs [1,2,3]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Yeah, I know that.  I said so specifically.  combination computes the power set of a list.  You said your goal was to compute a set of "two groups".  You don't need the power set in order to compute a set of pairs.  Moreover, computing the power set is a slow operation.  Indeed, it is the source of your slowness.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe



--
Ozgur Akgun

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Daniel Fischer-4
In reply to this post by Alexander Solla-2
Am Donnerstag 18 März 2010 05:03:28 schrieb Alexander Solla:

> On Mar 17, 2010, at 8:33 PM, zaxis wrote:
> > `allPairs list = [(x,y) | x <- list, y <- list]  ` is not what
> > `combination`
> > does !
> >
> >> let allPairs list = [(x,y) | x <- list, y <- list]
> >> allPairs [1,2,3]
> >
> > [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
>
> Yeah, I know that.  I said so specifically.  combination computes the
> power set of a list.  You said your goal was to compute a set of "two
> groups".  You don't need the power set in order to compute a set of
> pairs.  Moreover, computing the power set is a slow operation.
> Indeed, it is the source of your slowness.

I think you've been fooled by the names. If you look at the code (or run it
on a smaller sample), you'll see that allTwoGroup computes the list of all
partitions of sample into two complementary sets.
If sample were [1,2,3], the result would be (apart from the order)

[([1,2,3],[]),([1,2],[3]),([1,3],[2]),([2,3],[1]),([1],[2,3]),([2],[1,3]),
([3],[1,2]),([],[1,2,3])]

with sample = [1 .. 100], it's a list of 2^100 elements, that'll take a
long time to compute no matter how.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Daniel Fischer-4
In reply to this post by z_axis
Am Donnerstag 18 März 2010 04:29:53 schrieb zaxis:
> The time is wasted to run combination even if use `combination (x:xs) =
> concat [(x:ys), ys] | ys <- combination xs] ' instead.
> in ghci
>
> >combination [1..20]
>
> will wait for a long time .......

Hm, really?

Prelude> :set +s                                        
Prelude> let combination [] = [[]]; combination (x:xs) = [x:ys | ys <-
combination xs] ++ combination xs                                                              
(0.00 secs, 1818304 bytes)                                                              
Prelude> let combs [] = [[]]; combs (x:xs) = concat [[x:ys,ys] | ys <-
combs xs]  
(0.00 secs, 2102280 bytes)                                                              
Prelude> length [1 .. 2^20]                                                      
1048576                                                                                
(0.07 secs, 49006024 bytes)                                                            
Prelude> length $ combination [1 .. 20]
1048576                                      
(8.28 secs, 915712452 bytes)                  
Prelude> length $ combs [1 .. 20]
1048576                                
(0.78 secs, 146841964 bytes)

That's interpreted, so not optimised. Optimisation narrows the gap, speeds
both up significantly, so put


combination :: [a] -> [[a]]
combination [] = [[]]
combination (x:xs) = [x:ys | ys <- combination xs] ++ combination xs

combs :: [a] -> [[a]]
combs [] = [[]]
combs (x:xs) = concat [[x:ys,ys] | ys <- combs xs]

mcombs :: [a] -> [[a]]
mcombs = foldr (flip (>>=) . f) [[]]
      where
        f x xs = [x:xs,xs]

in a file, compile with -O2 and load into ghci:


Prelude Parts> length $ combination [1 .. 20]
1048576
(0.16 secs, 43215220 bytes)
Prelude Parts> length $ combs [1 .. 20]
1048576
(0.05 secs, 55573436 bytes)
Prelude Parts> length $ mcombs [1 .. 20]
1048576
(0.06 secs, 55572692 bytes)
Prelude Parts> length $ combination [1 .. 24]
16777216
(3.06 secs, 674742880 bytes)
Prelude Parts> length $ combs [1 .. 24]
16777216
(0.62 secs, 881788880 bytes)
Prelude Parts> length $ mcombs [1 .. 24]
16777216
(0.62 secs, 881788956 bytes)
Prelude Parts> length [1 .. 2^24]
16777216
(0.64 secs, 675355184 bytes)

So combs and the pointfree combinator version mcombs are equally fast and
significantly faster than combination. In fact they're as fast as a simple
enumeration.

Now, if you actually let ghci print out the result, the printing takes a
long time. So much that the difference in efficiency is hardly discernible
or not at all.

>
> Daniel Fischer-4 wrote:
> > Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:
> >> import Data.List
> >>
> >> combination :: [a] -> [[a]]
> >> combination [] =  [[]]
> >> combination (x:xs) =  (map (x:) (combination xs) )++ (combination xs)
> >
> > That would normally be called sublists (or subsets, if one regards
> > lists as
> > representing a set), I think. And, apart from the order in which they
> > are generated, it's the same as Data.List.subsequences (only less
> > efficient).
> >
> >> samp = [1..100]
> >> allTwoGroup = [(x, samp\\x) | x <- combination samp]
> >>
> >> The above code is used to calculate all the two groups from sample
> >> data
> >
> > All partitions into two sublists/sets/samples.
> >
> >> ? It is very slow !
> >
> > I found it surprisingly not-slow (code compiled with -O2, as usual).
> > There are two points where you waste time.
> > First, in
> >
> > combination (x:xs)
> >
> > you calculate (combination xs) twice. If the order in which the
> > sublists come doesn't matter, it's better to do it only once:
> >
> > combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]
> >
> > Second, (\\) is slow, xs \\ ys is O(length xs * length ys).
> > Also, (\\) requires an Eq constraint. If you're willing to constrain
> > the type further, to (Ord a => [a] -> [([a],[a])]), and call it only
> > on ordered
> > lists, you can replace (\\) by the much faster difference of oredered
> > lists
> > (implementation left as an exercise for the reader).
> >
> > But you can work with unconstrained types, and faster, if you build
> > the two
> > complementary sublists at the same time.
> > The idea is,
> > -- An empty list has one partition into two complementary sublists:
> > partitions2 [] = [([],[])]
> > -- For a nonempty list (x:xs), the partitions into two complementary
> > -- sublists each have x either in the first sublist or in the second.
> > -- Each partition induces a corresponding partition of the tail, xs,
> > -- by removing x from the group in which it appears.
> > -- Conversely, every partition ox xs gives rise to two partitions
> > -- of (x:xs), by adding x to either the first or the second sublist.
> > So partitions2 (x:xs)
> >     = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]
> >
> > We can also write the second case as
> >
> > partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)
> >
> > where
> >
> > choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]
> >
> > Now it's very easy to recognise that partitions2 is a fold,
> >
> > partitions2 xs = foldr (concatMap . choice) [([],[])] xs

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

z_axis
%cat Test.hs
module Test(mcombs)
where
import Data.List

mcombs = foldr (flip (>>=) . f) [[]]  where f x xs = [x:xs,xs]

%ghc -c -O2 Test.hs
%ghci
> :l Test
Ok, modules loaded: Test.

> :set +s
length $ mcombs [1..20]
1048576
(0.06 secs, 56099528 bytes)

> length $ mcombs [1..50]
^CInterrupted.

Daniel Fischer-4 wrote
Am Donnerstag 18 März 2010 04:29:53 schrieb zaxis:
> The time is wasted to run combination even if use `combination (x:xs) =
> concat [(x:ys), ys] | ys <- combination xs] ' instead.
> in ghci
>
> >combination [1..20]
>
> will wait for a long time .......

Hm, really?

Prelude> :set +s                                        
Prelude> let combination [] = [[]]; combination (x:xs) = [x:ys | ys <-
combination xs] ++ combination xs                                                              
(0.00 secs, 1818304 bytes)                                                              
Prelude> let combs [] = [[]]; combs (x:xs) = concat [[x:ys,ys] | ys <-
combs xs]  
(0.00 secs, 2102280 bytes)                                                              
Prelude> length [1 .. 2^20]                                                      
1048576                                                                                
(0.07 secs, 49006024 bytes)                                                            
Prelude> length $ combination [1 .. 20]
1048576                                      
(8.28 secs, 915712452 bytes)                  
Prelude> length $ combs [1 .. 20]
1048576                                
(0.78 secs, 146841964 bytes)

That's interpreted, so not optimised. Optimisation narrows the gap, speeds
both up significantly, so put


combination :: [a] -> [[a]]
combination [] = [[]]
combination (x:xs) = [x:ys | ys <- combination xs] ++ combination xs

combs :: [a] -> [[a]]
combs [] = [[]]
combs (x:xs) = concat [[x:ys,ys] | ys <- combs xs]

mcombs :: [a] -> [[a]]
mcombs = foldr (flip (>>=) . f) [[]]
      where
        f x xs = [x:xs,xs]

in a file, compile with -O2 and load into ghci:


Prelude Parts> length $ combination [1 .. 20]
1048576
(0.16 secs, 43215220 bytes)
Prelude Parts> length $ combs [1 .. 20]
1048576
(0.05 secs, 55573436 bytes)
Prelude Parts> length $ mcombs [1 .. 20]
1048576
(0.06 secs, 55572692 bytes)
Prelude Parts> length $ combination [1 .. 24]
16777216
(3.06 secs, 674742880 bytes)
Prelude Parts> length $ combs [1 .. 24]
16777216
(0.62 secs, 881788880 bytes)
Prelude Parts> length $ mcombs [1 .. 24]
16777216
(0.62 secs, 881788956 bytes)
Prelude Parts> length [1 .. 2^24]
16777216
(0.64 secs, 675355184 bytes)

So combs and the pointfree combinator version mcombs are equally fast and
significantly faster than combination. In fact they're as fast as a simple
enumeration.

Now, if you actually let ghci print out the result, the printing takes a
long time. So much that the difference in efficiency is hardly discernible
or not at all.

>
> Daniel Fischer-4 wrote:
> > Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:
> >> import Data.List
> >>
> >> combination :: [a] -> [[a]]
> >> combination [] =  [[]]
> >> combination (x:xs) =  (map (x:) (combination xs) )++ (combination xs)
> >
> > That would normally be called sublists (or subsets, if one regards
> > lists as
> > representing a set), I think. And, apart from the order in which they
> > are generated, it's the same as Data.List.subsequences (only less
> > efficient).
> >
> >> samp = [1..100]
> >> allTwoGroup = [(x, samp\\x) | x <- combination samp]
> >>
> >> The above code is used to calculate all the two groups from sample
> >> data
> >
> > All partitions into two sublists/sets/samples.
> >
> >> ? It is very slow !
> >
> > I found it surprisingly not-slow (code compiled with -O2, as usual).
> > There are two points where you waste time.
> > First, in
> >
> > combination (x:xs)
> >
> > you calculate (combination xs) twice. If the order in which the
> > sublists come doesn't matter, it's better to do it only once:
> >
> > combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]
> >
> > Second, (\\) is slow, xs \\ ys is O(length xs * length ys).
> > Also, (\\) requires an Eq constraint. If you're willing to constrain
> > the type further, to (Ord a => [a] -> [([a],[a])]), and call it only
> > on ordered
> > lists, you can replace (\\) by the much faster difference of oredered
> > lists
> > (implementation left as an exercise for the reader).
> >
> > But you can work with unconstrained types, and faster, if you build
> > the two
> > complementary sublists at the same time.
> > The idea is,
> > -- An empty list has one partition into two complementary sublists:
> > partitions2 [] = [([],[])]
> > -- For a nonempty list (x:xs), the partitions into two complementary
> > -- sublists each have x either in the first sublist or in the second.
> > -- Each partition induces a corresponding partition of the tail, xs,
> > -- by removing x from the group in which it appears.
> > -- Conversely, every partition ox xs gives rise to two partitions
> > -- of (x:xs), by adding x to either the first or the second sublist.
> > So partitions2 (x:xs)
> >     = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]
> >
> > We can also write the second case as
> >
> > partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)
> >
> > where
> >
> > choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]
> >
> > Now it's very easy to recognise that partitions2 is a fold,
> >
> > partitions2 xs = foldr (concatMap . choice) [([],[])] xs

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
e^(π.i) + 1 = 0
Reply | Threaded
Open this post in threaded view
|

Re: How to improve its performance ?

Daniel Fischer-4
Am Freitag 19 März 2010 02:56:36 schrieb zaxis:

> %cat Test.hs
> module Test(mcombs)
> where
> import Data.List
>
> mcombs = foldr (flip (>>=) . f) [[]]  where f x xs = [x:xs,xs]
>
> %ghc -c -O2 Test.hs
> %ghci
>
> > :l Test
>
> Ok, modules loaded: Test.
>
> > :set +s
>
> length $ mcombs [1..20]
> 1048576
> (0.06 secs, 56099528 bytes)
>
> > length $ mcombs [1..50]
>
> ^CInterrupted.
>

Yes, 2^50 = 1125899906842624.

If your computer generated 10^9 sublists per second, you'd wait more than
ten days for the answer. Since something like 25 million - 100 million per
second is more in the region of what ordinary computers can achieve, it'd
rather be three months to over a year.

Prelude Parts> mcombs [1 .. 50] !! 100000000
[1,2,3,4,5,6,7,8,10,11,12,13,18,20,26,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]
(4.35 secs, 5254369992 bytes)
Prelude Parts> length $ replicate (10^8) ()
100000000
(2.10 secs, 2806344336 bytes)

Really, not bad, IMO.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe