Proposal to add mapAccumL function to vector package

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Proposal to add mapAccumL function to vector package

John Ky
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John


_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to add mapAccumL function to vector package

Carter Schonwald
Hey John
I’m happy to help you contribute to to vector. 

1). This might not actually be needed with stream fusion on  ... though perhaps this sort of shared computation needs to be its own combinators because of the sharing meaning that the stream fusion on map and foldl might not work. (Have you tried doing let x = map ... in let y = foldl ... in something with x and y? Even eg just writing map accum l in terms of just that? It could very well fuse ... though I don’t think it can with the current vector fusion framework. Though I think one of the more exotic fusion frameworks Amos Robinson did a few years ago could handle that fusion.   Sadly that one requires an ILP solver at compile time.  But there’s some tricks I think we could do )

2) writing it in stream fusion form / as map accum l for streams will actually be simpler I think 

3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me a bit slow), but this sounds like a super reasonable feature request 



On Thu, Nov 8, 2018 at 6:31 AM John Ky <[hidden email]> wrote:
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to add mapAccumL function to vector package

Carter Schonwald
I thought about it a teeny bit more 

This should be trivially definable using mapM or equivalent using state t 

Have you tried doing that simple high level definition?  I think that works with vector fusion quite nicely. 

... ohhh. I see.  There’s two vectors of inputs.  I’ll have to think about this more. 

On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <[hidden email]> wrote:
Hey John
I’m happy to help you contribute to to vector. 

1). This might not actually be needed with stream fusion on  ... though perhaps this sort of shared computation needs to be its own combinators because of the sharing meaning that the stream fusion on map and foldl might not work. (Have you tried doing let x = map ... in let y = foldl ... in something with x and y? Even eg just writing map accum l in terms of just that? It could very well fuse ... though I don’t think it can with the current vector fusion framework. Though I think one of the more exotic fusion frameworks Amos Robinson did a few years ago could handle that fusion.   Sadly that one requires an ILP solver at compile time.  But there’s some tricks I think we could do )

2) writing it in stream fusion form / as map accum l for streams will actually be simpler I think 

3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me a bit slow), but this sounds like a super reasonable feature request 



On Thu, Nov 8, 2018 at 6:31 AM John Ky <[hidden email]> wrote:
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to add mapAccumL function to vector package

John Ky
Hi Carter,

I've created a ticket as requested and have reproduced the text here:

I've implemented a hand rolled version, and another two versions based on a combination of mapM and the lazy and strict versions of State monad.

haskell-works/hw-prim#38

The benchmarks show that the hand rolled versions run two times faster than the lazy state monad version and 16 times faster than the strict state monad.

I found the slow performance of the strict monad version most surprising.

I'm aware that the version that using mapM might enable fusion, however it is a fair bit slower than a hand rolled version that defeats fusion.

I would love to have a fusion-enabled version that runs as fast as the hand rolled version. Would that be possible?


Cheers,


-John


On Fri, 9 Nov 2018 at 01:18, Carter Schonwald <[hidden email]> wrote:
I thought about it a teeny bit more 

This should be trivially definable using mapM or equivalent using state t 

Have you tried doing that simple high level definition?  I think that works with vector fusion quite nicely. 

... ohhh. I see.  There’s two vectors of inputs.  I’ll have to think about this more. 

On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <[hidden email]> wrote:
Hey John
I’m happy to help you contribute to to vector. 

1). This might not actually be needed with stream fusion on  ... though perhaps this sort of shared computation needs to be its own combinators because of the sharing meaning that the stream fusion on map and foldl might not work. (Have you tried doing let x = map ... in let y = foldl ... in something with x and y? Even eg just writing map accum l in terms of just that? It could very well fuse ... though I don’t think it can with the current vector fusion framework. Though I think one of the more exotic fusion frameworks Amos Robinson did a few years ago could handle that fusion.   Sadly that one requires an ILP solver at compile time.  But there’s some tricks I think we could do )

2) writing it in stream fusion form / as map accum l for streams will actually be simpler I think 

3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me a bit slow), but this sounds like a super reasonable feature request 



On Thu, Nov 8, 2018 at 6:31 AM John Ky <[hidden email]> wrote:
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to add mapAccumL function to vector package

John Ky

After removing the load time of the file from benchmarking, the difference between mapM and hand rolled version is even more stark 😱

benchmarking medium.csv/mapAccumL               for DVS.Vector Word64
time                 2.371 ms   (2.347 ms .. 2.396 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 2.442 ms   (2.420 ms .. 2.477 ms)
std dev              91.95 μs   (57.33 μs .. 137.7 μs)
variance introduced by outliers: 23% (moderately inflated)

benchmarking medium.csv/mapAccumLViaStrictState for DVS.Vector Word64
time                 604.4 ms   (479.0 ms .. 700.5 ms)
                     0.995 R²   (0.983 R² .. 1.000 R²)
mean                 643.8 ms   (607.3 ms .. 695.2 ms)
std dev              51.95 ms   (13.07 ms .. 69.16 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking medium.csv/mapAccumLViaLazyState   for DVS.Vector Word64
time                 62.08 ms   (61.14 ms .. 63.23 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 63.16 ms   (62.43 ms .. 65.35 ms)
std dev              2.013 ms   (637.6 μs .. 3.434 ms)

On Fri, 9 Nov 2018 at 09:31, John Ky <[hidden email]> wrote:
Hi Carter,

I've created a ticket as requested and have reproduced the text here:

I've implemented a hand rolled version, and another two versions based on a combination of mapM and the lazy and strict versions of State monad.

haskell-works/hw-prim#38

The benchmarks show that the hand rolled versions run two times faster than the lazy state monad version and 16 times faster than the strict state monad.

I found the slow performance of the strict monad version most surprising.

I'm aware that the version that using mapM might enable fusion, however it is a fair bit slower than a hand rolled version that defeats fusion.

I would love to have a fusion-enabled version that runs as fast as the hand rolled version. Would that be possible?


Cheers,


-John


On Fri, 9 Nov 2018 at 01:18, Carter Schonwald <[hidden email]> wrote:
I thought about it a teeny bit more 

This should be trivially definable using mapM or equivalent using state t 

Have you tried doing that simple high level definition?  I think that works with vector fusion quite nicely. 

... ohhh. I see.  There’s two vectors of inputs.  I’ll have to think about this more. 

On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <[hidden email]> wrote:
Hey John
I’m happy to help you contribute to to vector. 

1). This might not actually be needed with stream fusion on  ... though perhaps this sort of shared computation needs to be its own combinators because of the sharing meaning that the stream fusion on map and foldl might not work. (Have you tried doing let x = map ... in let y = foldl ... in something with x and y? Even eg just writing map accum l in terms of just that? It could very well fuse ... though I don’t think it can with the current vector fusion framework. Though I think one of the more exotic fusion frameworks Amos Robinson did a few years ago could handle that fusion.   Sadly that one requires an ILP solver at compile time.  But there’s some tricks I think we could do )

2) writing it in stream fusion form / as map accum l for streams will actually be simpler I think 

3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me a bit slow), but this sounds like a super reasonable feature request 



On Thu, Nov 8, 2018 at 6:31 AM John Ky <[hidden email]> wrote:
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to add mapAccumL function to vector package

Carter Schonwald
great

the one using streams for your hand rolled stuff would be more instructive... your proposed definiton cant fuse.

i'll switch to the ticket

On Thu, Nov 8, 2018 at 5:53 PM John Ky <[hidden email]> wrote:

After removing the load time of the file from benchmarking, the difference between mapM and hand rolled version is even more stark 😱

benchmarking medium.csv/mapAccumL               for DVS.Vector Word64
time                 2.371 ms   (2.347 ms .. 2.396 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 2.442 ms   (2.420 ms .. 2.477 ms)
std dev              91.95 μs   (57.33 μs .. 137.7 μs)
variance introduced by outliers: 23% (moderately inflated)

benchmarking medium.csv/mapAccumLViaStrictState for DVS.Vector Word64
time                 604.4 ms   (479.0 ms .. 700.5 ms)
                     0.995 R²   (0.983 R² .. 1.000 R²)
mean                 643.8 ms   (607.3 ms .. 695.2 ms)
std dev              51.95 ms   (13.07 ms .. 69.16 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking medium.csv/mapAccumLViaLazyState   for DVS.Vector Word64
time                 62.08 ms   (61.14 ms .. 63.23 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 63.16 ms   (62.43 ms .. 65.35 ms)
std dev              2.013 ms   (637.6 μs .. 3.434 ms)

On Fri, 9 Nov 2018 at 09:31, John Ky <[hidden email]> wrote:
Hi Carter,

I've created a ticket as requested and have reproduced the text here:

I've implemented a hand rolled version, and another two versions based on a combination of mapM and the lazy and strict versions of State monad.

haskell-works/hw-prim#38

The benchmarks show that the hand rolled versions run two times faster than the lazy state monad version and 16 times faster than the strict state monad.

I found the slow performance of the strict monad version most surprising.

I'm aware that the version that using mapM might enable fusion, however it is a fair bit slower than a hand rolled version that defeats fusion.

I would love to have a fusion-enabled version that runs as fast as the hand rolled version. Would that be possible?


Cheers,


-John


On Fri, 9 Nov 2018 at 01:18, Carter Schonwald <[hidden email]> wrote:
I thought about it a teeny bit more 

This should be trivially definable using mapM or equivalent using state t 

Have you tried doing that simple high level definition?  I think that works with vector fusion quite nicely. 

... ohhh. I see.  There’s two vectors of inputs.  I’ll have to think about this more. 

On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <[hidden email]> wrote:
Hey John
I’m happy to help you contribute to to vector. 

1). This might not actually be needed with stream fusion on  ... though perhaps this sort of shared computation needs to be its own combinators because of the sharing meaning that the stream fusion on map and foldl might not work. (Have you tried doing let x = map ... in let y = foldl ... in something with x and y? Even eg just writing map accum l in terms of just that? It could very well fuse ... though I don’t think it can with the current vector fusion framework. Though I think one of the more exotic fusion frameworks Amos Robinson did a few years ago could handle that fusion.   Sadly that one requires an ILP solver at compile time.  But there’s some tricks I think we could do )

2) writing it in stream fusion form / as map accum l for streams will actually be simpler I think 

3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got me a bit slow), but this sounds like a super reasonable feature request 



On Thu, Nov 8, 2018 at 6:31 AM John Ky <[hidden email]> wrote:
Hello,

I'd like to add the mapAccumL function to the vector package.

Specifically the Data.Vector.Storable module, but it would also be useful other vector modules.

This is my attempt at an implementation:

{-# LANGUAGE ScopedTypeVariables #-}

mapAccumL :: forall a b c. (Storable b, Storable c)
=> (a -> b -> (a, c))
-> a
-> DVS.Vector b
-> (a, DVS.Vector c)
mapAccumL f a vb = DVS.createT $ do
vc <- DVSM.unsafeNew (DVS.length vb)
a' <- go 0 a vc
return (a', vc)
where go :: Int -> a -> DVS.MVector s c -> ST s a
go i a0 vc = if i < DVS.length vb
then do
let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
DVSM.unsafeWrite vc i c1
go (i + 1) a1 vc
else return a0
{-# INLINE mapAccumL #-}

The implementation should obey the following law:

import qualified Data.List as L
import qualified Data.Vector.Storable as DVS

(DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f a bs

Cheers,

-John

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries