Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
43 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka
Hi Edward,

> -----Original message-----
> From: Edward Kmett <[hidden email]>
> Sent: 6 Aug 2013, 14:26
>
> On Tue, Aug 6, 2013 at 2:09 PM, Milan Straka <[hidden email]> wrote:
>
> > Hi Edward,I am not suggesting we should change the behaviour of existing
> > functions
> > and traverseWithKey_ should definitely use the same order as
> > traverseWithKey. Changing semantics without changing type signatures is
> > really suspicious and usually plainly wrong.
> >
>
> I wholeheartedly agree. =) I was just basing that on the code Ryan posted:
>
> >   traverseWithKey_ f = go
> >     where go Tip = pure ()
> >           go (Bin _ k v l r) = f k v *> go l *> go r
> >
> >
> ... which visits the key/value pairs out of order unlike, say:
>
>   go (Bin _ k v l r = go l *> f k v *> go r

Oh, yes, we will definitely use the definition you suggest.

> > Nevertheless, I was wondering whether we should have a monadic fold
> > (foldrM and foldlM) which would process the elements in a given order
> > (ascending and descending, analogously to foldr and foldl). From one
> > point of view, we can implement foldrM and foldlM using foldr and foldl,
> >
>
> Sure, foldrM is typically implemented in terms of foldl and foldlM is
> typically implemented in terms of foldr.
>
> Do the usual definitions like that leak on a Map?

It is difficult to say whether it is a 'leak'. These methods (they are
the same as Foldable.foldrM and Foldable.foldlM) heap-allocate space
linear in the size of the map (to create the closures). When implemented
directly, they do not heap-allocate.

> foldrM :: (Foldable t, Monad m) => (a -> b -> m b) -> b -> t a -> m
> bfoldrM f z0 xs = foldl f' return xs z0  where f' k x z = f x z >>= k
>
> foldlM :: (Foldable t, Monad m) => (a -> b -> m a) -> a -> t b -> m
> afoldlM f z0 xs = foldr f' return xs z0  where f' x k z = f z x >>= k
>
> nevertheless using linear heap space complexity compared to constant
> > heap space complexity we can achieve with specialized implementations.
> > This is the same situation as traverseWithKey_ -- we can implement it
> > using traverseWithKey, but the heap space complexity increases.
> >
>
> traverseWithKey_ would normally be implemented with an appropriate newtype
> and foldMapWithKey, rather than traverseWithKey. Does that also leak?

That does not leak, as Shachaf Ben-Kiki pointed out. That is one of the
reasons why this discussion is so long :)

BTW, Foldable.traverse_ also heap-allocates space linear in the size of
the map, because it is defined as
  traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
  traverse_ f = foldr ((*>) . f) (pure ())
Maybe it would be better to define it using foldMap + the appropriate
newtype? Then it would not heap-allocate, at least for Data.Map.

Cheers,
Milan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Edward Kmett-2
On Tue, Aug 6, 2013 at 4:19 PM, Milan Straka <[hidden email]> wrote:
Hi Edward,

> -----Original message-----
> From: Edward Kmett <[hidden email]>
> Sent: 6 Aug 2013, 14:26
>
> On Tue, Aug 6, 2013 at 2:09 PM, Milan Straka <[hidden email]> wrote:
>
> > Hi Edward,I am not suggesting we should change the behaviour of existing
> > functions
> > and traverseWithKey_ should definitely use the same order as
> > traverseWithKey. Changing semantics without changing type signatures is
> > really suspicious and usually plainly wrong.
> >
>
> I wholeheartedly agree. =) I was just basing that on the code Ryan posted:
>
> >   traverseWithKey_ f = go
> >     where go Tip = pure ()
> >           go (Bin _ k v l r) = f k v *> go l *> go r
> >
> >
> ... which visits the key/value pairs out of order unlike, say:
>
>   go (Bin _ k v l r = go l *> f k v *> go r

Oh, yes, we will definitely use the definition you suggest.

> > Nevertheless, I was wondering whether we should have a monadic fold
> > (foldrM and foldlM) which would process the elements in a given order
> > (ascending and descending, analogously to foldr and foldl). From one
> > point of view, we can implement foldrM and foldlM using foldr and foldl,
> >
>
> Sure, foldrM is typically implemented in terms of foldl and foldlM is
> typically implemented in terms of foldr.
>
> Do the usual definitions like that leak on a Map?

It is difficult to say whether it is a 'leak'. These methods (they are
the same as Foldable.foldrM and Foldable.foldlM) heap-allocate space
linear in the size of the map (to create the closures). When implemented
directly, they do not heap-allocate.

Ick. I hadn't walked through the expansion of those by hands and had admittedly just hoped they worked out of the box. 

That means the similar combinators we have for them in lens probably also leak. =(

This indicates to me we may want to bite the bullet and move foldlM and foldrM into the Foldable class. 

As hideous as that is, the unmitigable space leak strikes me as worse.
 
> traverseWithKey_ would normally be implemented with an appropriate newtype
> and foldMapWithKey, rather than traverseWithKey. Does that also leak?

That does not leak, as Shachaf Ben-Kiki pointed out. That is one of the
reasons why this discussion is so long :)

BTW, Foldable.traverse_ also heap-allocates space linear in the size of
the map, because it is defined as
  traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
  traverse_ f = foldr ((*>) . f) (pure ())
Maybe it would be better to define it using foldMap + the appropriate
newtype? Then it would not heap-allocate, at least for Data.Map.

Definitely. I'll talk to Shachaf and craft a suitable proposal to fix up traverse_.

-Edward

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

RE: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Simon Peyton Jones
In reply to this post by Ryan Newton

Sounds neat!  Is nested CPR analysis on a branch?

 

Not yet.. on my laptop only so far.

 

I’ve lost context on this traverseWithKey_ thread.  If you or Milan need my help, could you start again, stating the problem with a standalone test case?  I gather from Milan that may there isn’t a problem, or at least not a problem that GHC can reasonably solve.  If there’s nothing to follow up, no need to reply.  Thanks

 

Simon

 

From: Ryan Newton [mailto:[hidden email]]
Sent: 06 August 2013 14:51
To: Sjoerd Visscher
Cc: Milan Straka; Simon Peyton-Jones; Haskell Libraries
Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

 

On Sat, Aug 3, 2013 at 5:31 PM, Simon Peyton-Jones <[hidden email]> wrote:

> Fixing this involves *nested* CPR analysis, which I am working on at the moment.

Sounds neat!  Is nested CPR analysis on a branch?

This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t’other?  Then I could investigate more easily. 

Milan pasted the relevant code (thanks) for traverseWithKey_ vs. foldWithKey which I reattach at the bottom of this email.

On Sun, Aug 4, 2013 at 7:48 AM, Sjoerd Visscher <[hidden email]> wrote:

Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!

 

That sounds good to me!  And straightforward.  Any objections?

 

Milan, why does it bother you that there is no specified order?  I'm perfectly happy as long as its deterministic on a particular machine.  (I have never been sure whether pure code in Haskell must be deterministic across multiple machines... numCapabilities was a counter example for a long time.)  Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified?  Further, other languages (like Scheme) have maps/folds that do not specify order of side effects.

 

  -Ryan 

 

P.S.: Relevant code:

 

traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t ()
  traverseWithKey_ f = go
    where go Tip = pure ()
          go (Bin _ k v l r) = f k v *> go l *> go r


  foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b
  foldrWithKey f z = go z
    where go z' Tip = z'
          go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l

and we call them like this:
  let actionTraverse _k 1000000 = putStrLn "Hit 1000000"
      actionTraverse _k _v = return ()
  in traverseWithKey_ actionTraverse map

and this:
  let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z
      actionFoldr _k _v z = z
  in foldrWithKey actionFoldr (pure ()) map

 


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
123