Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
43 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton
Well, with Haskell we *depend* on optimizations, right?  It's a not a performance-by-construction, but rather performance-by-awesome-rewriting-and-other-tricks approach.  

I set -O2 for my packages, mostly.  But I imagine that since it is NOT the default ~/.cabal/config setting, that most people installing Haskell packages are getting the default (-O0), and most packages are not specifying otherwise.  In fact, Hackage specifically discourages you when you upload a package with -O2 set!

Anyway, personally I don't care much about -O0 performance.  But I do get a little nervous when radical changes in asymptotic complexity (rather than just constant factors) come from optimization settings.  For example, as an idiom, "forM_ [1..10^9] $ ..." makes me extremely nervous as opposed to writing a proper non-allocating for-loop construct.


On Fri, Jul 5, 2013 at 2:18 AM, John Lato <[hidden email]> wrote:
Upon re-reading, this was much more terse than I intended, mostly because I didn't want to get bogged down in an off-topic tangent.  What I mean is that I think it's quite rare indeed to be in a situation where you care about performance, code with -O2 performs acceptably well, but being performance-bound at -O0 is a serious hindrance.

But that's my own experience, and I'm sure that others have had different experiences.  And this issue in particular comes up often enough that I'm sure there's something I'm missing.  Hence my question.


On Fri, Jul 5, 2013 at 10:48 AM, John Lato <[hidden email]> wrote:
Slightly off-topic, but why do you care about the performance of code compiled with -O0?  I can think of at least one valid reason for doing so, but even for that particular instance I doubt that changing the code is the proper solution.

Aside from that, I definitely agree that most packages could do a better job documenting the heap usage of functions.


On Fri, Jul 5, 2013 at 8:41 AM, Ryan Newton <[hidden email]> wrote:

After playing a bit with Ryan's benchmark, I no longer think that the
order matters much for the total number of allocations. Nor do I believe
in first-class vs non-first-class IO actions. All that should matter is
how many allocations we can move to the stack. But I haven't yet figured
out why exactly different versions differ so drastically in this regard.

Yeah, it's all rather different to predict in advance, isn't it?

I tried your alternate foldrWithKey and I saw it heap allocating as well.

Further, -O0 vs. -O2 can make a big difference.  It's a little frustrating because for dealing efficiently with big data sets, especially in parallel.  It would be nice to have big-O numbers in the docs for heap allocation as well as time cost -- and ones you could trust irrespective of optimize level.

By the way, is traverse/traverseWithKey supposed to guarantee a specific order?  The doc uses this code in the definition:

    traverse ((k, v) -> (,) k $ f k v) (toList m)

And I thought "toList" didn't guarantee anything (as opposed to toAscList)...


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries





_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton
In reply to this post by Shachaf Ben-Kiki
Shachaf, I checked and Milan's commits that improve traverseWithKey were already incorporated when I ran my tests above.  The extra speedup is good but doesn't change the O(1) vs. O(N) allocation situation.

Ok, so the discussion period for this one is over two weeks.  I don't believe there were any real objections to traverseWithKey_, rather there were several questions raised about whether the non-allocating behavior could be accomplished in other ways, which, alas didn't pan out.  

Thus if there are no other objections, can someone merge pull request #30?

P.S. I've been doing a lot of performance-oriented monadic programming with large data structures (for the LVar project), and this Map issue is only one of several places where containers API's force you to go through lists or to otherwise allocate.  I think a continuing audit would be good and will make other suggestions and pull requests as I come to them.


On Fri, Jul 5, 2013 at 3:51 AM, Shachaf Ben-Kiki <[hidden email]> wrote:
On Tue, Jul 2, 2013 at 1:46 PM, Ryan Newton <[hidden email]> wrote:
>
>>     foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r
>>     foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k
>> x))
>
>
> Shachaf, thanks for fleshing it out.  I updated the test with your version
> as well:
>
>    https://gist.github.com/rrnewton/5912908
>
> In short, it performs exactly the same as the foldrWithKey version, as you
> pointed out (32M allocation).
>
> In both cases, using first class monadic/applicative values seems to foil
> GHC.
>
> And btw, these do show the scaling you would expect, on 2M elements, it
> allocates 64MB, 4M -> 128MB, and so on, whereas the traverseWithKey_ version
> allocates a constant amount.
>
>   -Ryan
>

If you're actually benchmarking it, you should note a few things:

* As I mentioned, lens's actual implementation of this function is
slightly different:

    -- The argument 'a' of the result should not be used!
    newtype Traversed a f = Traversed { getTraversed :: f a }
    instance Applicative f => Monoid (Traversed a f) where
      mempty = Traversed (pure (error "Traversed: value used"))
      Traversed ma `mappend` Traversed mb = Traversed (ma *> mb)

with one "void" applied at the very end of the fold, instead of one in
each step. This may or may not be better; it should probably be
measured.

* Following a discussion with Milan off-list, he implemented a simple
optimization in traverseWithKey which might have a significant impact.
See https://github.com/haskell/containers/commit/4d24ff5d08f0bb27ca73a9888286d6426149515b
. It should probably be considered in these benchmarks.

* I haven't thought it through, but it's possible that using a
"difference monoid" -- i.e. folding with Endo, the way
Data.Foldable.foldr is implemented by default -- would also be useful
to measure, to compare with the existing foldrWithKey.

    Shachaf


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Shachaf Ben-Kiki
On Mon, Jul 29, 2013 at 11:47 AM, Ryan Newton <[hidden email]> wrote:
> Shachaf, I checked and Milan's commits that improve traverseWithKey were
> already incorporated when I ran my tests above.  The extra speedup is good
> but doesn't change the O(1) vs. O(N) allocation situation.

Wait, are you saying you couldn't get it to work in constant memory at
all without modifying containers? I didn't actually look at the
benchmarks in much detail before. I just looked at the code -- it
looks like in your latest posted benchmark you don't actually use the
alternate traverseWithKey_ anywhere -- instead you use foldrWithKey
twice (the perils of not compiling with -Wall!). I just ran the
benchmark with the alternate version and it looks like it uses
constant memory.

Maybe I'm misunderstanding. I'm not against adding this to containers,
but it should be clear with such a patch whether it's being added for
necessity or convenience. It looks to me like it's the latter.

(By the way: lens also provides this function, with the name
"itraverse_" (i for indexed). I tried it and it looks like it also
uses constant memory.)

    Shachaf

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton
Oops, that was sloppy.  Yes, your version does get the job done without allocating.  The corrected test is attached to this email.

The lens interface does look quite full featured!  And it's nice to see that it consistently includes '_' variants.  I cite that as additional evidence for the norm ;-).

Personally, I still want traverseWithKey_ for convenience, especially because the solution you used is non-obvious.  I imagine many Data.Map users would not come up with it (as the rest of us on this thread didn't).

Best,
  -Ryan

-----------------------------
import Control.Applicative
import Control.Monad
import Control.DeepSeq
import Control.Exception
import GHC.Stats
import qualified Data.Map.Strict as M
import Data.Time.Clock
import Data.Monoid
import System.Mem
import System.Environment

main :: IO ()
main = do
  args <- getArgs
  let size = case args of
              []  -> 1000000::Int
              [n] -> read n
  let m0 = M.fromList (map (\i -> (i,i)) [1..size])
  let fn 500000 v = putStrLn "Got it!"
      fn _      _ = return ()
--  let fn i v = putStrLn$"fn: "++show(i,v)
  
  st <- getCurrentTime      
  evaluate$ rnf m0 
  en <- getCurrentTime
  performGC
  s1 <- getGCStats  
  putStrLn$"Constructed map in "++show (diffUTCTime en st)++"\n "++ show s1++"\n"

  ------------------------------------------------------------
  -- Regular traverseWithKey uses 48MB
  -- traverseWithKey_ uses 200K of allocation:
  st <- getCurrentTime 
  M.traverseWithKey_ fn m0
  en <- getCurrentTime
  performGC
  s2 <- getGCStats 
  putStrLn$"[traverseWithKey_] Consumed map in "++show (diffUTCTime en st)++"\n "++ show s2++"\n"
  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s2 - bytesAllocated s1)

  ------------------------------------------------------------
  -- foldrWithKey uses 32MB allocation:
  st <- getCurrentTime  
  M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0
  en <- getCurrentTime
  performGC
  s3 <- getGCStats 
  putStrLn$"[foldrWithKey] Consumed map in "++show (diffUTCTime en st)++"\n "++ show s3++"\n"
  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s3 - bytesAllocated s2)

  ------------------------------------------------------------
  -- An alternate version was proposed by Shachaf Ben-Kiki:
  st <- getCurrentTime
  traverseWithKey_ fn m0  
  en <- getCurrentTime
  performGC
  s4 <- getGCStats 
  putStrLn$"[alternate traverseWithKey_] Consumed map in "++show (diffUTCTime en st)++"\n "++ show s4++"\n"
  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s4 - bytesAllocated s3)
  return ()


foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r
foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))

-- Since the Applicative used is Const (newtype Const m a = Const m), the
-- structure is never built up.
--(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows:

newtype Traverse_ f = Traverse_ { runTraverse_ :: f () }

instance Applicative f => Monoid (Traverse_ f) where
  mempty = Traverse_ (pure ())
  Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b)

traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f ()
traverseWithKey_ f = runTraverse_ .
                     foldMapWithKey (\k x -> Traverse_ (void (f k x)))




On Mon, Jul 29, 2013 at 3:08 PM, Shachaf Ben-Kiki <[hidden email]> wrote:
On Mon, Jul 29, 2013 at 11:47 AM, Ryan Newton <[hidden email]> wrote:
> Shachaf, I checked and Milan's commits that improve traverseWithKey were
> already incorporated when I ran my tests above.  The extra speedup is good
> but doesn't change the O(1) vs. O(N) allocation situation.

Wait, are you saying you couldn't get it to work in constant memory at
all without modifying containers? I didn't actually look at the
benchmarks in much detail before. I just looked at the code -- it
looks like in your latest posted benchmark you don't actually use the
alternate traverseWithKey_ anywhere -- instead you use foldrWithKey
twice (the perils of not compiling with -Wall!). I just ran the
benchmark with the alternate version and it looks like it uses
constant memory.

Maybe I'm misunderstanding. I'm not against adding this to containers,
but it should be clear with such a patch whether it's being added for
necessity or convenience. It looks to me like it's the latter.

(By the way: lens also provides this function, with the name
"itraverse_" (i for indexed). I tried it and it looks like it also
uses constant memory.)

    Shachaf


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Shachaf Ben-Kiki
On Mon, Jul 29, 2013 at 8:57 PM, Ryan Newton <[hidden email]> wrote:

> Oops, that was sloppy.  Yes, your version does get the job done without
> allocating.  The corrected test is attached to this email.
>
> The lens interface does look quite full featured!  And it's nice to see that
> it consistently includes '_' variants.  I cite that as additional evidence
> for the norm ;-).
>
> Personally, I still want traverseWithKey_ for convenience, especially
> because the solution you used is non-obvious.  I imagine many Data.Map users
> would not come up with it (as the rest of us on this thread didn't).
>

Sure, putting it in containers is fine. I just want to mention that
these kinds of functions are very general-purpose. In particular,
using just traverseWithKey and the proposed alterF, you can implement
at least all of:

(!) adjust adjustWithKey alter delete empty findWithDefault foldl
foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey
foldrWithKey' insert insertLookupWithKey insertWith insertWithKey
lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe
mapMaybeWithKey mapWithKey member notMember toList update
updateLookupWithKey updateWithKey

With optimal asymptotic behavior and, in many cases, with
just-about-optimal constant factors too (this is essentially what lens
does). But adding this function directly to containers for
completeness seems reasonable.

    Shachaf

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton

(!) adjust adjustWithKey alter delete empty findWithDefault foldl
foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey
foldrWithKey' insert insertLookupWithKey insertWith insertWithKey
lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe
mapMaybeWithKey mapWithKey member notMember toList update
updateLookupWithKey updateWithKey

So then this becomes a recommendation for the implementation strategy then right? (e.g. within containers)  Users probably still want a rich set of variants, and they need to be packaged somewhere.  But it seems like there's an argument that deriving these from a small core helps to ensure that there are no bugs.

Yet that sounds like a battle for another day.  On the topic of this proposal -- since Shachaf doesn't object, would you (Johan / Milan) mind merging this patch?

  -Ryan




_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka
Hi Ryan,

> -----Original message-----
> From: Ryan Newton <[hidden email]>
> Sent: 30 Jul 2013, 09:35
>
> > (!) adjust adjustWithKey alter delete empty findWithDefault foldl
> > foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey
> > foldrWithKey' insert insertLookupWithKey insertWith insertWithKey
> > lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe
> > mapMaybeWithKey mapWithKey member notMember toList update
> > updateLookupWithKey updateWithKey
> >
>
> So then this becomes a recommendation for the implementation strategy then
> right? (e.g. within containers)  Users probably still want a rich set of
> variants, and they need to be packaged somewhere.  But it seems like
> there's an argument that deriving these from a small core helps to ensure
> that there are no bugs.
>
> Yet that sounds like a battle for another day.  On the topic of this
> proposal -- since Shachaf doesn't object, would you (Johan / Milan) mind
> merging this patch?

I would like to postpone merging for a bit and spend more time
investigating the issue.

I want to investigate why exactly
  foldrWithKey (\k a b -> f k a *> b) (pure ())
does not work as I would expect it not to allocate.

I am not convinced that traverseWithKey_ is a good addition to the API
because my original thought was "this is just a fold". The API of
containers has tendency to grow, so I am careful about adding functions
that can be expressed easily using existing ones.

Of course, ability to iterate effectively over the elements of the
containers definitely IS useful, so if I am unable to make the fold work
(or GHC optimizer cannot do it at present), I will merge it.

Cheers,
Milan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton
Ok, let us know what you find out.  I totally agree that Data.Map has gotten pretty cluttered.

But, also, part of my concern is that even if the optimizer does know how to optimize that fold, how does the programmer who needs this kind of routine *know* that it will work?  (Especially since we didn't.)  I think it's a bad habit to for users to stop paying attention to the cost model entirely and instead believe that compiler magic will eliminate everything!  Yet exactly that kind of thinking seems to be encouraged by data structure libraries that create unnecessary copies frivolously (esp. with fusion frameworks that sometimes/maybe fix the problem).





On Tue, Jul 30, 2013 at 11:32 AM, Milan Straka <[hidden email]> wrote:
Hi Ryan,

> -----Original message-----
> From: Ryan Newton <[hidden email]>
> Sent: 30 Jul 2013, 09:35
>
> > (!) adjust adjustWithKey alter delete empty findWithDefault foldl
> > foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey
> > foldrWithKey' insert insertLookupWithKey insertWith insertWithKey
> > lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe
> > mapMaybeWithKey mapWithKey member notMember toList update
> > updateLookupWithKey updateWithKey
> >
>
> So then this becomes a recommendation for the implementation strategy then
> right? (e.g. within containers)  Users probably still want a rich set of
> variants, and they need to be packaged somewhere.  But it seems like
> there's an argument that deriving these from a small core helps to ensure
> that there are no bugs.
>
> Yet that sounds like a battle for another day.  On the topic of this
> proposal -- since Shachaf doesn't object, would you (Johan / Milan) mind
> merging this patch?

I would like to postpone merging for a bit and spend more time
investigating the issue.

I want to investigate why exactly
  foldrWithKey (\k a b -> f k a *> b) (pure ())
does not work as I would expect it not to allocate.

I am not convinced that traverseWithKey_ is a good addition to the API
because my original thought was "this is just a fold". The API of
containers has tendency to grow, so I am careful about adding functions
that can be expressed easily using existing ones.

Of course, ability to iterate effectively over the elements of the
containers definitely IS useful, so if I am unable to make the fold work
(or GHC optimizer cannot do it at present), I will merge it.

Cheers,
Milan


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka
> -----Original message-----
> From: Ryan Newton <[hidden email]>
> Sent: 30 Jul 2013, 12:01
>
> Ok, let us know what you find out.  I totally agree that Data.Map has
> gotten pretty cluttered.
>
> But, also, part of my concern is that even if the optimizer does know how
> to optimize that fold, how does the programmer who needs this kind of
> routine *know* that it will work?  (Especially since we didn't.)

I believe implementing traverseWithKey_ using a foldWithKey is
quite natural (you want to go over the elements and do something with
them, in this case perform some action on them). I expected it to work
and I am surprised it does not. So for me it is the other way around --
I have a function which I expected to behave nice as a fold and it does
not. There is probably a reason for that and that may cause addition
of traverseWithKey_ to the API.

> I think it's a bad habit to for users to stop paying attention to the
> cost model entirely and instead believe that compiler magic will
> eliminate everything!  Yet exactly that kind of thinking seems to be
> encouraged by data structure libraries that create unnecessary copies
> frivolously (esp. with fusion frameworks that sometimes/maybe fix the
> problem).

Yes, that is true. Nevertheless, it is not easy to think about what will
and what will not cause heap allocation in Haskell. Also, heap
allocation is quite cheap in Haskell, sometimes faster implementations
of containers methods allocate more memory (but we usually do not use
them and prefer less allocation). Wanting to avoid allocation sometimes
causes to avoid higher order functions and advanced functional
techniques, which is also not ideal.

Nevertheless, I will try to investigate.

Cheers,
Milan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton

I believe implementing traverseWithKey_ using a foldWithKey is
quite natural (you want to go over the elements and do something with
them, in this case perform some action on them). I expected it to work
and I am surprised it does not. So for me it is the other way around --
I have a function which I expected to behave nice as a fold and it does
not

Re: expectations.  You don't get a funny feeling when monadic values are used as first class rather than second class ;-)?  Whether in the accumulator of a fold, or in cases like this:

  do act <- f x
       act

My expectation was that the optimizer would have more trouble.  But maybe that's off base.

 Also, heap
allocation is quite cheap in Haskell, sometimes faster implementations
of containers methods allocate more memory (but we usually do not use
them and prefer less allocation). Wanting to avoid allocation sometimes
causes to avoid higher order functions and advanced functional
techniques, which is also not ideal.

Yes, my situation, which I agree is a minority position, is that I work on and care about parallelism.  The current state of GHC is that it does NOT have a scalable GC, and virtually every parallel program that contains "idiomatic Haskell" levels of allocation fails to scale rather quickly as you increase threads.  I.e. you quickly get to productivities of less than 50%.  (Not necessarily at 32 threads either, often you are up against this wall at 8 or less.)

So for the time being good parallel Haskell programs are low-allocating parallel Haskell programs.


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Roman Cheplyaka-2
* Ryan Newton <[hidden email]> [2013-08-01 10:58:48-0400]

> > I believe implementing traverseWithKey_ using a foldWithKey is
> > quite natural (you want to go over the elements and do something with
> > them, in this case perform some action on them). I expected it to work
> > and I am surprised it does not. So for me it is the other way around --
> > I have a function which I expected to behave nice as a fold and it does
> > not
> >
>
> Re: expectations.  You don't get a funny feeling when monadic values are
> used as first class rather than second class ;-)?  Whether in the
> accumulator of a fold, or in cases like this:
>
>   do act <- f x
>        act

I don't. I don't even believe that the compiler can spot the difference
between the two. (But maybe it's just my ignorance.)

Do you have a specific example where this is a problem?

Roman

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton

> Re: expectations.  You don't get a funny feeling when monadic values are
> used as first class rather than second class ;-)?  Whether in the
> accumulator of a fold, or in cases like this:
>
>   do act <- f x
>        act

I don't. I don't even believe that the compiler can spot the difference
between the two. (But maybe it's just my ignorance.)

Do you have a specific example where this is a problem?

Well, I just generally assumed that anywhere where inlining is foiled would, any kind of allocation-eliminating optimization is unlikely to happen.

As one example, in the monad-par scheduler(s) we do inscrutable computations to retrieve a monadic-value, and then run it.  This works fine, but it does mean that we pay the full cost of an abstract monadic action at that point (indirect jump, etc), rather than the highly discounted rate we are used to paying when we chain together monadic actions in a syntactically adjacent way within a function.

Personally, I don't understand all the GHC optimization steps well.  But I have verified certain things that it doesn't do.  For example, if I have an allocating action in the tail (only exiting case) of a recursive monadic function, it cannot do "loop peeling" to eliminate that allocation against whatever context is calling the recursive function.  For example the STG for the following leaves the allocation of the tuple and two Ints:

foo :: Int -> IO (Int,Int)
foo x | x < 10 = return (x, 2*x)
foo x = foo (x-1)

main = do
  (x,y) <- foo 1000
  print x
  print y

Which is normally fine if the trip count is high.  But consider something like compare-and-swap where there's a loop, but the expected trip count is very low.  Perhaps this is just one example of how GHC tends not to explicitly represent and optimize loops?

The Data.Map.Base.foldRWithKey function discussed in this thread is another example.  That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop:

go10_r3TD :: IO () -> Map Int Int -> IO ()
[GblId, Arity=2, Str=DmdType LS, Unf=OtherCon []] =
    sat-only \r srt:(0,*bitmap*) [z'_s3Yk ds_s3Y7]
        case ds_s3Y7 of _ {
          Bin rb_s43z kx_s3Ye x_s43A l_s3Ys r_s3Yl ->
              case kx_s3Ye of _ {
                I# x1_s3Yi ->
                    let {
                      sat_s43x :: IO ()
                      [LclId] =
                          \r srt:(0,*bitmap*) [eta_s3Ym]
                              case x1_s3Yi of _ {
                                __DEFAULT -> go10_r3TD z'_s3Yk r_s3Yl eta_s3Ym;
                                500000 ->
                                    case hPutStr2 stdout lvl_r3Ty True eta_s3Ym of _ {
                                      (#,#) ipv_s3Yq _ -> go10_r3TD z'_s3Yk r_s3Yl ipv_s3Yq;
                                    };
                              };
                    } in  go10_r3TD sat_s43x l_s3Ys;
              };
          Tip -> z'_s3Yk;
        };
SRT(go10_r3TD): [hPutStr2, stdout, lvl_r3Ty, go10_r3TD]

The specific problem in this example seems to be that -- based on a literal reading of the above -- it's creating a closure that closes over the Int#, x1_s3Yi.  Shachaf's version that uses a newtype seems to avoid this trouble by not allowing IO's (>>=) into it at all.  The traverseWithKey_ version is attached below [1], and it manages to get rid of the IO newtype in the loop and resolves to a State#.

Perhaps there is a missing optimization tweak that would help GHC get rid of the IO type in the above STG?

Cheers,
  -Ryan

[1] P.S.  Here's the non-allocating loop produced by traverseWithKey_:

a_r3X7
  :: Map Int Int -> State# RealWorld -> (# State# RealWorld, () #)
[GblId, Arity=2, Str=DmdType SL, Unf=OtherCon []] =
    sat-only \r srt:(0,*bitmap*) [ds_s41Z eta_s42c]
        case ds_s41Z of _ {
          Bin rb_s47H k_s426 v_s47I l_s42b r_s42g ->
              case k_s426 of _ {
                I# x_s429 ->
                    case x_s429 of _ {
                      __DEFAULT ->
                          case a_r3X7 l_s42b eta_s42c of _ {
                            (#,#) ipv_s42h _ -> a_r3X7 r_s42g ipv_s42h;
                          };
                      500000 ->
                          case hPutStr2 stdout lvl_r3X1 True eta_s42c of _ {
                            (#,#) ipv_s42l _ ->
                                case a_r3X7 l_s42b ipv_s42l of _ {
                                  (#,#) ipv2_s42p _ -> a_r3X7 r_s42g ipv2_s42p;
                                };
                          };
                    };
              };
          Tip -> (#,#) [eta_s42c ()];
        };
SRT(a_r3X7): [hPutStr2, stdout, lvl_r3X1, a_r3X7]



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

RE: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Simon Peyton Jones

For example the STG for the following leaves the allocation of the tuple and two Ints:

 

foo :: Int -> IO (Int,Int)

foo x | x < 10 = return (x, 2*x)

foo x = foo (x-1)

 

Fixing this involves *nested* CPR analysis, which I am working on at the moment.

 

The Data.Map.Base.foldRWithKey function discussed in this thread is another example.  That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop:

 

This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t’other?  Then I could investigate more easily.

 

thanks!

 

Simon

 

 

 

 

From: Libraries [mailto:[hidden email]] On Behalf Of Ryan Newton
Sent: 03 August 2013 08:45
To: Roman Cheplyaka
Cc: Haskell Libraries
Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

 

 

> Re: expectations.  You don't get a funny feeling when monadic values are
> used as first class rather than second class ;-)?  Whether in the
> accumulator of a fold, or in cases like this:
>
>   do act <- f x
>        act

I don't. I don't even believe that the compiler can spot the difference
between the two. (But maybe it's just my ignorance.)

Do you have a specific example where this is a problem?

 

Well, I just generally assumed that anywhere where inlining is foiled would, any kind of allocation-eliminating optimization is unlikely to happen.

 

As one example, in the monad-par scheduler(s) we do inscrutable computations to retrieve a monadic-value, and then run it.  This works fine, but it does mean that we pay the full cost of an abstract monadic action at that point (indirect jump, etc), rather than the highly discounted rate we are used to paying when we chain together monadic actions in a syntactically adjacent way within a function.

 

Personally, I don't understand all the GHC optimization steps well.  But I have verified certain things that it doesn't do.  For example, if I have an allocating action in the tail (only exiting case) of a recursive monadic function, it cannot do "loop peeling" to eliminate that allocation against whatever context is calling the recursive function.  For example the STG for the following leaves the allocation of the tuple and two Ints:

 

foo :: Int -> IO (Int,Int)

foo x | x < 10 = return (x, 2*x)

foo x = foo (x-1)

 

main = do

  (x,y) <- foo 1000

  print x

  print y

 

Which is normally fine if the trip count is high.  But consider something like compare-and-swap where there's a loop, but the expected trip count is very low.  Perhaps this is just one example of how GHC tends not to explicitly represent and optimize loops?

 

The Data.Map.Base.foldRWithKey function discussed in this thread is another example.  That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop:

 

go10_r3TD :: IO () -> Map Int Int -> IO ()

[GblId, Arity=2, Str=DmdType LS, Unf=OtherCon []] =

    sat-only \r srt:(0,*bitmap*) [z'_s3Yk ds_s3Y7]

        case ds_s3Y7 of _ {

          Bin rb_s43z kx_s3Ye x_s43A l_s3Ys r_s3Yl ->

              case kx_s3Ye of _ {

                I# x1_s3Yi ->

                    let {

                      sat_s43x :: IO ()

                      [LclId] =

                          \r srt:(0,*bitmap*) [eta_s3Ym]

                              case x1_s3Yi of _ {

                                __DEFAULT -> go10_r3TD z'_s3Yk r_s3Yl eta_s3Ym;

                                500000 ->

                                    case hPutStr2 stdout lvl_r3Ty True eta_s3Ym of _ {

                                      (#,#) ipv_s3Yq _ -> go10_r3TD z'_s3Yk r_s3Yl ipv_s3Yq;

                                    };

                              };

                    } in  go10_r3TD sat_s43x l_s3Ys;

              };

          Tip -> z'_s3Yk;

        };

SRT(go10_r3TD): [hPutStr2, stdout, lvl_r3Ty, go10_r3TD]

 

The specific problem in this example seems to be that -- based on a literal reading of the above -- it's creating a closure that closes over the Int#, x1_s3Yi.  Shachaf's version that uses a newtype seems to avoid this trouble by not allowing IO's (>>=) into it at all.  The traverseWithKey_ version is attached below [1], and it manages to get rid of the IO newtype in the loop and resolves to a State#.

 

Perhaps there is a missing optimization tweak that would help GHC get rid of the IO type in the above STG?

 

Cheers,

  -Ryan

 

[1] P.S.  Here's the non-allocating loop produced by traverseWithKey_:

 

a_r3X7

  :: Map Int Int -> State# RealWorld -> (# State# RealWorld, () #)

[GblId, Arity=2, Str=DmdType SL, Unf=OtherCon []] =

    sat-only \r srt:(0,*bitmap*) [ds_s41Z eta_s42c]

        case ds_s41Z of _ {

          Bin rb_s47H k_s426 v_s47I l_s42b r_s42g ->

              case k_s426 of _ {

                I# x_s429 ->

                    case x_s429 of _ {

                      __DEFAULT ->

                          case a_r3X7 l_s42b eta_s42c of _ {

                            (#,#) ipv_s42h _ -> a_r3X7 r_s42g ipv_s42h;

                          };

                      500000 ->

                          case hPutStr2 stdout lvl_r3X1 True eta_s42c of _ {

                            (#,#) ipv_s42l _ ->

                                case a_r3X7 l_s42b ipv_s42l of _ {

                                  (#,#) ipv2_s42p _ -> a_r3X7 r_s42g ipv2_s42p;

                                };

                          };

                    };

              };

          Tip -> (#,#) [eta_s42c ()];

        };

SRT(a_r3X7): [hPutStr2, stdout, lvl_r3X1, a_r3X7]

 

 


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka
Hi Ryan and Simon,

> From: Libraries [mailto:[hidden email]] On Behalf Of Ryan Newton
> Sent: 03 August 2013 08:45
> To: Roman Cheplyaka
> Cc: Haskell Libraries
> Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_
>
> The Data.Map.Base.foldRWithKey function discussed in this thread is
> another example.  That's a place where even after it inlines the
> provided function into the fold, we end up with the below STG with an
> allocation of an IO () function inside the inner loop:

I have been thinking about it and I think GHC optimizer is not to blame
in this case. The two methods we are interested in are:

  traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t ()
  traverseWithKey_ f = go
    where go Tip = pure ()
          go (Bin _ k v l r) = f k v *> go l *> go r

  foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b
  foldrWithKey f z = go z
    where go z' Tip = z'
          go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l

and we call them like this:
  let actionTraverse _k 1000000 = putStrLn "Hit 1000000"
      actionTraverse _k _v = return ()
  in traverseWithKey_ actionTraverse map

and this:
  let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z
      actionFoldr _k _v z = z
  in foldrWithKey actionFoldr (pure ()) map


So although both traverseWithKey_ and foldrWithKey build an action, only
traverseWithKey_ can execute the action 'on-the-fly'. When foldrWithKey
is calling itself recursively, it has to allocate thunk with the IO action
which is to be performed after the subtree is processed. If a strict
fold is used, the memory allocated is lower (it allocated only the
resulting IO action itself), but still linear in the size of the map.

If the GHC optimizer were to avoid allocating, it would have to rewrite
the 'foldrWithKey actionFoldr' by pushing the *> up to the definition of
the recursive case of foldrWithKey, which seems extremely complicated to
perform automatically (it would have to add several 'pure ()' to the code).
But maybe I am not seeing some other way around it.



Nevertheless, I am not sure how to proceed. The problem is that
Data.Foldable offers many non-overloadable methods that are provided
through the Foldable instance, notably
  traverse_
  foldrM
  foldlM
which could be implemented so that they iterate over all elements
without allocating, but the Data.Foldable implementation allocates and
cannot be overloaded. Therefore, if we add traverseWithKey_, we still
will not have means of iterating an action over Set and IntSet elements,
and although Map.traverseWithKey_ will not allocate, using traverse_ on
Map will.

Also I am not happy that traverseWithKey_ does not specify order in
which it processed the elements. That is not such a big issue for
traverseWithKey, because the map is reassembled afterwards, but
traverseWithKey_ is only performing the action (without constructing the
map) and it does so in unspecified order.

What I am thinking at the moment about is foldlM and foldrM -- I checked
that they can be implemented not to allocate and they specify evaluation
order. But they require a Monad, not only Applicative, and would clash
with Data.Foldable.fold[lr]M. Maybe we will end up adding both
traverseWithKey_ and fold[lr]M? But than again, API growth, API growth :)

Cheers,
Milan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Sjoerd Visscher-2

On Aug 4, 2013, at 9:49 AM, Milan Straka <[hidden email]> wrote:

> The problem is that
> Data.Foldable offers many non-overloadable methods that are provided
> through the Foldable instance, notably
> traverse_
> foldrM
> foldlM
> which could be implemented so that they iterate over all elements
> without allocating, but the Data.Foldable implementation allocates and
> cannot be overloaded.

Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!

greetings,
Sjoerd
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton

On Sat, Aug 3, 2013 at 5:31 PM, Simon Peyton-Jones <[hidden email]> wrote:

> Fixing this involves *nested* CPR analysis, which I am working on at the moment.

Sounds neat!  Is nested CPR analysis on a branch?

This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t’other?  Then I could investigate more easily. 

Milan pasted the relevant code (thanks) for traverseWithKey_ vs. foldWithKey which I reattach at the bottom of this email.

On Sun, Aug 4, 2013 at 7:48 AM, Sjoerd Visscher <[hidden email]> wrote:
Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!

That sounds good to me!  And straightforward.  Any objections?

Milan, why does it bother you that there is no specified order?  I'm perfectly happy as long as its deterministic on a particular machine.  (I have never been sure whether pure code in Haskell must be deterministic across multiple machines... numCapabilities was a counter example for a long time.)  Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified?  Further, other languages (like Scheme) have maps/folds that do not specify order of side effects.

  -Ryan 

P.S.: Relevant code:

traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t ()
  traverseWithKey_ f = go
    where go Tip = pure ()
          go (Bin _ k v l r) = f k v *> go l *> go r

  foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b
  foldrWithKey f z = go z
    where go z' Tip = z'
          go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l

and we call them like this:
  let actionTraverse _k 1000000 = putStrLn "Hit 1000000"
      actionTraverse _k _v = return ()
  in traverseWithKey_ actionTraverse map

and this:
  let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z
      actionFoldr _k _v z = z
  in foldrWithKey actionFoldr (pure ()) map


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka
Hi Ryan,

> -----Original message-----
> From: Ryan Newton <[hidden email]>
> Sent: 6 Aug 2013, 09:51
>
> Milan, why does it bother you that there is no specified order?  I'm
> perfectly happy as long as its deterministic on a particular machine.

it seems to me that having specified order (e.g., ascending) is more
useful, if it can be provided. If I use traverseWithKey_ to process the
elements (e.g., write them to a file, add them to other structure, send
across the network), it seems to me that process them in ascending order
is more beneficial that iterating over them in some order specified by
the shape of the structure.

I am not saying that having an operation without defined order is
useless, only that operation with ascending order seems more useful to
me.

> Aren't we already used to using Data.Map.toList and Data.Set.toList
> where order is not specified?

Well, we also have toAscList and toDescList. Moreover, toList is actually
toAscList, so I wonder how many libraries would be affected if toList
started returning the elements in some random order :)



My point is this -- if we think having an Applicative / Monadic
iteration which does not allocate is useful, I believe we should also
provide such an iteration in ascending order.


Cheers,
Milan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Edward Kmett-2
In reply to this post by Ryan Newton
traverse and traverse_ visit elements in the same order as foldMap and foldr up to the Monoid/Applicative laws permitting (finite) reassociation and unit mapping.

It would stand to reason that traverseWithKey, traverseWithKey_, foldMapWithKey and foldrWithKey should retain that relationship.

I don't particularly care about what the order is, but that said, if you start breaking the invariant of the current ordering, you'll silently break a lot of people's code while still permitting it to typecheck.

This means that the errors will be insidious and difficult to find. e.g. the lens-based zipper code for walking into a Map will silently break and I'm sure I'm not the only one who has taken advantage of the existing ordering on these combinators.

-Edward

On Tue, Aug 6, 2013 at 9:51 AM, Ryan Newton <[hidden email]> wrote:

On Sat, Aug 3, 2013 at 5:31 PM, Simon Peyton-Jones <[hidden email]> wrote:

> Fixing this involves *nested* CPR analysis, which I am working on at the moment.

Sounds neat!  Is nested CPR analysis on a branch?

This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t’other?  Then I could investigate more easily. 

Milan pasted the relevant code (thanks) for traverseWithKey_ vs. foldWithKey which I reattach at the bottom of this email.

On Sun, Aug 4, 2013 at 7:48 AM, Sjoerd Visscher <[hidden email]> wrote:
Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!

That sounds good to me!  And straightforward.  Any objections?

Milan, why does it bother you that there is no specified order?  I'm perfectly happy as long as its deterministic on a particular machine.  (I have never been sure whether pure code in Haskell must be deterministic across multiple machines... numCapabilities was a counter example for a long time.)  Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified?  Further, other languages (like Scheme) have maps/folds that do not specify order of side effects.

  -Ryan 

P.S.: Relevant code:

traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t ()
  traverseWithKey_ f = go
    where go Tip = pure ()
          go (Bin _ k v l r) = f k v *> go l *> go r

  foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b
  foldrWithKey f z = go z
    where go z' Tip = z'
          go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l

and we call them like this:
  let actionTraverse _k 1000000 = putStrLn "Hit 1000000"
      actionTraverse _k _v = return ()
  in traverseWithKey_ actionTraverse map

and this:
  let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z
      actionFoldr _k _v z = z
  in foldrWithKey actionFoldr (pure ()) map


_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka
Hi Edward,

> -----Original message-----
> From: Edward Kmett <[hidden email]>
> Sent: 6 Aug 2013, 13:38
>
> traverse and traverse_ visit elements in the same order as foldMap and
> foldr up to the Monoid/Applicative laws permitting (finite) reassociation
> and unit mapping.
>
> It would stand to reason that traverseWithKey, traverseWithKey_,
> foldMapWithKey and foldrWithKey should retain that relationship.
>
> I don't particularly care about what the order is, but that said, if you
> start breaking the invariant of the current ordering, you'll silently break
> a lot of people's code while still permitting it to typecheck.
>
> This means that the errors will be insidious and difficult to find. e.g.
> the lens-based zipper code for walking into a Map will silently break and
> I'm sure I'm not the only one who has taken advantage of the existing
> ordering on these combinators.

I am not suggesting we should change the behaviour of existing functions
and traverseWithKey_ should definitely use the same order as
traverseWithKey. Changing semantics without changing type signatures is
really suspicious and usually plainly wrong.

Nevertheless, I was wondering whether we should have a monadic fold
(foldrM and foldlM) which would process the elements in a given order
(ascending and descending, analogously to foldr and foldl). From one
point of view, we can implement foldrM and foldlM using foldr and foldl,
nevertheless using linear heap space complexity compared to constant
heap space complexity we can achieve with specialized implementations.
This is the same situation as traverseWithKey_ -- we can implement it
using traverseWithKey, but the heap space complexity increases.

Cheers,
Milan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Edward Kmett-2
On Tue, Aug 6, 2013 at 2:09 PM, Milan Straka <[hidden email]> wrote:
Hi Edward,I am not suggesting we should change the behaviour of existing functions
and traverseWithKey_ should definitely use the same order as
traverseWithKey. Changing semantics without changing type signatures is
really suspicious and usually plainly wrong.

I wholeheartedly agree. =) I was just basing that on the code Ryan posted:
  traverseWithKey_ f = go
    where go Tip = pure ()
          go (Bin _ k v l r) = f k v *> go l *> go r

... which visits the key/value pairs out of order unlike, say:

  go (Bin _ k v l r = go l *> f k v *> go r
 
Nevertheless, I was wondering whether we should have a monadic fold
(foldrM and foldlM) which would process the elements in a given order
(ascending and descending, analogously to foldr and foldl). From one
point of view, we can implement foldrM and foldlM using foldr and foldl,

Sure, foldrM is typically implemented in terms of foldl and foldlM is typically implemented in terms of foldr. 

Do the usual definitions like that leak on a Map?
foldrM :: (Foldable t, Monad m) => (a -> b -> m b) -> b -> t a -> m b
foldrM f z0 xs = foldl f' return xs z0
  where f' k x z = f x z >>= k
foldlM :: (Foldable t, Monad m) => (a -> b -> m a) -> a -> t b -> m a
foldlM f z0 xs = foldr f' return xs z0
  where f' x k z = f z x >>= k
nevertheless using linear heap space complexity compared to constant
heap space complexity we can achieve with specialized implementations.
This is the same situation as traverseWithKey_ -- we can implement it
using traverseWithKey, but the heap space complexity increases.

traverseWithKey_ would normally be implemented with an appropriate newtype and foldMapWithKey, rather than traverseWithKey. Does that also leak?

-Edward

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
123