Potential space leak in transpose

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Potential space leak in transpose

David Feuer
In Data.List, we define

transpose :: [[a]] -> [[a]]
transpose [] = []
transpose ([] : xss) = transpose xss
transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) : transpose (xs : [t | (_ : t) <- xss])

The potential difficulty is that we essentially mapMaybe over the xss list twice in the third case. So we hang on to the heads where we need the tails and the tails where we need the heads. We could fix that with something like this:

transpose ((x : xs) : xss) = (x : fronts) : transpose (xs : rears)
  where
    (fronts, rears) = unzip [(h,t) | (h : t) <- xss]

I don't know how much that'll cost in practice.

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Andreas Abel
 > I don't know how much that'll cost in practice.

What costs are you worried about?

On 2020-05-12 00:08, David Feuer wrote:

> In Data.List, we define
>
> transpose :: [[a]] -> [[a]]
> transpose [] = []
> transpose ([] : xss) = transpose xss
> transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) : transpose (xs
> : [t | (_ : t) <- xss])
>
> The potential difficulty is that we essentially mapMaybe over the xss
> list twice in the third case. So we hang on to the heads where we need
> the tails and the tails where we need the heads. We could fix that with
> something like this:
>
> transpose ((x : xs) : xss) = (x : fronts) : transpose (xs : rears)
>    where
>      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>
> I don't know how much that'll cost in practice.
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

David Feuer
The cost of allocating the extra pairs.

On Tue, May 12, 2020, 5:11 AM Andreas Abel <[hidden email]> wrote:
 > I don't know how much that'll cost in practice.

What costs are you worried about?

On 2020-05-12 00:08, David Feuer wrote:
> In Data.List, we define
>
> transpose :: [[a]] -> [[a]]
> transpose [] = []
> transpose ([] : xss) = transpose xss
> transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) : transpose (xs
> : [t | (_ : t) <- xss])
>
> The potential difficulty is that we essentially mapMaybe over the xss
> list twice in the third case. So we hang on to the heads where we need
> the tails and the tails where we need the heads. We could fix that with
> something like this:
>
> transpose ((x : xs) : xss) = (x : fronts) : transpose (xs : rears)
>    where
>      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>
> I don't know how much that'll cost in practice.
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

David Feuer
Also, the more eager list allocation can increase residency, but I don't think it can cause a leak.

On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]> wrote:
The cost of allocating the extra pairs.

On Tue, May 12, 2020, 5:11 AM Andreas Abel <[hidden email]> wrote:
 > I don't know how much that'll cost in practice.

What costs are you worried about?

On 2020-05-12 00:08, David Feuer wrote:
> In Data.List, we define
>
> transpose :: [[a]] -> [[a]]
> transpose [] = []
> transpose ([] : xss) = transpose xss
> transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) : transpose (xs
> : [t | (_ : t) <- xss])
>
> The potential difficulty is that we essentially mapMaybe over the xss
> list twice in the third case. So we hang on to the heads where we need
> the tails and the tails where we need the heads. We could fix that with
> something like this:
>
> transpose ((x : xs) : xss) = (x : fronts) : transpose (xs : rears)
>    where
>      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>
> I don't know how much that'll cost in practice.
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Ryan Reich
Why not just inline the definition of unzip and hand-optimize away the pairs? 

On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]> wrote:
Also, the more eager list allocation can increase residency, but I don't think it can cause a leak.

On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]> wrote:
The cost of allocating the extra pairs.

On Tue, May 12, 2020, 5:11 AM Andreas Abel <[hidden email]> wrote:
 > I don't know how much that'll cost in practice.

What costs are you worried about?

On 2020-05-12 00:08, David Feuer wrote:
> In Data.List, we define
>
> transpose :: [[a]] -> [[a]]
> transpose [] = []
> transpose ([] : xss) = transpose xss
> transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) : transpose (xs
> : [t | (_ : t) <- xss])
>
> The potential difficulty is that we essentially mapMaybe over the xss
> list twice in the third case. So we hang on to the heads where we need
> the tails and the tails where we need the heads. We could fix that with
> something like this:
>
> transpose ((x : xs) : xss) = (x : fronts) : transpose (xs : rears)
>    where
>      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>
> I don't know how much that'll cost in practice.
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

David Feuer
unzip should fuse with its argument, so inlining presumably won't do anything there. The pairs that unzip itself creates are actually what prevent the leak. GHC creates "selector thunks" to deconstruct them, which the garbage collector can reduce. The garbage collector will never reduce the deconstruction of list conses.

On Wed, May 13, 2020, 4:27 PM Ryan Reich <[hidden email]> wrote:
Why not just inline the definition of unzip and hand-optimize away the pairs? 

On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]> wrote:
Also, the more eager list allocation can increase residency, but I don't think it can cause a leak.

On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]> wrote:
The cost of allocating the extra pairs.

On Tue, May 12, 2020, 5:11 AM Andreas Abel <[hidden email]> wrote:
 > I don't know how much that'll cost in practice.

What costs are you worried about?

On 2020-05-12 00:08, David Feuer wrote:
> In Data.List, we define
>
> transpose :: [[a]] -> [[a]]
> transpose [] = []
> transpose ([] : xss) = transpose xss
> transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) : transpose (xs
> : [t | (_ : t) <- xss])
>
> The potential difficulty is that we essentially mapMaybe over the xss
> list twice in the third case. So we hang on to the heads where we need
> the tails and the tails where we need the heads. We could fix that with
> something like this:
>
> transpose ((x : xs) : xss) = (x : fronts) : transpose (xs : rears)
>    where
>      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>
> I don't know how much that'll cost in practice.
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Andreas Abel
In reply to this post by Ryan Reich
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Ryan Reich
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

David Feuer
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Ryan Reich
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

David Feuer
That doesn't look like it works on infinite lists.

On Thu, May 14, 2020, 8:09 PM Ryan Reich <[hidden email]> wrote:
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Donnacha Oisín Kidney-2
How about the following variant?

    transpose = takeWhile (not . null) . foldr (foldr f id) (repeat [])
      where
        f x xs ~(y:ys) = (x:y) : xs ys

Although I don’t really understand the memory leak situation, I think you might be able to get the pair behaviour with a stream type:

    data Stream a = a :< Stream a

    streamToList :: Stream [a] -> [[a]]
    streamToList ([] :< _ ) = []
    streamToList (x  :< xs) = x : streamToList xs

    transpose = streamToList . foldr (foldr f id) (fix ([] :<))
      where
        f x xs ~(y :< ys) = (x:y) :< xs ys

And you could turn the streamToList function into a good producer like so:

    streamToList :: Stream [a] -> [[a]]
    streamToList xs =
      build (\c n -> 
        let go ([] :< _ ) = n
            go (x  :< xs) = c x (go xs)
        in go xs)

The other advantage of this of course is that transpose becomes a good consumer.

On 15 May 2020, at 01:15, David Feuer <[hidden email]> wrote:

That doesn't look like it works on infinite lists.

On Thu, May 14, 2020, 8:09 PM Ryan Reich <[hidden email]> wrote:
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries


_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Ryan Reich
In reply to this post by David Feuer
So it doesn't, but this one does (*):

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose [r] = map (:[]) r
transpose (r1 : rrest) = zip' r1 (transpose rrest)
  where
    zip' [] bs = bs
    zip' (a : as) ~(b : bs) = (a : b) : zip' as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
  print $ take 10 $ transpose [[0 ..]]
  --print $ map (take 10) $ transpose (repeat [0])
  print $ take 3 $ map (take 3) $ transpose (repeat [0..])
---

It's true that this implementation will make the commented line diverge.  However, (*) the stock implementation of transpose also diverges!  Try it: head <$> transpose (repeat [0]) in ghci will print [0 and then loop.  So I think this one is okay by that standard.  Does it do better at allocation than the existing one or the one with unzip?

<rant, possibly beside the point>
It's actually not even well-defined what the transpose would be if we allow infinitely many rows and also use the "condensing" property where the columns skip over nonexistent terms in the rows.  How would you even prove that transpose (repeat [0]) has length 1?  You'd have to rule out the existence of any second term in any of the infinitely many rows.  Of course, in this example there is a fast way to do that, but the rows could be anything, so the problem is uncomputable.  This logic is also why the final test above actually does terminate: because we find the requisite number of elements early on (in fact, instantly, since the rows are all infinite).

This actually raises the question of whether transpose even "should" care about infinitely many rows or columns (I know, the behavior is standardized; pretend we're discussing a new alternative transpose' that explicitly only does finite lists).
</rant>

Ryan

On Thu, May 14, 2020 at 5:15 PM David Feuer <[hidden email]> wrote:
That doesn't look like it works on infinite lists.

On Thu, May 14, 2020, 8:09 PM Ryan Reich <[hidden email]> wrote:
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Ryan Reich
Never mind, that one is broken.  I should probably not write these late in the evening.

On Thu, May 14, 2020 at 10:30 PM Ryan Reich <[hidden email]> wrote:
So it doesn't, but this one does (*):

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose [r] = map (:[]) r
transpose (r1 : rrest) = zip' r1 (transpose rrest)
  where
    zip' [] bs = bs
    zip' (a : as) ~(b : bs) = (a : b) : zip' as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
  print $ take 10 $ transpose [[0 ..]]
  --print $ map (take 10) $ transpose (repeat [0])
  print $ take 3 $ map (take 3) $ transpose (repeat [0..])
---

It's true that this implementation will make the commented line diverge.  However, (*) the stock implementation of transpose also diverges!  Try it: head <$> transpose (repeat [0]) in ghci will print [0 and then loop.  So I think this one is okay by that standard.  Does it do better at allocation than the existing one or the one with unzip?

<rant, possibly beside the point>
It's actually not even well-defined what the transpose would be if we allow infinitely many rows and also use the "condensing" property where the columns skip over nonexistent terms in the rows.  How would you even prove that transpose (repeat [0]) has length 1?  You'd have to rule out the existence of any second term in any of the infinitely many rows.  Of course, in this example there is a fast way to do that, but the rows could be anything, so the problem is uncomputable.  This logic is also why the final test above actually does terminate: because we find the requisite number of elements early on (in fact, instantly, since the rows are all infinite).

This actually raises the question of whether transpose even "should" care about infinitely many rows or columns (I know, the behavior is standardized; pretend we're discussing a new alternative transpose' that explicitly only does finite lists).
</rant>

Ryan

On Thu, May 14, 2020 at 5:15 PM David Feuer <[hidden email]> wrote:
That doesn't look like it works on infinite lists.

On Thu, May 14, 2020, 8:09 PM Ryan Reich <[hidden email]> wrote:
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

Carter Schonwald
David : call me unimaginative, but can you share an example that can trigger the space
Leak ? At least conceptually I’m being a tad thick about it though I think I see what you’re pointing at. 

On Fri, May 15, 2020 at 4:24 AM Ryan Reich <[hidden email]> wrote:
Never mind, that one is broken.  I should probably not write these late in the evening.

On Thu, May 14, 2020 at 10:30 PM Ryan Reich <[hidden email]> wrote:
So it doesn't, but this one does (*):

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose [r] = map (:[]) r
transpose (r1 : rrest) = zip' r1 (transpose rrest)
  where
    zip' [] bs = bs
    zip' (a : as) ~(b : bs) = (a : b) : zip' as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
  print $ take 10 $ transpose [[0 ..]]
  --print $ map (take 10) $ transpose (repeat [0])
  print $ take 3 $ map (take 3) $ transpose (repeat [0..])
---

It's true that this implementation will make the commented line diverge.  However, (*) the stock implementation of transpose also diverges!  Try it: head <$> transpose (repeat [0]) in ghci will print [0 and then loop.  So I think this one is okay by that standard.  Does it do better at allocation than the existing one or the one with unzip?

<rant, possibly beside the point>
It's actually not even well-defined what the transpose would be if we allow infinitely many rows and also use the "condensing" property where the columns skip over nonexistent terms in the rows.  How would you even prove that transpose (repeat [0]) has length 1?  You'd have to rule out the existence of any second term in any of the infinitely many rows.  Of course, in this example there is a fast way to do that, but the rows could be anything, so the problem is uncomputable.  This logic is also why the final test above actually does terminate: because we find the requisite number of elements early on (in fact, instantly, since the rows are all infinite).

This actually raises the question of whether transpose even "should" care about infinitely many rows or columns (I know, the behavior is standardized; pretend we're discussing a new alternative transpose' that explicitly only does finite lists).
</rant>

Ryan

On Thu, May 14, 2020 at 5:15 PM David Feuer <[hidden email]> wrote:
That doesn't look like it works on infinite lists.

On Thu, May 14, 2020, 8:09 PM Ryan Reich <[hidden email]> wrote:
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Potential space leak in transpose

David Feuer
Carter, imagine the list elements are thunks and that each of them, when forced, produces a large structure (e.g., a vector of length 10^7). Does that help your imagination?

On Fri, May 15, 2020, 12:41 PM Carter Schonwald <[hidden email]> wrote:
David : call me unimaginative, but can you share an example that can trigger the space
Leak ? At least conceptually I’m being a tad thick about it though I think I see what you’re pointing at. 

On Fri, May 15, 2020 at 4:24 AM Ryan Reich <[hidden email]> wrote:
Never mind, that one is broken.  I should probably not write these late in the evening.

On Thu, May 14, 2020 at 10:30 PM Ryan Reich <[hidden email]> wrote:
So it doesn't, but this one does (*):

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose [r] = map (:[]) r
transpose (r1 : rrest) = zip' r1 (transpose rrest)
  where
    zip' [] bs = bs
    zip' (a : as) ~(b : bs) = (a : b) : zip' as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
  print $ take 10 $ transpose [[0 ..]]
  --print $ map (take 10) $ transpose (repeat [0])
  print $ take 3 $ map (take 3) $ transpose (repeat [0..])
---

It's true that this implementation will make the commented line diverge.  However, (*) the stock implementation of transpose also diverges!  Try it: head <$> transpose (repeat [0]) in ghci will print [0 and then loop.  So I think this one is okay by that standard.  Does it do better at allocation than the existing one or the one with unzip?

<rant, possibly beside the point>
It's actually not even well-defined what the transpose would be if we allow infinitely many rows and also use the "condensing" property where the columns skip over nonexistent terms in the rows.  How would you even prove that transpose (repeat [0]) has length 1?  You'd have to rule out the existence of any second term in any of the infinitely many rows.  Of course, in this example there is a fast way to do that, but the rows could be anything, so the problem is uncomputable.  This logic is also why the final test above actually does terminate: because we find the requisite number of elements early on (in fact, instantly, since the rows are all infinite).

This actually raises the question of whether transpose even "should" care about infinitely many rows or columns (I know, the behavior is standardized; pretend we're discussing a new alternative transpose' that explicitly only does finite lists).
</rant>

Ryan

On Thu, May 14, 2020 at 5:15 PM David Feuer <[hidden email]> wrote:
That doesn't look like it works on infinite lists.

On Thu, May 14, 2020, 8:09 PM Ryan Reich <[hidden email]> wrote:
Hopefully I'm not fooling myself...how about this?

---
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)

zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] -> [c])
zipWith' _ _ fb [] bs = fb <$> bs
zipWith' _ fa _ as [] = fa <$> as
zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs

main = do
  mapM_ (print . transpose)
    [
      [[1,2,3]],
      [[1,2,3],[4,5,6]],
      [[1,2,3],[4,5,6],[7,8,9]],
      [[10,11],[20],[],[30,31,32]]
    ]
---

I see the output:
---
[[1],[2],[3]]
[[1,4],[2,5],[3,6]]
[[1,4,7],[2,5,8],[3,6,9]]
[[10,20,30],[11,31],[32]]
---

which is correct (the last example is the one from the haddocks).

I was concerned that the definition of zipWith' (which is akin to the Map function unionWith) is not compatible with the build/fold fusion rule, but the implementation of zipWith itself is basically the same.

Ryan

On Thu, May 14, 2020 at 1:17 PM David Feuer <[hidden email]> wrote:
Right. I still think it might be the right thing to do, though. I'm not a big fan of general-purpose library functions that have any unnecessary memory leak hazard. The biggest counterargument is that real code is unlikely to run into that problem.

On Thu, May 14, 2020, 3:35 PM Ryan Reich <[hidden email]> wrote:
My suggestion was much less sophisticated even than that, and is basically what David answered with fusion. Also according to his answer, the original code of transpose lacks the laziness that unzip's actual implementation would provide.

I think what that means is that his concern over allocating extra pairs is about the ones created internally by unzip when it builds the lazy heads-and-tails accessors.

On Thu, May 14, 2020, 03:27 Andreas Abel <[hidden email]> wrote:
On 2020-05-13 22:27, Ryan Reich wrote:
> Why not just inline the definition of unzip and hand-optimize away the
> pairs?

Isn't this what the original code of transpose is doing?

> On Tue, May 12, 2020, 10:24 David Feuer <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Also, the more eager list allocation can increase residency, but I
>     don't think it can cause a leak.
>
>     On Tue, May 12, 2020, 9:48 AM David Feuer <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         The cost of allocating the extra pairs.
>
>         On Tue, May 12, 2020, 5:11 AM Andreas Abel
>         <[hidden email] <mailto:[hidden email]>> wrote:
>
>               > I don't know how much that'll cost in practice.
>
>             What costs are you worried about?
>
>             On 2020-05-12 00:08, David Feuer wrote:
>              > In Data.List, we define
>              >
>              > transpose :: [[a]] -> [[a]]
>              > transpose [] = []
>              > transpose ([] : xss) = transpose xss
>              > transpose ((x : xs) : xss) = (x : [h | (h : _) <- xss]) :
>             transpose (xs
>              > : [t | (_ : t) <- xss])
>              >
>              > The potential difficulty is that we essentially mapMaybe
>             over the xss
>              > list twice in the third case. So we hang on to the heads
>             where we need
>              > the tails and the tails where we need the heads. We could
>             fix that with
>              > something like this:
>              >
>              > transpose ((x : xs) : xss) = (x : fronts) : transpose (xs
>             : rears)
>              >    where
>              >      (fronts, rears) = unzip [(h,t) | (h : t) <- xss]
>              >
>              > I don't know how much that'll cost in practice.
>              >
>              > _______________________________________________
>              > Libraries mailing list
>              > [hidden email] <mailto:[hidden email]>
>              > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>              >
>
>     _______________________________________________
>     Libraries mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries