A better way?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

A better way?

Keith Sheppard
Hello,

I'm new to haskell and still getting used to working with lazy
evaluation. I created a little function to calculate column widths for
a 2D list of strings (a table) by iterating through the list and
accumulating a max value for each column. My initial implementation
ran out memory for tables with many rows because of lazy evaluation
and here is how I dealt with it:

{- |
for a table, calculate the max width in characters for each column
-}
maxTableColumnWidths :: [[String]] -> [Int]
maxTableColumnWidths [] = []
maxTableColumnWidths table =
    maxTableColumnWidthsInternal table []

maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int]
maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues
maxTableColumnWidthsInternal (row:tableTail) prevMaxValues
    | seqList prevMaxValues = undefined
    | otherwise = maxTableColumnWidthsInternal tableTail
(maxRowFieldWidths row prevMaxValues)

-- this little function is for making the list strict... otherwise
-- we run out of memory
seqList [] = False
seqList (head:tail)
    | head `seq` False = undefined
    | otherwise = seqList tail

maxRowFieldWidths :: [String] -> [Int] -> [Int]
maxRowFieldWidths row prevMaxValues =
    let colLengths = map length row
        lengthOfRow = length row
        lengthOfPrevMax = length prevMaxValues
        maxPrefixList = zipWith max colLengths prevMaxValues
    in
        if lengthOfRow == lengthOfPrevMax then
            maxPrefixList
        else if lengthOfRow > lengthOfPrevMax then
            maxPrefixList ++ (drop lengthOfPrevMax colLengths)
        else
            maxPrefixList ++ (drop lengthOfRow prevMaxValues)


This works but it isn't very pretty (maybe also inefficient?). Is
there a better way to deal with this kind of memory issue?

Thanks!
Keith
Reply | Threaded
Open this post in threaded view
|

A better way?

Thomas Davie
How about this:

maxTableColumnWidths :: [[String]] -> [Int]
maxTableColumnWidths = map (maximum . map length)

Bob


On 21 Feb 2009, at 22:35, Keith Sheppard wrote:

> Hello,
>
> I'm new to haskell and still getting used to working with lazy
> evaluation. I created a little function to calculate column widths for
> a 2D list of strings (a table) by iterating through the list and
> accumulating a max value for each column. My initial implementation
> ran out memory for tables with many rows because of lazy evaluation
> and here is how I dealt with it:
>
> {- |
> for a table, calculate the max width in characters for each column
> -}
> maxTableColumnWidths :: [[String]] -> [Int]
> maxTableColumnWidths [] = []
> maxTableColumnWidths table =
>    maxTableColumnWidthsInternal table []
>
> maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int]
> maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues
> maxTableColumnWidthsInternal (row:tableTail) prevMaxValues
>    | seqList prevMaxValues = undefined
>    | otherwise = maxTableColumnWidthsInternal tableTail
> (maxRowFieldWidths row prevMaxValues)
>
> -- this little function is for making the list strict... otherwise
> -- we run out of memory
> seqList [] = False
> seqList (head:tail)
>    | head `seq` False = undefined
>    | otherwise = seqList tail
>
> maxRowFieldWidths :: [String] -> [Int] -> [Int]
> maxRowFieldWidths row prevMaxValues =
>    let colLengths = map length row
>        lengthOfRow = length row
>        lengthOfPrevMax = length prevMaxValues
>        maxPrefixList = zipWith max colLengths prevMaxValues
>    in
>        if lengthOfRow == lengthOfPrevMax then
>            maxPrefixList
>        else if lengthOfRow > lengthOfPrevMax then
>            maxPrefixList ++ (drop lengthOfPrevMax colLengths)
>        else
>            maxPrefixList ++ (drop lengthOfRow prevMaxValues)
>
>
> This works but it isn't very pretty (maybe also inefficient?). Is
> there a better way to deal with this kind of memory issue?
>
> Thanks!
> Keith
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners

Reply | Threaded
Open this post in threaded view
|

A better way?

Keith Sheppard
Thank you for the reply, but it looks like this is a different
function. Here is the interactive output of my function vs. the one
you give (my code is in Table/IO.hs):

Prelude> :load Table.IO
*Table.IO> maxTableColumnWidths  (replicate 10 ["hello", "world"])
[5,5]
*Table.IO> let maxTableColumnWidths2 = map (maximum . map length)
*Table.IO> maxTableColumnWidths2  (replicate 10 ["hello", "world"])
[5,5,5,5,5,5,5,5,5,5]

-Keith

On Sat, Feb 21, 2009 at 5:29 PM, Thomas Davie <[hidden email]> wrote:

> How about this:
>
> maxTableColumnWidths :: [[String]] -> [Int]
> maxTableColumnWidths = map (maximum . map length)
>
> Bob
>
>
> On 21 Feb 2009, at 22:35, Keith Sheppard wrote:
>
>> Hello,
>>
>> I'm new to haskell and still getting used to working with lazy
>> evaluation. I created a little function to calculate column widths for
>> a 2D list of strings (a table) by iterating through the list and
>> accumulating a max value for each column. My initial implementation
>> ran out memory for tables with many rows because of lazy evaluation
>> and here is how I dealt with it:
>>
>> {- |
>> for a table, calculate the max width in characters for each column
>> -}
>> maxTableColumnWidths :: [[String]] -> [Int]
>> maxTableColumnWidths [] = []
>> maxTableColumnWidths table =
>>   maxTableColumnWidthsInternal table []
>>
>> maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int]
>> maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues
>> maxTableColumnWidthsInternal (row:tableTail) prevMaxValues
>>   | seqList prevMaxValues = undefined
>>   | otherwise = maxTableColumnWidthsInternal tableTail
>> (maxRowFieldWidths row prevMaxValues)
>>
>> -- this little function is for making the list strict... otherwise
>> -- we run out of memory
>> seqList [] = False
>> seqList (head:tail)
>>   | head `seq` False = undefined
>>   | otherwise = seqList tail
>>
>> maxRowFieldWidths :: [String] -> [Int] -> [Int]
>> maxRowFieldWidths row prevMaxValues =
>>   let colLengths = map length row
>>       lengthOfRow = length row
>>       lengthOfPrevMax = length prevMaxValues
>>       maxPrefixList = zipWith max colLengths prevMaxValues
>>   in
>>       if lengthOfRow == lengthOfPrevMax then
>>           maxPrefixList
>>       else if lengthOfRow > lengthOfPrevMax then
>>           maxPrefixList ++ (drop lengthOfPrevMax colLengths)
>>       else
>>           maxPrefixList ++ (drop lengthOfRow prevMaxValues)
>>
>>
>> This works but it isn't very pretty (maybe also inefficient?). Is
>> there a better way to deal with this kind of memory issue?
>>
>> Thanks!
>> Keith
>> _______________________________________________
>> Beginners mailing list
>> [hidden email]
>> http://www.haskell.org/mailman/listinfo/beginners
>
>
Reply | Threaded
Open this post in threaded view
|

A better way?

Patrick LeBoutillier
In reply to this post by Keith Sheppard
Hi,

I came up with this, which seems to work well if all your rows have
the same number of fields:

maxRowFieldWidths :: [[String]] -> [Int]
maxRowFieldWidths rows = foldr rowMax zeros rows
  where rowMax fields maxes = zipWith (\f m -> max (length f) m) fields maxes
        zeros = 0 : zeros

foldr is the key here, which allows you to "accumulate" the results of
computations. In this case the accumulator is the list of all the
maximums found so far. The initial value for the accumulator is an
(infinite) list made up on only zeros. Haskell will generate only as
much as it needs.


Patrick



On Sat, Feb 21, 2009 at 4:35 PM, Keith Sheppard <[hidden email]> wrote:

> Hello,
>
> I'm new to haskell and still getting used to working with lazy
> evaluation. I created a little function to calculate column widths for
> a 2D list of strings (a table) by iterating through the list and
> accumulating a max value for each column. My initial implementation
> ran out memory for tables with many rows because of lazy evaluation
> and here is how I dealt with it:
>
> {- |
> for a table, calculate the max width in characters for each column
> -}
> maxTableColumnWidths :: [[String]] -> [Int]
> maxTableColumnWidths [] = []
> maxTableColumnWidths table =
>    maxTableColumnWidthsInternal table []
>
> maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int]
> maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues
> maxTableColumnWidthsInternal (row:tableTail) prevMaxValues
>    | seqList prevMaxValues = undefined
>    | otherwise = maxTableColumnWidthsInternal tableTail
> (maxRowFieldWidths row prevMaxValues)
>
> -- this little function is for making the list strict... otherwise
> -- we run out of memory
> seqList [] = False
> seqList (head:tail)
>    | head `seq` False = undefined
>    | otherwise = seqList tail
>
> maxRowFieldWidths :: [String] -> [Int] -> [Int]
> maxRowFieldWidths row prevMaxValues =
>    let colLengths = map length row
>        lengthOfRow = length row
>        lengthOfPrevMax = length prevMaxValues
>        maxPrefixList = zipWith max colLengths prevMaxValues
>    in
>        if lengthOfRow == lengthOfPrevMax then
>            maxPrefixList
>        else if lengthOfRow > lengthOfPrevMax then
>            maxPrefixList ++ (drop lengthOfPrevMax colLengths)
>        else
>            maxPrefixList ++ (drop lengthOfRow prevMaxValues)
>
>
> This works but it isn't very pretty (maybe also inefficient?). Is
> there a better way to deal with this kind of memory issue?
>
> Thanks!
> Keith
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners
>



--
=====================
Patrick LeBoutillier
Rosem?re, Qu?bec, Canada
Reply | Threaded
Open this post in threaded view
|

A better way?

Daniel Fischer-4
In reply to this post by Thomas Davie
Am Samstag, 21. Februar 2009 23:29 schrieb Thomas Davie:
> How about this:
>
> maxTableColumnWidths :: [[String]] -> [Int]
> maxTableColumnWidths = map (maximum . map length)

That's not what he needs,

maxTableColumnWidths = map (maximum . map length) . transpose

would be it. But I'm afraid that wouldn't solve his memory problems.

Regarding the memory problems: String is rather a memory hog anyway.
Keith, have you considered using ByteStrings? That might solve your memory
problems with a straightforward algorithm.

Also, if the rows can contain many columns, it is wasteful to calculate the
length of prevMaxValues for every row. You could either have that as a
parameter, or use a custom zipWith:

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
zipWithD _ [] ys = ys
zipWithD _ xs [] = xs

Then maxRowFieldWidths would become

maxRowFieldWidths row prev = zipWithD max (map length row) prev

or, pointfree:

maxRowFieldWidths = zipWithD max . map length

seqList can also be written as

seqList = foldr seq False

That would make

maxTableColumnWidths = foldr ((seqList .) . zipWithD max) []

I'm not sure if that is strict enough, though.

Also, you might try

maxTCWs = foldr seq [] . foldr (zipWithD max) []

, but I expect that to be a bad memory citizen.

>
> Bob
>

Cheers,
Daniel

Reply | Threaded
Open this post in threaded view
|

A better way?

Daniel Fischer-4
Am Samstag, 21. Februar 2009 23:58 schrieb Daniel Fischer:

> Am Samstag, 21. Februar 2009 23:29 schrieb Thomas Davie:
> > How about this:
> >
> > maxTableColumnWidths :: [[String]] -> [Int]
> > maxTableColumnWidths = map (maximum . map length)
>
> That's not what he needs,
>
> maxTableColumnWidths = map (maximum . map length) . transpose
>
> would be it. But I'm afraid that wouldn't solve his memory problems.
>
> Regarding the memory problems: String is rather a memory hog anyway.
> Keith, have you considered using ByteStrings? That might solve your memory
> problems with a straightforward algorithm.
>
> Also, if the rows can contain many columns, it is wasteful to calculate the
> length of prevMaxValues for every row. You could either have that as a
> parameter, or use a custom zipWith:
>
> zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
> zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
> zipWithD _ [] ys = ys
> zipWithD _ xs [] = xs
>
> Then maxRowFieldWidths would become
>
> maxRowFieldWidths row prev = zipWithD max (map length row) prev
>
> or, pointfree:
>
> maxRowFieldWidths = zipWithD max . map length
>
> seqList can also be written as
>
> seqList = foldr seq False
>
> That would make
>
> maxTableColumnWidths = foldr ((seqList .) . zipWithD max) []

Ouch! Of course not seqList, but an analogous function that returns the list
itself.

evalList xs
        | seqList xs = undefined
        | otherwise = xs

maxTableColumnWidths = foldr ((evalList .) . zipWithD max) []

>
> I'm not sure if that is strict enough, though.
>
> Also, you might try
>
> maxTCWs = foldr seq [] . foldr (zipWithD max) []

Oops! No good either. I must be too tired :(

maxTCWs = evalList . foldr (zipWithD max) []
>
> , but I expect that to be a bad memory citizen.
>
> > Bob
>
> Cheers,
> Daniel
>

Reply | Threaded
Open this post in threaded view
|

A better way?

Daniel Fischer-4
Am Sonntag, 22. Februar 2009 00:16 schrieb Daniel Fischer:
>
> maxTableColumnWidths = foldr ((evalList .) . zipWithD max) []

Oops again:

maxTableColumnWidths =
        foldr ((evalList .) . zipWithD max) [] . map (map length)

>

>
> maxTCWs = evalList . foldr (zipWithD max) []
>

maxTCWs = evalList . foldr (zipWithD max) [] . map (map length)

> > Cheers,
> > Daniel
>

Reply | Threaded
Open this post in threaded view
|

A better way?

Keith Sheppard
In reply to this post by Patrick LeBoutillier
Hello Patrick,

I renamed your function  maxTableColumnWidths to match up with my
naming. It seems to work except that it suffers from the same problem
as my 1st implementation when you feed it a table that has many rows:

*Table.IO>  maxTableColumnWidths (replicate 1000000 ["hello", "world"])
*** Exception: stack overflow

Thanks,
Keith

On Sat, Feb 21, 2009 at 5:54 PM, Patrick LeBoutillier
<[hidden email]> wrote:

> Hi,
>
> I came up with this, which seems to work well if all your rows have
> the same number of fields:
>
> maxRowFieldWidths :: [[String]] -> [Int]
> maxRowFieldWidths rows = foldr rowMax zeros rows
>  where rowMax fields maxes = zipWith (\f m -> max (length f) m) fields maxes
>        zeros = 0 : zeros
>
> foldr is the key here, which allows you to "accumulate" the results of
> computations. In this case the accumulator is the list of all the
> maximums found so far. The initial value for the accumulator is an
> (infinite) list made up on only zeros. Haskell will generate only as
> much as it needs.
>
>
> Patrick
>
>
>
> On Sat, Feb 21, 2009 at 4:35 PM, Keith Sheppard <[hidden email]> wrote:
>> Hello,
>>
>> I'm new to haskell and still getting used to working with lazy
>> evaluation. I created a little function to calculate column widths for
>> a 2D list of strings (a table) by iterating through the list and
>> accumulating a max value for each column. My initial implementation
>> ran out memory for tables with many rows because of lazy evaluation
>> and here is how I dealt with it:
>>
>> {- |
>> for a table, calculate the max width in characters for each column
>> -}
>> maxTableColumnWidths :: [[String]] -> [Int]
>> maxTableColumnWidths [] = []
>> maxTableColumnWidths table =
>>    maxTableColumnWidthsInternal table []
>>
>> maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int]
>> maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues
>> maxTableColumnWidthsInternal (row:tableTail) prevMaxValues
>>    | seqList prevMaxValues = undefined
>>    | otherwise = maxTableColumnWidthsInternal tableTail
>> (maxRowFieldWidths row prevMaxValues)
>>
>> -- this little function is for making the list strict... otherwise
>> -- we run out of memory
>> seqList [] = False
>> seqList (head:tail)
>>    | head `seq` False = undefined
>>    | otherwise = seqList tail
>>
>> maxRowFieldWidths :: [String] -> [Int] -> [Int]
>> maxRowFieldWidths row prevMaxValues =
>>    let colLengths = map length row
>>        lengthOfRow = length row
>>        lengthOfPrevMax = length prevMaxValues
>>        maxPrefixList = zipWith max colLengths prevMaxValues
>>    in
>>        if lengthOfRow == lengthOfPrevMax then
>>            maxPrefixList
>>        else if lengthOfRow > lengthOfPrevMax then
>>            maxPrefixList ++ (drop lengthOfPrevMax colLengths)
>>        else
>>            maxPrefixList ++ (drop lengthOfRow prevMaxValues)
>>
>>
>> This works but it isn't very pretty (maybe also inefficient?). Is
>> there a better way to deal with this kind of memory issue?
>>
>> Thanks!
>> Keith
>> _______________________________________________
>> Beginners mailing list
>> [hidden email]
>> http://www.haskell.org/mailman/listinfo/beginners
>>
>
>
>
> --
> =====================
> Patrick LeBoutillier
> Rosem?re, Qu?bec, Canada
>
Reply | Threaded
Open this post in threaded view
|

A better way?

Keith Sheppard
In reply to this post by Daniel Fischer-4
ghci still is not happy if we have many rows...

Prelude> :load Table.IO
[1 of 1] Compiling Table.IO         ( Table/IO.hs, interpreted )
Ok, modules loaded: Table.IO.
*Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD
max) [] . map (map length)
*Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map length)
*Table.IO> maxTableColumnWidths  (replicate 1000000 ["hello", "world"])
*** Exception: stack overflow
*Table.IO> maxTCWs  (replicate 1000000 ["hello", "world"])
*** Exception: stack overflow

I hadn't thought of using ByteStrings since I don't know what they are
:-). I'll have to look into it, but I'm assuming that ByteStrings will
give some constant time/space improvement? I think it won't help with
my first problem though since what's happening is that the lazy
function calls are piling up too deep (at least thats what I think is
happening).

Thank you
Keith

On Sat, Feb 21, 2009 at 6:21 PM, Daniel Fischer
<[hidden email]> wrote:

> Am Sonntag, 22. Februar 2009 00:16 schrieb Daniel Fischer:
>>
>> maxTableColumnWidths = foldr ((evalList .) . zipWithD max) []
>
> Oops again:
>
> maxTableColumnWidths =
>        foldr ((evalList .) . zipWithD max) [] . map (map length)
>
>>
>
>>
>> maxTCWs = evalList . foldr (zipWithD max) []
>>
>
> maxTCWs = evalList . foldr (zipWithD max) [] . map (map length)
>
>> > Cheers,
>> > Daniel
>>
>
>
Reply | Threaded
Open this post in threaded view
|

A better way?

Daniel Fischer-4
Am Sonntag, 22. Februar 2009 01:32 schrieb Keith Sheppard:

> ghci still is not happy if we have many rows...
>
> Prelude> :load Table.IO
> [1 of 1] Compiling Table.IO         ( Table/IO.hs, interpreted )
> Ok, modules loaded: Table.IO.
> *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD
> max) [] . map (map length)
> *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map
> length) *Table.IO> maxTableColumnWidths  (replicate 1000000 ["hello",
> "world"]) *** Exception: stack overflow

Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the
end) and it works for large tables. If the rows are short, it actually is
faster (here) than your version, but if the rows are long , e.g.

maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))

your version is faster than

import Data.List (foldl')

seqList [] = False
seqList (head:tail)
    | head `seq` False = undefined
    | otherwise = seqList tail

evalList xs
    | seqList xs    = undefined
    | otherwise = xs

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
zipWithD _ [] ys = ys
zipWithD _ xs [] = xs

maxTableColumnWidths =
        foldl' ((evalList .) . zipWithD max) [] . map (map length)


> *Table.IO> maxTCWs  (replicate 1000000 ["hello", "world"])
> *** Exception: stack overflow

That doesn't surprise me in the least. One has to force the list in each step.

>
> I hadn't thought of using ByteStrings since I don't know what they are

ByteStrings are basically byte arrays. Thus if you're dealing exclusively with
characters in the range 0-255, you need only one byte per character, while
with Strings, you need four bytes for each character itself, plus several
machine words for pointers (list cell to Char, list cell to next), I don't
remember, but I think it amounts to 12 or 20 bytes per character.
>
> :-). I'll have to look into it, but I'm assuming that ByteStrings will
>
> give some constant time/space improvement?

Can be several orders of magnitude faster, and as said above, they use far
less memory (but they're useful only for long enough strings, having a
multitude of short ByteStrings floating around doesn't do any good).

> I think it won't help with
> my first problem though since what's happening is that the lazy
> function calls are piling up too deep (at least thats what I think is
> happening).

Yes, that's basically what *** Exception: stack overflow means :)

>
> Thank you
> Keith
>

Cheers,
Daniel

Reply | Threaded
Open this post in threaded view
|

A better way?

Patrick LeBoutillier
Hi,

I changed my code to :

maxTableColumnWidths :: [[String]] -> [Int]
maxTableColumnWidths = foldl' rowMax zeros
  where rowMax = zipWith (\m f -> max (length f) m)
        zeros = 0 : zeros

but it still blows the stack. I don't understand why. Doesn't foldl'
force the evaluation of each call to rowMax? If so then I don't see
what causes the stack to get so big... unless I'm looking in the wrong
place...

Can anyone shed some light?


Patrick


On Sat, Feb 21, 2009 at 8:03 PM, Daniel Fischer
<[hidden email]> wrote:

> Am Sonntag, 22. Februar 2009 01:32 schrieb Keith Sheppard:
>> ghci still is not happy if we have many rows...
>>
>> Prelude> :load Table.IO
>> [1 of 1] Compiling Table.IO         ( Table/IO.hs, interpreted )
>> Ok, modules loaded: Table.IO.
>> *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD
>> max) [] . map (map length)
>> *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map
>> length) *Table.IO> maxTableColumnWidths  (replicate 1000000 ["hello",
>> "world"]) *** Exception: stack overflow
>
> Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the
> end) and it works for large tables. If the rows are short, it actually is
> faster (here) than your version, but if the rows are long , e.g.
>
> maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))
>
> your version is faster than
>
> import Data.List (foldl')
>
> seqList [] = False
> seqList (head:tail)
>    | head `seq` False = undefined
>    | otherwise = seqList tail
>
> evalList xs
>    | seqList xs    = undefined
>    | otherwise = xs
>
> zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
> zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
> zipWithD _ [] ys = ys
> zipWithD _ xs [] = xs
>
> maxTableColumnWidths =
>        foldl' ((evalList .) . zipWithD max) [] . map (map length)
>
>
>> *Table.IO> maxTCWs  (replicate 1000000 ["hello", "world"])
>> *** Exception: stack overflow
>
> That doesn't surprise me in the least. One has to force the list in each step.
>
>>
>> I hadn't thought of using ByteStrings since I don't know what they are
>
> ByteStrings are basically byte arrays. Thus if you're dealing exclusively with
> characters in the range 0-255, you need only one byte per character, while
> with Strings, you need four bytes for each character itself, plus several
> machine words for pointers (list cell to Char, list cell to next), I don't
> remember, but I think it amounts to 12 or 20 bytes per character.
>>
>> :-). I'll have to look into it, but I'm assuming that ByteStrings will
>>
>> give some constant time/space improvement?
>
> Can be several orders of magnitude faster, and as said above, they use far
> less memory (but they're useful only for long enough strings, having a
> multitude of short ByteStrings floating around doesn't do any good).
>
>> I think it won't help with
>> my first problem though since what's happening is that the lazy
>> function calls are piling up too deep (at least thats what I think is
>> happening).
>
> Yes, that's basically what *** Exception: stack overflow means :)
>
>>
>> Thank you
>> Keith
>>
>
> Cheers,
> Daniel
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners
>



--
=====================
Patrick LeBoutillier
Rosem?re, Qu?bec, Canada
Reply | Threaded
Open this post in threaded view
|

A better way?

Keith Sheppard
In reply to this post by Daniel Fischer-4
Thanks, this code looks cleaner than what I had. I like how zipWithD
fixes the whole issue of uneven rows...

-Keith


On Sat, Feb 21, 2009 at 8:03 PM, Daniel Fischer
<[hidden email]> wrote:

> Am Sonntag, 22. Februar 2009 01:32 schrieb Keith Sheppard:
>> ghci still is not happy if we have many rows...
>>
>> Prelude> :load Table.IO
>> [1 of 1] Compiling Table.IO         ( Table/IO.hs, interpreted )
>> Ok, modules loaded: Table.IO.
>> *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD
>> max) [] . map (map length)
>> *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map
>> length) *Table.IO> maxTableColumnWidths  (replicate 1000000 ["hello",
>> "world"]) *** Exception: stack overflow
>
> Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the
> end) and it works for large tables. If the rows are short, it actually is
> faster (here) than your version, but if the rows are long , e.g.
>
> maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))
>
> your version is faster than
>
> import Data.List (foldl')
>
> seqList [] = False
> seqList (head:tail)
>    | head `seq` False = undefined
>    | otherwise = seqList tail
>
> evalList xs
>    | seqList xs    = undefined
>    | otherwise = xs
>
> zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
> zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
> zipWithD _ [] ys = ys
> zipWithD _ xs [] = xs
>
> maxTableColumnWidths =
>        foldl' ((evalList .) . zipWithD max) [] . map (map length)
>
>
>> *Table.IO> maxTCWs  (replicate 1000000 ["hello", "world"])
>> *** Exception: stack overflow
>
> That doesn't surprise me in the least. One has to force the list in each step.
>
>>
>> I hadn't thought of using ByteStrings since I don't know what they are
>
> ByteStrings are basically byte arrays. Thus if you're dealing exclusively with
> characters in the range 0-255, you need only one byte per character, while
> with Strings, you need four bytes for each character itself, plus several
> machine words for pointers (list cell to Char, list cell to next), I don't
> remember, but I think it amounts to 12 or 20 bytes per character.
>>
>> :-). I'll have to look into it, but I'm assuming that ByteStrings will
>>
>> give some constant time/space improvement?
>
> Can be several orders of magnitude faster, and as said above, they use far
> less memory (but they're useful only for long enough strings, having a
> multitude of short ByteStrings floating around doesn't do any good).
>
>> I think it won't help with
>> my first problem though since what's happening is that the lazy
>> function calls are piling up too deep (at least thats what I think is
>> happening).
>
> Yes, that's basically what *** Exception: stack overflow means :)
>
>>
>> Thank you
>> Keith
>>
>
> Cheers,
> Daniel
>
>
Reply | Threaded
Open this post in threaded view
|

Re: A better way?

Heinrich Apfelmus
In reply to this post by Daniel Fischer-4
Daniel Fischer wrote:

>
> Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the
> end) and it works for large tables. If the rows are short, it actually is
> faster (here) than your version, but if the rows are long , e.g.
>
> maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))
>
> your version is faster than
>
> import Data.List (foldl')
>
> seqList [] = False
> seqList (head:tail)
>     | head `seq` False = undefined
>     | otherwise = seqList tail
>
> evalList xs
>     | seqList xs    = undefined
>     | otherwise = xs
>
> zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
> zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
> zipWithD _ [] ys = ys
> zipWithD _ xs [] = xs
>
> maxTableColumnWidths =
> foldl' ((evalList .) . zipWithD max) [] . map (map length)

Nice! I'd use bang patterns in favor of the now outdated

   | x `seq` False = undefined

pattern though.


Actually, I'd use

   import Control.Parallel.Strategies

   maxWidths = foldl' (\xs ys -> zipWithD max xs ys `using` rnf) []
             . map (map length)

The module quite useful for controlling evaluation, even when no
parallelism is involved.


Regards,
apfelmus

--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: A better way?

Heinrich Apfelmus
In reply to this post by Patrick LeBoutillier
Patrick LeBoutillier wrote:

> Hi,
>
> I changed my code to :
>
> maxTableColumnWidths :: [[String]] -> [Int]
> maxTableColumnWidths = foldl' rowMax zeros
>   where rowMax = zipWith (\m f -> max (length f) m)
>         zeros = 0 : zeros
>
> but it still blows the stack. I don't understand why. Doesn't foldl'
> force the evaluation of each call to rowMax? If so then I don't see
> what causes the stack to get so big... unless I'm looking in the wrong
> place...
>
> Can anyone shed some light?

It does force the evaluation of each call to  rowMax  but only to weak
head normal form. In other words, it will only evaluate the list to
either a cons cell or an empty list

   blah   --->  blah : blah
           \->  []

See also

    http://en.wikibooks.org/wiki/Haskell/Graph_reduction


Regards,
apfelmus

--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: A better way?

Daniel Fischer-4
In reply to this post by Heinrich Apfelmus
Am Sonntag, 22. Februar 2009 15:30 schrieb Heinrich Apfelmus:

> Daniel Fischer wrote:
> > Yes, foldr was the wrong choice. make it foldl' (don't forget the prime
> > at the end) and it works for large tables. If the rows are short, it
> > actually is faster (here) than your version, but if the rows are long ,
> > e.g.
> >
> > maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))
> >
> > your version is faster than
> >
> > import Data.List (foldl')
> >
> > seqList [] = False
> > seqList (head:tail)
> >
> >     | head `seq` False = undefined
> >     | otherwise = seqList tail
> >
> > evalList xs
> >
> >     | seqList xs    = undefined
> >     | otherwise = xs
> >
> > zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a]
> > zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt
> > zipWithD _ [] ys = ys
> > zipWithD _ xs [] = xs
> >
> > maxTableColumnWidths =
> > foldl' ((evalList .) . zipWithD max) [] . map (map length)
>
> Nice! I'd use bang patterns in favor of the now outdated
>
>    | x `seq` False = undefined
>
> pattern though.

I would, too, but they're not yet portable, are they?
Which implementations other than GHC currently support them?
And since seqList was already there, I used that.

>
>
> Actually, I'd use
>
>    import Control.Parallel.Strategies
>
>    maxWidths = foldl' (\xs ys -> zipWithD max xs ys `using` rnf) []
>              . map (map length)
>
> The module quite useful for controlling evaluation, even when no
> parallelism is involved.

Winner!

>
>
> Regards,
> apfelmus
>
Cheers,
Daniel

Reply | Threaded
Open this post in threaded view
|

Re: A better way?

Heinrich Apfelmus
Daniel Fischer wrote:
>
>> I'd use bang patterns in favor of the now outdated
>>
>>    | x `seq` False = undefined
>>
>> pattern though.
>
> I would, too, but they're not yet portable, are they?
> Which implementations other than GHC currently support them?

No, they're not portable yet, but they're really convenient. :D  The
implementations other than GHC will probably pick up -XBangPatterns as well.


Regards,
apfelmus

--
http://apfelmus.nfshost.com