ByteStrings and the ram barrier

ByteStrings and the ram barrier

Donald Bruce Stewart

Hey libraries@,

Data.ByteStrings are strict, unboxed arrays of Word8s. They're great for
fast processing of strings that fit in memory, but no good for lazy
processing or for dealing with data larger than your RAM (i.e. they're
infeasible above 1G on a 2G RAM box).

Last week Duncan Coutts had the bright idea to write
Data.ByteString.Lazy, a newtyped [ByteString], which would allow
constant space processing of chunks of strict ByteString arrays.

The theory is that we'd be able to process large data efficiently, where
both ByteString and [Char] fail, and open up a new range of applications
that could be handled successfully with Haskell.

By using list operations over the spine of the structure, and fast
ByteString operations over the leaves, we should be able to get some of
the benefits of both lazy lists and strict bytestrings.

For example, to map over a lazy bytestring you do a list map along the
spine of the list, and a strict bytestring map over each chunk:

    lazyMap f xs = Prelude.map (B.map f) xs   -- xs :: [B.ByteString]
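
Spelled out, the whole idea fits in a few lines. A minimal sketch (the
LPS constructor and the lmap/lfilterNotByte names are illustrative, not
necessarily what fps uses):

    import qualified Data.ByteString as B
    import Data.Word (Word8)

    -- A lazy ByteString is a lazy list of strict, unboxed chunks.
    newtype ByteString = LPS [B.ByteString]

    -- List operations walk the spine; strict ByteString operations
    -- do the real work on each chunk.
    lmap :: (Word8 -> Word8) -> ByteString -> ByteString
    lmap f (LPS css) = LPS (map (B.map f) css)

    -- Filter a byte out of every chunk. A real implementation would
    -- also drop chunks that become empty, to keep the spine compact.
    lfilterNotByte :: Word8 -> ByteString -> ByteString
    lfilterNotByte w (LPS css) = LPS (map (B.filter (/= w)) css)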

So we hacked away at it for a few days, and here are some preliminary
results. The task is simple: filter the letter 'e' out of a file.

------------------------------------------------------------------------

4G file, 1G ram, Linux/x86, 3.20GHz P4, GHC 6.4.1

    Data.ByteString.Lazy (128k chunks)
        constant heap, total time 2m 20s, 25% cpu

    Data.List
        constant heap, total time 8 minutes (quite good!), 98% cpu

    sed 's/e//g'
        constant heap, I gave up after 10 minutes

------------------------------------------------------------------------

10G file, 1G ram, OpenBSD/x86, 2.4GHz P4, GHC 6.4.1

    Data.ByteString.Lazy (5M chunks)
        constant heap, total time 11.44 minutes, 34% cpu

    Data.List  
        constant heap, total time 18.21 minutes, 90% cpu

------------------------------------------------------------------------

A nice improvement for Haskell on multi-gigabyte files, I think!

Data.ByteString.Lazy is in fps head, here:

    http://www.cse.unsw.edu.au/~dons/code/fps/Data/ByteString/Lazy.hs   

Cheers,
  Don

Re: ByteStrings and the ram barrier

Bulat Ziganshin
Hello Donald,

Friday, May 12, 2006, 10:07:47 AM, you wrote:
> Last week Duncan Coutts had the bright idea to write
> Data.ByteString.Lazy, a newtyped [ByteString], which would allow
> constant space processing of chunks of strict ByteString arrays.

I have a somewhat similar idea: implement a getContents variant that
returns the list of LINES in a file, represented as [ByteString].
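
Roughly this, as a sketch (hGetLines is a made-up name, and this version
reads the whole handle strictly, so it shows the interface rather than a
constant-space implementation):

    import System.IO (Handle)
    import qualified Data.ByteString.Char8 as B

    -- Sketch: hand back the lines of a handle as a list of strict
    -- ByteStrings.
    hGetLines :: Handle -> IO [B.ByteString]
    hGetLines h = fmap B.lines (B.hGetContents h)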

>     Data.ByteString.Lazy (5M chunks)
>         constant heap, total time 11.44 minutes, 34% cpu

Try using smaller chunks; don't forget that the P4 has only a 512k-1024k cache.


--
Best regards,
 Bulat                            mailto:[hidden email]


Re: ByteStrings and the ram barrier

Donald Bruce Stewart
bulat.ziganshin:

> Hello Donald,
>
> Friday, May 12, 2006, 10:07:47 AM, you wrote:
> > Last week Duncan Coutts had the bright idea to write
> > Data.ByteString.Lazy, a newtyped [ByteString], which would allow
> > constant space processing of chunks of strict ByteString arrays.
>
> I have a somewhat similar idea: implement a getContents variant that
> returns the list of LINES in a file, represented as [ByteString].
>
> >     Data.ByteString.Lazy (5M chunks)
> >         constant heap, total time 11.44 minutes, 34% cpu
>
> Try using smaller chunks; don't forget that the P4 has only a 512k-1024k cache.

In fact, here's a graph of time versus chunk size, for my Pentium M
laptop, with a 256k L2 cache:

    http://www.cse.unsw.edu.au/~dons/tmp/chunksize_v_cache.png

Lesson: the chunk size should be between 0.5x and 1x the size of the L2
cache, and things get much worse as soon as you go beyond your cache size.
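
In code that's just a matter of picking the chunk constant accordingly,
for example (the name defaultChunkSize is illustrative):

    -- Illustrative: for a 256k L2 cache, use chunks of about half that.
    defaultChunkSize :: Int
    defaultChunkSize = 128 * 1024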

-- Don

Re: ByteStrings and the ram barrier

Donald Bruce Stewart
In reply to this post by Bulat Ziganshin
bulat.ziganshin:

> Hello Donald,
>
> Friday, May 12, 2006, 10:07:47 AM, you wrote:
> > Last week Duncan Coutts had the bright idea to write
> > Data.ByteString.Lazy, a newtyped [ByteString], which would allow
> > constant space processing of chunks of strict ByteString arrays.
>
> I have a somewhat similar idea: implement a getContents variant that
> returns the list of LINES in a file, represented as [ByteString].
>
> >     Data.ByteString.Lazy (5M chunks)
> >         constant heap, total time 11.44 minutes, 34% cpu
>
> Try using smaller chunks; don't forget that the P4 has only a 512k-1024k cache.

After tuning the chunk size to the cache size, the filter 'e' test now
runs in:

    4.52 minutes, 53% cpu

which I think is pretty spectacular for 10 gigabytes of data.

-- Don

Re: ByteStrings and the ram barrier

Donald Bruce Stewart
kahl:
>  >
>  > After tuning the chunk size to the cache size, the filter 'e' test now
>  > runs in:
>  >     4.52 minutes, 53% cpu.
>
> With respect to the original comparison,
> Ryan Lortie pointed out that ``sed'' is an unfair comparison,
> and ``tr'' would be more appropriate.

Hi!

Good idea, I always forget about tr. Here's the result:

 manzano$ time tr -d e < /home/dons/data/10G > /dev/null
 tr -d e < /home/dons/data/10G > /dev/null  26% cpu 4:34.55 total

I'm happy with this: within 6% of C, for a one-liner. The Haskell program is:

> import System.IO
> import Data.Char (ord)
> import qualified Data.ByteString.Lazy as L
>
> -- Stream stdin to stdout, dropping every 'e' byte, in constant space.
> main = L.hGetContents stdin >>= L.hPut stdout . L.filterNotByte e
>    where e = fromIntegral . ord $ 'e'
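
(To reproduce the timing above, compile that and run it the same way as
tr, e.g. ./filter < /home/dons/data/10G > /dev/null, where "filter" is
whatever the compiled binary is called.)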

-- Don

Re: ByteStrings and the ram barrier

Tomasz Zielonka
In reply to this post by Donald Bruce Stewart
On Fri, May 12, 2006 at 04:07:47PM +1000, Donald Bruce Stewart wrote:
> The theory is that we'd be able to process large data efficiently, where
> both ByteString and [Char] fail, and open up a new range of applications
> that could be handled successfully with Haskell.

My large data files are already divided into reasonably sized chunks,
and I think this approach is quite widespread - at least Google also
processes much of its data in chunks.

To process my data with Haskell, I would have to be able to decode it
into records with efficiency close to what I achieve in C++ (say, at
least 80% as fast). So far I've managed to reach 33% of it in reasonably
pure Haskell, which is not that bad IMO. However, I feel that I've hit a
barrier now and will have to use raw Ptrs or the FFI. Maybe I could try
pushing it through haskell-cafe for optimisation ;-)

Anyway, the point is that large data tends to be divided into smaller
chunks not only because it's impossible to load the whole file into
memory, but also to allow random access, to help distribute the
computation over many computers, etc. So I am not sure Haskell would
gain that much by being able to process terabytes of data in one go.

On the other hand, this is quite cool, and I am probably wrong, being
focused on my own needs.

Best regards
Tomasz

Re: ByteStrings and the ram barrier

Tomasz Zielonka
On Tue, May 16, 2006 at 11:10:04AM +0200, Tomasz Zielonka wrote:
> To process my data with Haskell, I would have to be able to decode it
> into records with efficiency close to what I achieve in C++ (say, at
> least 80% as fast). Until now I managed to get 33% of it in reasonably
> pure Haskell, which is not that bad IMO.

I forgot to add that I use Data.ByteString and mostly exercised the
head/tail/drop/take operations. I am already amazed that using those
high-level operations gives me such good performance. I mean, I know
what tricks you use to make it possible, but it's remarkable anyway...
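
For concreteness, this is the kind of decoding loop meant here, as a
rough sketch (the length-prefixed record format is invented purely for
illustration):

    import qualified Data.ByteString as B

    -- Hypothetical format: each record is a one-byte length prefix
    -- followed by that many bytes of payload.
    records :: B.ByteString -> [B.ByteString]
    records bs
      | B.null bs = []
      | otherwise =
          let n    = fromIntegral (B.head bs)  -- length prefix
              rest = B.tail bs
          in B.take n rest : records (B.drop n rest)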

But, like I said before, I will probably have to juggle Ptrs to get
past this 33% barrier. Still, I will surely keep using ByteString, at
least for reading data from files and for keeping the decoded strings.

BTW, would you recommend ByteString for keeping thousands of small
decoded strings? Their size will be around 50-100 characters (they
are mostly URIs) and their expected lifetime will be quite short
(I expect to hold around 10-100 such strings in memory 97% of the time,
and up to 1,000,000 strings in some rare cases).

Thanks for working on ByteString!

Best regards
Tomasz