hash of a lazy bytestring?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

hash of a lazy bytestring?

David Roundy-2
I was just contemplating hashes, and it occurred to me that it would be
nice to be able to compute the hash of a lazily constructed bytestring and
lazily consume its output without requiring the whole string to ever be in
memory.  Or in general, it'd be nice to be able to perform two simultaneous
consumptions of a lazy list without requiring that the entire list be
stored.  Another example would be computing the length of an ordinary list:

do l <- readFile "foo"
   let len = length l
   writeFile "bar" l
   putStrLn $ "Length is " ++ show l

It would be nice if one could write length such that the above function
doesn't require the entire file be stored in memory.  Are there any
approaches that would make this possible? The only approach I can imagine
would involve something like weak pointers and finalizers.  It seems like
it'd be a useful paradigm, if it's possible.  Basically to be able to write
a lazy consumer that doesn't hold onto its input, but instead is computed
as the garbage collector frees the input.  Something like:

length' :: [a] -> Integer
length' !xs = l' 0 `safeInterleave` xs
   where l' n [] = n
         l' !n (y:ys) = l' (n+1) `safeInterleave` ys

safeInterleave :: (a -> b) -> a -> b
safeInterleave f !x = -- create weak reference to x, associate finalizer,
                      -- so that if x is GCed, then we'll immediately
                      -- evaluate f x before we lose the value of x.

Is this possible? Would it be reasonable? I imagine we'd run into trouble
with laziness keeping safeInterleave from being called, with the result
that we'd hang onto x even though the safeInterleave would have hoped to
allow x to be GCed.
--
David Roundy
http://www.darcs.net
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: hash of a lazy bytestring?

David Roundy-2
On Sun, May 13, 2007 at 08:30:41AM -0700, David Roundy wrote:
> do l <- readFile "foo"
>    let len = length l
>    writeFile "bar" l
>    putStrLn $ "Length is " ++ show l

Oops, of course this should have been show len in the last line.
--
David Roundy
http://www.darcs.net
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: hash of a lazy bytestring?

Tom Schrijvers-2
In reply to this post by David Roundy-2
On Sun, 13 May 2007, David Roundy wrote:

> I was just contemplating hashes, and it occurred to me that it would be
> nice to be able to compute the hash of a lazily constructed bytestring and
> lazily consume its output without requiring the whole string to ever be in
> memory.  Or in general, it'd be nice to be able to perform two simultaneous
> consumptions of a lazy list without requiring that the entire list be
> stored.

How about reading the same file twice?

  do l1 <- readFile "foo"
            l2 <- readFile "foo"
    let len = length l1
            writeFile "bar" l2
    ...
Another solution would be loop fusion:

  do l <- readFile "foo"
    len <- writeFileAndComputeLength "bar" l
    ...

A compiler might be able to do that for you.

Tom

--
Tom Schrijvers

Department of Computer Science
K.U. Leuven
Celestijnenlaan 200A
B-3001 Heverlee
Belgium

tel: +32 16 327544
e-mail: [hidden email]
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: hash of a lazy bytestring?

Tomasz Zielonka
In reply to this post by David Roundy-2
On Sun, May 13, 2007 at 08:30:41AM -0700, David Roundy wrote:
> I was just contemplating hashes, and it occurred to me that it would be
> nice to be able to compute the hash of a lazily constructed bytestring and
> lazily consume its output without requiring the whole string to ever be in
> memory.  Or in general, it'd be nice to be able to perform two simultaneous
> consumptions of a lazy list without requiring that the entire list be
> stored.

This was discussed before, for example:
    http://www.haskell.org/pipermail/haskell-cafe/2006-March/015093.html
I thought I've got a working solution for this problem using
concurrency:
    http://www.haskell.org/pipermail/haskell-cafe/2006-March/015136.html
but later I realised that it wouldn't work on a multiprocessor machine:
    http://www.haskell.org/pipermail/haskell-cafe/2006-March/015140.html

Best regards
Tomek
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: hash of a lazy bytestring?

Henning Thielemann
In reply to this post by David Roundy-2

On Sun, 13 May 2007, David Roundy wrote:

> I was just contemplating hashes, and it occurred to me that it would be
> nice to be able to compute the hash of a lazily constructed bytestring and
> lazily consume its output without requiring the whole string to ever be in
> memory.  Or in general, it'd be nice to be able to perform two simultaneous
> consumptions of a lazy list without requiring that the entire list be
> stored.  Another example would be computing the length of an ordinary list:
>
> do l <- readFile "foo"
>    let len = length l
>    writeFile "bar" l
>    putStrLn $ "Length is " ++ show l

An elegant implementation of 'wc' is another example.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe