getContents and lazy evaluation

Tamas K Papp
Hi,

I am a newbie, reading the Gentle Introduction.  Chapter 7
(Input/Output) says

  Pragmatically, it may seem that getContents must immediately read an
  entire file or channel, resulting in poor space and time performance
  under certain conditions. However, this is not the case. The key
  point is that getContents returns a "lazy" (i.e. non-strict) list of
  characters (recall that strings are just lists of characters in
  Haskell), whose elements are read "by demand" just like any other
  list. An implementation can be expected to implement this
  demand-driven behavior by reading one character at a time from the
  file as they are required by the computation.

So what happens if I do

contents <- hGetContents handle
putStr (take 5 contents)   -- assume that the implementation
                           -- only reads a few chars
-- delete the file in some way
putStr (take 500 contents) -- but the file is not there now

If an IO function is lazy, doesn't that break sequentiality?  Sorry if
the question is stupid.

Thanks,

Tamas
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: getContents and lazy evaluation

Robert Dockins
On Friday 01 September 2006 15:19, Tamas K Papp wrote:

> Hi,
>
> I am a newbie, reading the Gentle Introduction.  Chapter 7
> (Input/Output) says
>
>   Pragmatically, it may seem that getContents must immediately read an
>   entire file or channel, resulting in poor space and time performance
>   under certain conditions. However, this is not the case. The key
>   point is that getContents returns a "lazy" (i.e. non-strict) list of
>   characters (recall that strings are just lists of characters in
>   Haskell), whose elements are read "by demand" just like any other
>   list. An implementation can be expected to implement this
>   demand-driven behavior by reading one character at a time from the
>   file as they are required by the computation.
>
> So what happens if I do
>
> contents <- hGetContents handle
> putStr (take 5 contents) -- assume that the implementation
>               -- only reads a few chars
> -- delete the file in some way
> putStr (take 500 contents) -- but the file is not there now
>
> If an IO function is lazy, doesn't that break sequentiality?  Sorry if
> the question is stupid.

This is not a stupid question at all, and it highlights the main problem with
lazy IO.  The solution is, in essence "don't do that, because Bad Things will
happen".  It's pretty unsatisfactory, but there it is.  For this reason, lazy
IO is widely regarded as somewhat dangerous (or even as an outright
misfeature, by a few).

If you are going to be doing simple pipe-style IO (i.e., read some data
sequentially, manipulate it, spit out the output), lazy IO is very
convenient, and it makes putting together quick scripts very easy.  However,
if you're doing something more advanced, you'd probably do best to stay away
from lazy IO.
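
As an illustration of the "simple pipe-style IO" case described above, here is a minimal sketch. The function name shout is made up for this example; interact is the standard Prelude function that reads standard input lazily and writes the transformed result to standard output.

```haskell
import Data.Char (toUpper)

-- Hypothetical pure transformation applied to the whole input stream.
shout :: String -> String
shout = map toUpper

-- interact reads stdin lazily, so output can start before all the
-- input has arrived.
main :: IO ()
main = interact shout
```

Because the input is consumed strictly front to back and nothing else touches the stream, the ordering questions raised in this thread do not come up in a program of this shape.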

Welcome to Haskell, BTW  :-)

> Thanks,
>
> Tamas

--
Rob Dockins

Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
       -- TMBG

Re: getContents and lazy evaluation

Duncan Coutts
On Fri, 2006-09-01 at 16:28 -0400, Robert Dockins wrote:

> On Friday 01 September 2006 15:19, Tamas K Papp wrote:
> > Hi,
> >
> > I am a newbie, reading the Gentle Introduction.  Chapter 7
> > (Input/Output) says
> >
> >   Pragmatically, it may seem that getContents must immediately read an
> >   entire file or channel, resulting in poor space and time performance
> >   under certain conditions. However, this is not the case. The key
> >   point is that getContents returns a "lazy" (i.e. non-strict) list of
> >   characters (recall that strings are just lists of characters in
> >   Haskell), whose elements are read "by demand" just like any other
> >   list. An implementation can be expected to implement this
> >   demand-driven behavior by reading one character at a time from the
> >   file as they are required by the computation.
> >
> > So what happens if I do
> >
> > contents <- hGetContents handle
> > putStr (take 5 contents) -- assume that the implementation
> >               -- only reads a few chars
> > -- delete the file in some way
> > putStr (take 500 contents) -- but the file is not there now
> >
> > If an IO function is lazy, doesn't that break sequentiality?  Sorry if
> > the question is stupid.
>
> This is not a stupid question at all, and it highlights the main problem with
> lazy IO.  The solution is, in essence "don't do that, because Bad Things will
> happen".  It's pretty unsatisfactory, but there it is.  For this reason, lazy
> IO is widely regarded as somewhat dangerous (or even as an outright
> misfeature, by a few).
>
> If you are going to be doing simple pipe-style IO (i.e., read some data
> sequentially, manipulate it, spit out the output), lazy IO is very
> convenient, and it makes putting together quick scripts very easy.  However,
> if you're doing something more advanced, you'd probably do best to stay away
> from lazy IO.

Since working on Data.ByteString.Lazy I'm now even more of a pro-lazy-IO
zealot than I was before ;-)

In practice I expect that most programs that deal with file IO strictly
do not handle the file disappearing under them very well either. At best
they probably throw an exception and let something else clean up. The
same can be done with lazy IO, though it requires using imprecise
exceptions which some people grumble about. So I would contend that lazy
IO is actually applicable in rather a wider range of circumstances than
you might think. :-)

Note also, that with lazy IO we can write really short programs that are
blindingly quick. Lazy IO allows us to save a copy through the Handle
buffer.
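
As a sketch of the kind of short program meant here, the following counts the newlines on standard input using Data.ByteString.Lazy. The helper name countNewlines is an assumption made for this example; getContents and count are functions exported by Data.ByteString.Lazy.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Hypothetical helper: count newline bytes in a lazy ByteString.
-- 10 is the byte value of '\n'.
countNewlines :: BL.ByteString -> Int64
countNewlines = BL.count 10

-- The input is read lazily in chunks, so the whole stream need not
-- be held in memory at once.
main :: IO ()
main = BL.getContents >>= print . countNewlines
```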

BTW in the above case the "bad thing that will happen" is that contents
will be truncated. As I said, I think it's better to throw an exception,
which is what Data.ByteString.Lazy.hGetContents does.

Duncan


Re: getContents and lazy evaluation

Robert Dockins
On Friday 01 September 2006 16:46, Duncan Coutts wrote:

> On Fri, 2006-09-01 at 16:28 -0400, Robert Dockins wrote:
> > On Friday 01 September 2006 15:19, Tamas K Papp wrote:
> > > Hi,
> > >
> > > I am a newbie, reading the Gentle Introduction.  Chapter 7
> > > (Input/Output) says
> > >
> > >   Pragmatically, it may seem that getContents must immediately read an
> > >   entire file or channel, resulting in poor space and time performance
> > >   under certain conditions. However, this is not the case. The key
> > >   point is that getContents returns a "lazy" (i.e. non-strict) list of
> > >   characters (recall that strings are just lists of characters in
> > >   Haskell), whose elements are read "by demand" just like any other
> > >   list. An implementation can be expected to implement this
> > >   demand-driven behavior by reading one character at a time from the
> > >   file as they are required by the computation.
> > >
> > > So what happens if I do
> > >
> > > contents <- hGetContents handle
> > > putStr (take 5 contents) -- assume that the implementation
> > >               -- only reads a few chars
> > > -- delete the file in some way
> > > putStr (take 500 contents) -- but the file is not there now
> > >
> > > If an IO function is lazy, doesn't that break sequentiality?  Sorry if
> > > the question is stupid.
> >
> > This is not a stupid question at all, and it highlights the main problem
> > with lazy IO.  The solution is, in essence "don't do that, because Bad
> > Things will happen".  It's pretty unsatisfactory, but there it is.  For
> > this reason, lazy IO is widely regarded as somewhat dangerous (or even as
> > an outright misfeature, by a few).
> >
> > If you are going to be doing simple pipe-style IO (i.e., read some data
> > sequentially, manipulate it, spit out the output), lazy IO is very
> > convenient, and it makes putting together quick scripts very easy.
> > However, if you're doing something more advanced, you'd probably do best
> > to stay away from lazy IO.
>
> Since working on Data.ByteString.Lazy I'm now even more of a pro-lazy-IO
> zealot than I was before ;-)
>
> In practice I expect that most programs that deal with file IO strictly
> do not handle the file disappearing under them very well either.

That's probably true, except for especially robust applications where such a
thing is a regular (or at least expected) event.

> At best
> they probably throw an exception and let something else clean up. The
> same can be done with lazy IO, though it requires using imprecise
> exceptions which some people grumble about. So I would contend that lazy
> IO is actually applicable in rather a wider range of circumstances than
> you might think. :-)

Perhaps I should be more clear.  When I said "advanced" above I meant "any use
whereby you treat a file as random access, read/write storage, or do any kind
of directory manipulation (including deleting and/or renaming files)".  Lazy
I/O (as it currently stands) doesn't play very nice with those use cases.

I agree generally with the idea that lazy I/O is good.  The problem is that it
is a "leaky abstraction"; details are exposed to the user that should ideally
be completely hidden.  Unfortunately, the leaks aren't likely to get plugged
without pretty tight operating system support, which I suspect won't be
happening anytime soon.

> Note also, that with lazy IO we can write really short programs that are
> blindingly quick. Lazy IO allows us to save a copy through the Handle
> buffer.

> BTW in the above case the "bad thing that will happen" is that contents
> will be truncated. As I said, I think it's better to throw an exception,
> which is what Data.ByteString.Lazy.hGetContents does.

Well, AFAIK, the behavior is officially undefined, which is my real beef.  I
agree that it _should_ throw an exception.

> Duncan

--
Rob Dockins

Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
       -- TMBG

Re: getContents and lazy evaluation

Donn Cave-2
On Fri, 1 Sep 2006, Robert Dockins wrote:
> On Friday 01 September 2006 16:46, Duncan Coutts wrote:
...
>> Note also, that with lazy IO we can write really short programs that are
>> blindingly quick. Lazy IO allows us to save a copy through the Handle
>> buffer.

(Never understood why some people think it would be such a good thing
to be blinded, but as long as it's you and not me ... )


>> BTW in the above case the "bad thing that will happen" is that contents
>> will be truncated. As I said, I think it's better to throw an exception,
>> which is what Data.ByteString.Lazy.hGetContents does.
>
> Well, AFAIK, the behavior is officially undefined, which is my real beef.  I
> agree that it _should_ throw an exception.

Is this about Microsoft Windows?  On UNIX, I would expect deletion of
a file to have no effect on I/O of any kind on that file.  I thought
the problems with hGetContents more commonly involve operations on
the file handle, e.g., hClose.
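
The hClose hazard mentioned above comes from closing the handle before the lazy list has been consumed. Depending on the implementation and version, the unread remainder may then be silently lost or an error may be raised on a later read. A minimal sketch of one common workaround is to force the whole string while the handle is still open. The name readFileStrictly and the file name example.txt are made up for this example.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read a whole file and make sure every character
-- has actually been read before withFile closes the handle.
readFileStrictly :: FilePath -> IO String
readFileStrictly path = withFile path ReadMode $ \h -> do
  contents <- hGetContents h
  _ <- evaluate (length contents)  -- walks the list, forcing all reads now
  return contents

main :: IO ()
main = do
  writeFile "example.txt" "hello, lazy world\n"
  s <- readFileStrictly "example.txt"
  putStr s
```

The cost is that the whole file is held in memory, which gives up the streaming benefit of lazy IO in exchange for predictable ordering.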

        Donn Cave, [hidden email]


Re: getContents and lazy evaluation

Robert Dockins
On Friday 01 September 2006 18:01, Donn Cave wrote:

> On Fri, 1 Sep 2006, Robert Dockins wrote:
> > On Friday 01 September 2006 16:46, Duncan Coutts wrote:
>
> ...
>
> >> Note also, that with lazy IO we can write really short programs that are
> >> blindingly quick. Lazy IO allows us to save a copy through the Handle
> >> buffer.
>
> (Never understood why some people think it would be such a good thing
> to be blinded, but as long as it's you and not me ... )
>
> >> BTW in the above case the "bad thing that will happen" is that contents
> >> will be truncated. As I said, I think it's better to throw an exception,
> >> which is what Data.ByteString.Lazy.hGetContents does.
> >
> > Well, AFAIK, the behavior is officially undefined, which is my real beef.
> >  I agree that it _should_ throw an exception.
>
> Is this about Microsoft Windows?  On UNIX, I would expect deletion of
> a file to have no effect on I/O of any kind on that file.  I thought
> the problems with hGetContents more commonly involve operations on
> the file handle, e.g., hClose.

Ahh... I think you're right.

However, this just illustrates the problem.  The point is that the answer to the
question "what happens when I do <odd thing involving lazy I/O>" is "it
depends".  And to the obvious followup question "what does it depend on?" the
answer is "well... it's complicated".

> Donn Cave, [hidden email]

--
Rob Dockins

Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
       -- TMBG

Re: getContents and lazy evaluation

Duncan Coutts
In reply to this post by Robert Dockins
On Fri, 2006-09-01 at 17:36 -0400, Robert Dockins wrote:

> Perhaps I should be more clear.  When I said "advanced" above I meant "any use
> whereby you treat a file as random access, read/write storage, or do any kind
> of directory manipulation (including deleting and/or renaming files)".  Lazy
> I/O (as it currently stands) doesn't play very nice with those use cases.

Indeed, it can't be used in that case.

> I agree generally with the idea that lazy I/O is good.  The problem is that it
> is a "leaky abstraction"; details are exposed to the user that should ideally
> be completely hidden.  Unfortunately, the leaks aren't likely to get plugged
> without pretty tight operating system support, which I suspect won't be
> happening anytime soon.

Yes it is leaky.

> Well, AFAIK, the behavior is officially undefined, which is my real beef.  I
> agree that it _should_ throw an exception.

Ah, I had thought it was defined to simply truncate. It being undefined
isn't good. It seems that it would be straightforward to define it to
have the truncation behaviour. If Haskell-prime gets imprecise
exceptions then that could be changed.

Duncan


Re: getContents and lazy evaluation

Julien Oster
In reply to this post by Duncan Coutts
Duncan Coutts wrote:

Hi,

> In practice I expect that most programs that deal with file IO strictly
> do not handle the file disappearing under them very well either. At best
> they probably throw an exception and let something else clean up.

And at least in the Unix world, they just don't disappear. Normally, if you
delete a file, you just delete its directory entry. If there still is
something with an open handle to it, i.e. your program, the
corresponding "inode" (that's basically the file itself without its name
or names) still happily exists for your seeking, reading and writing.
Then, when your program closes the file and there really is no remaining
directory entry and no other process accessing it, the inode is removed
as well.

One trick for temporary files on Unix is opening a new file, immediately
deleting it but still using it to write and read data.

So no problem here.
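
The scenario from the original question can be tried out with a sketch along these lines. It assumes a POSIX system, where removing the name leaves the open handle usable; on Windows the removeFile call would be expected to fail instead. The function name readAcrossDelete and the scratch file name are made up for this example.

```haskell
import Control.Exception (evaluate)
import System.Directory (doesFileExist, removeFile)
import System.IO

-- Hypothetical experiment: read a little, delete the file's name, then
-- keep reading lazily through the handle that is still open.
-- Returns whether the name still exists after deletion and how many
-- further characters could be read.
readAcrossDelete :: FilePath -> IO (Bool, Int)
readAcrossDelete path = do
  writeFile path (replicate 100000 'x')  -- big enough to need several reads
  h <- openFile path ReadMode
  contents <- hGetContents h
  putStrLn (take 5 contents)             -- forces only the start of the file
  removeFile path                        -- on POSIX this unlinks the name only
  nameExists <- doesFileExist path
  n <- evaluate (length (take 50000 contents))  -- forces reads after the delete
  hClose h
  return (nameExists, n)

main :: IO ()
main = readAcrossDelete "lazy-demo.tmp" >>= print
```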

But what happens when two processes use the same file and one process is
writing into it using lazy IO which hasn't happened yet? The other process
wouldn't see its changes yet.

I'm not sure if it matters, however, since sooner or later that IO will
happen. And I believe that lazy IO still means that for one operation
actually taking place, all prior operations take place in the right
order beforehand as well, no?

As for two processes writing to the same file at the same time, very bad
things may happen anyway. Sure, lazy IO prevents doing communication
between running processes using plain files, but why would you do
something like that?

Regards,
Julien





Re: getContents and lazy evaluation

Donn Cave-2
In reply to this post by Tamas K Papp
Quoth Julien Oster <[hidden email]>:
...
| But what happens when two processes use the same file and one process is
| writing into it using lazy IO which hasn't happened yet? The other process
| wouldn't see its changes yet.

That's actually a much more general problem, one that I imagine applies
to hPutStr et al. too.  Application level writes are ordinarily buffered
in process space by the I/O library, so output from an ordinary C program
may not appear on disk (or in kernel space disk I/O buffer) until just
before the program exits.
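
The buffering point can be made concrete with a small sketch. When another process needs to see output promptly, the writer can choose a buffering mode and flush explicitly. The function name appendLine and the file name shared.log are made up for this example; hSetBuffering and hFlush are the standard System.IO functions.

```haskell
import System.IO

-- Hypothetical helper: append one line and hand it to the operating
-- system right away instead of leaving it in the Handle's buffer.
appendLine :: FilePath -> String -> IO ()
appendLine path line = withFile path AppendMode $ \h -> do
  hSetBuffering h LineBuffering  -- flush automatically at each newline
  hPutStrLn h line
  hFlush h                       -- explicit flush; redundant here, shown for clarity

main :: IO ()
main = appendLine "shared.log" "now visible to other readers"
```

Flushing only moves the data out of the process; whether and when it reaches the disk is still up to the operating system.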

| As for two processes writing to the same file at the same time, very bad
| things may happen anyway. Sure, lazy IO prevents doing communication
| between running processes using plain files, but why would you do
| something like that?

Quite a few reasons, depending on how you define communication.  You
might even be tempted to use hGetContents in such cases.  For example,
one common way to share a file is to interlock around some resource,
and when you acquire the lock, you read the file (get its contents)
and release the lock.

        Donn Cave, [hidden email]

Re: getContents and lazy evaluation

David Roundy-2
In reply to this post by Duncan Coutts
On Fri, Sep 01, 2006 at 11:47:20PM +0100, Duncan Coutts wrote:
> On Fri, 2006-09-01 at 17:36 -0400, Robert Dockins wrote:
> > Well, AFAIK, the behavior is officially undefined, which is my
> > real beef.  I agree that it _should_ throw an exception.
>
> Ah, I had thought it was defined to simply truncate. It being
> undefined isn't good. It seems that it would be straightforward to
> define it to have the truncation behaviour. If Haskell-prime gets
> imprecise exceptions then that could be changed.

Fortunately, the undefined behavior in this case is unrelated to the
lazy IO.  On Windows, the removal of the file will fail, while on
POSIX systems there won't be any failure at all.  The same behavior
would show up if you opened the file for non-lazy reading, and tried
to read part of the file, then delete it, then read the rest.

The "undefinedness" in this example isn't in the Haskell language,
but in the filesystem semantics, and that's not something we want the
language specifying (since it's something over which it has no
control).  Lazy IO definitely works much more nicely with POSIX
filesystems, but that's unsurprising, since POSIX filesystem semantics
are much nicer than those of Windows.
--
David Roundy

Re: getContents and lazy evaluation

Esa Ilari Vuokko
Hi

On 9/6/06, David Roundy <[hidden email]> wrote:
> Fortunately, the undefined behavior in this case is unrelated to the
> lazy IO.  On Windows, the removal of the file will fail, while on
> POSIX systems there won't be any failure at all.  The same behavior
> would show up if you opened the file for non-lazy reading, and tried
> to read part of the file, then delete it, then read the rest.

This is not strictly speaking true.  If all the handles opened to the file
in question are in FILE_SHARE_DELETE sharing mode, it can be
marked for deletion when the last handle to it is closed.  It can also be
moved and renamed.

But it is true that removal might fail because of an open handle, and it is true
that it will fail as currently implemented for GHC (and probably for other
compilers as well).

> The "undefinedness" in this example isn't in the Haskell language,
> but in the filesystem semantics, and that's not something we want the
> language specifying (since it's something over which it has no

Happily this isn't a lazy IO issue, it's just a file IO issue for all
files opened as specified by Haskell 98.  Sharing mode would be really nice
to have in Windows, as would security attributes.  But as you say, these are
hard things to specify because not everyone has those features.  So, at least
it works nicely in POSIX-y systems, eh?

Best regards,
--Esa Ilari Vuokko