Strict getContents

Li-yao Xia-2
Hello Libraries,

In base, the functions which read all contents from a handle or file
into one String currently all do lazy IO: readFile, getContents,
hGetContents.

https://hackage.haskell.org/package/base-4.12.0.0/docs/System-IO.html#v:hGetContents

The easiest way to get a strict alternative seems to be to explicitly
force the list, for example using ```length contents `seq` pure ()```,
but that's far from an obvious solution.

Is there a better way?

If not, I propose to add readFile', getContents', hGetContents', which
don't do lazy IO.
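
As a rough illustration of the proposal, here is a minimal sketch of strict variants built on the length-forcing trick mentioned above. The names strictHGetContents and strictReadFile are hypothetical; they are used instead of the primed names because base 4.15 and later export readFile', getContents' and hGetContents' from System.IO. The scratch file name is also an assumption.

```haskell
import Control.Exception (evaluate)
import Data.Char (toUpper)
import System.IO (Handle, IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical name. Reads lazily, then forces the whole list spine
-- before returning, so all input has been consumed by then.
strictHGetContents :: Handle -> IO String
strictHGetContents h = do
  contents <- hGetContents h
  _ <- evaluate (length contents)
  pure contents

-- Hypothetical name. The handle is closed when withFile returns,
-- which is safe because the contents are already fully read.
strictReadFile :: FilePath -> IO String
strictReadFile path = withFile path ReadMode strictHGetContents

main :: IO ()
main = do
  let path = "strict-demo.txt" -- hypothetical scratch file
  writeFile path "hello\nworld\n"
  s <- strictReadFile path
  -- The file is no longer open here, so overwriting it is fine.
  writeFile path (map toUpper s)
  strictReadFile path >>= putStr
```

With the lazy readFile in place of strictReadFile, the second writeFile would typically fail, because the file may still be open for reading at that point.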

Lazy IO regularly creates confusion among beginners, and it's easy to assume
that lazy IO is benign if that's the only way to do certain operations,
when it's arguably the wrong way to read files to begin with.

Cheers,
Li-yao
_______________________________________________
Libraries mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Re: Strict getContents

Vanessa McHale-2
I believe such a function exists in the strict package.

I agree that it would be good to add such functions to base.


Re: Strict getContents

Henning Thielemann
In reply to this post by Li-yao Xia-2

On Wed, 11 Sep 2019, Li-yao Xia wrote:

> The easiest way to get a strict alternative seems to be to explicitly
> force the list, for example using ```length contents `seq` pure ()```,
> but that's far from an obvious solution.

I am not sure whether this works reliably. Evaluating the length of
'contents' only generates the skeleton of the list but not immediately the
element values. A cleaner way would be to use 'deepseq'.
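
A minimal sketch of the deepseq-based alternative, assuming the deepseq package is available (it ships with GHC but is not part of base). The name readFileDeep and the scratch file name are hypothetical.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical name. force evaluates the list spine and every
-- character; evaluate makes that happen before withFile closes
-- the handle.
readFileDeep :: FilePath -> IO String
readFileDeep path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (force contents)

main :: IO ()
main = do
  writeFile "deepseq-demo.txt" "forced\n" -- hypothetical scratch file
  readFileDeep "deepseq-demo.txt" >>= putStr
```

This does not depend on how hGetContents builds its list, at the cost of an extra package dependency.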

Re: Strict getContents

Li-yao Xia-2
Hi Henning,


On 9/11/19 2:52 PM, Henning Thielemann wrote:
 >
 > On Wed, 11 Sep 2019, Li-yao Xia wrote:
 >
 >> The easiest way to get a strict alternative seems to be to explicitly
 >> force the list, for example using ```length contents `seq` pure ()```,
 >> but that's far from an obvious solution.
 >
 > I am not sure, whether this works reliably. Evaluating the length of
 > 'contents' only generates the skeleton of the list but not immediately
 > the element values. A cleaner way would be to use 'deepseq'.


That's an interesting question, because I'm pretty confident this is a
reliable way to force getContents, but I'm less sure I can convince you
of it easily.

Thinking of how that could break, I believe that one would have to go
out of their way in order to implement getContents such that forcing the
list does not also make its characters available even after the file is
closed, at which point the author of that function should stop and
wonder whether it is worth the trouble, and I trust that the author, if
they even considered the possibility, would reach the reasonable
conclusion of "don't do that".

Of course, that argument can go wrong in many ways, especially because
it is full of subjective judgements. So to get some closure, let's look
at the source code. Skipping over the intermediate steps that one would
have to check for themselves anyway, it boils down to this unpack function:

https://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.IO.Handle.Text.html#unpack

Near the end of the function is the line that adds a character c to
the string that will be returned at the end. We can see that the cons
comes with the character already fully read by peekElemOf:

               unpackRB (c : acc) (i-1)
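
The two sides of this exchange can be checked with a small experiment. This is only a sketch: spineOnly and readThenClose are hypothetical names, the scratch file name is an assumption, and the second half observes the behaviour of the current implementation rather than a documented guarantee.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hClose, hGetContents, openFile)

-- In general, length walks only the spine of a list: the undefined
-- elements below are never evaluated.
spineOnly :: Int
spineOnly = length [undefined, undefined :: Char]

-- Hypothetical name. Forces only the length of the lazily read
-- contents, closes the handle, then returns the string.
readThenClose :: FilePath -> IO String
readThenClose path = do
  h <- openFile path ReadMode
  contents <- hGetContents h
  _ <- evaluate (length contents)
  hClose h
  pure contents

main :: IO ()
main = do
  print spineOnly
  writeFile "unpack-demo.txt" "abc\n" -- hypothetical scratch file
  -- The characters are still readable after hClose, because each
  -- cons cell was built with a character already read from the buffer.
  readThenClose "unpack-demo.txt" >>= putStr
```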

Cheers,
Li-yao

Re: Strict getContents

Vanessa McHale-2
Wouldn't it be more sensible to not interleave IO in the first place?

Cheers,
Vanessa


Re: Strict getContents

Joseph C. Sible
In reply to this post by Li-yao Xia-2
+1 to adding those non-lazy versions. Such functions could work
without having to half-close the handle, thus making it easier to
continue reading from a file after EOF (à la `tail -f`). (I've asked
about how to do this exact thing before at
https://stackoverflow.com/q/56221606/7509065 and this would give it a
trivial answer.)
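
For illustration, a strict read that involves no interleaved IO and leaves the handle fully open might look like the sketch below. The helper hGetRest and the scratch file name are hypothetical, and the character-by-character loop is chosen for clarity, not speed. Whether later reads pick up data appended to the file after EOF is not shown here.

```haskell
import System.IO
  (Handle, IOMode (ReadMode), hClose, hGetChar, hIsEOF, hIsOpen, openFile)

-- Hypothetical helper. Reads eagerly until hIsEOF reports the end of
-- input, accumulating characters in reverse. It never semi-closes the
-- handle, unlike hGetContents.
hGetRest :: Handle -> IO String
hGetRest h = go []
  where
    go acc = do
      eof <- hIsEOF h
      if eof
        then pure (reverse acc)
        else do
          c <- hGetChar h
          go (c : acc)

main :: IO ()
main = do
  writeFile "rest-demo.txt" "line one\n" -- hypothetical scratch file
  h <- openFile "rest-demo.txt" ReadMode
  s <- hGetRest h
  stillOpen <- hIsOpen h -- the handle remains usable after the read
  putStr s
  print stillOpen
  hClose h
```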

Joseph C. Sible


Re: Strict getContents

Eric Mertens
I like the idea of having strict versions of these functions. I also prefer recommending the strict versions of these functions from the text package's Data.Text.IO over encouraging people to use String to load files. I agree that people should be able to read file contents strictly without having to use tricks like forcing the length.
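
A short sketch of the Data.Text.IO route, assuming the text package is available. Its readFile reads the entire file strictly, so no forcing trick is needed; the scratch file name is hypothetical.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
  let path = "text-demo.txt" -- hypothetical scratch file
  TIO.writeFile path (T.pack "hello\n")
  -- Strict read: the file is fully consumed and closed on return.
  t <- TIO.readFile path
  -- Writing back to the same file is therefore safe.
  TIO.writeFile path (T.toUpper t)
  TIO.readFile path >>= TIO.putStr
```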

--
Eric Mertens


Re: Strict getContents

David Feuer
In reply to this post by Li-yao Xia-2
Certainly reading directly into a String is inefficient for a strict read. I imagine the best way is to read (eagerly) into a lazy ByteString and decode that lazily into a String. Even for Text, it may well be better to read into a lazy ByteString and decode into lazy Text, since the latter tends to take considerably more memory.
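
One possible reading of this suggestion, as a sketch only: read the bytes eagerly, wrap them as a lazy ByteString, and decode to a String on demand. It assumes the bytestring and text packages and UTF-8 input, whereas base's readFile uses the locale encoding. The name readFileUtf8 and the scratch file name are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Hypothetical name. The IO is strict (BS.readFile returns only after
-- the whole file is read and closed); only the pure decoding to String
-- happens lazily, as the result is consumed.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = do
  bytes <- BS.readFile path
  pure (TL.unpack (TLE.decodeUtf8 (BL.fromStrict bytes)))

main :: IO ()
main = do
  writeFile "bytes-demo.txt" "plain ascii\n" -- hypothetical scratch file
  readFileUtf8 "bytes-demo.txt" >>= putStr
```

The compact byte buffer stays in memory while the String is produced incrementally, which is the memory trade-off the message alludes to. Invalid UTF-8 would raise an exception during decoding, not during the read.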
