Emulating bash pipe/ process lib

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Emulating bash pipe/ process lib

Marc Weber
Hi. I want to write a little haskell program executing about 4 programs
passing data via pipes. As my python script seems to be slower than a
bash script I want to try a ghc executable now.
It should invoke different parts of a text to speech chain. This way I
have one interface then.

Talar und #haskell told me that I might use runProcess and pass handles
for stdin and out created by createPipe and fdToHandle.

So my simple test looks like this:


module Main where
import System.IO
import System.Posix.IO

main = do
  (fdIn,fdOut) <- createPipe
  let (iohIn, iohOut) = (fdToHandle fdIn, fdToHandle fdOut)
  hIn <- iohIn
  hOut <- iohOut
  hPutStr hIn "test"
  line <- hGetLine hOut
  print line -- should now print test having been piped through my pipe

but I get the error:
pipe2: <file descriptor: 3>: hPutStr: illegal operation (handle is not
open for writing)

And in current CVS docs in base.System.Process.hs it is said that
createPipe is no longer exported ?

I've also found this library:
http://www.haskell.org/communities/11-2005/html/report.html#processlib
which seems to do something similar but is meant to pipe haskell types
between different threads (? I haven't understood what's going on there
because I can't understand Russian ;-)
Is this of any help for me?

Greetings Marc
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Emulating bash pipe/ process lib

Simon Marlow-5
Marc Weber Marc Weber wrote:

> Hi. I want to write a little haskell program executing about 4 programs
> passing data via pipes. As my python script seems to be slower than a
> bash script I want to try a ghc executable now.
> It should invoke different parts of a text to speech chain. This way I
> have one interface then.
>
> Talar und #haskell told me that I might use runProcess and pass handles
> for stdin and out created by createPipe and fdToHandle.
>
> So my simple test looks like this:
>
>
> module Main where
> import System.IO
> import System.Posix.IO
>
> main = do
>   (fdIn,fdOut) <- createPipe
>   let (iohIn, iohOut) = (fdToHandle fdIn, fdToHandle fdOut)
>   hIn <- iohIn
>   hOut <- iohOut
>   hPutStr hIn "test"
>   line <- hGetLine hOut
>   print line -- should now print test having been piped through my pipe
>
> but I get the error:
> pipe2: <file descriptor: 3>: hPutStr: illegal operation (handle is not
> open for writing)
>
> And in current CVS docs in base.System.Process.hs it is said that
> createPipe is no longer exported ?

If you want to communicate with external programs via pipes, then
System.Process should provide everything you need.  Take a look at
runInteractiveProcess in particular.

Cheers,
        Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Emulating bash pipe/ process lib

Kurt Hutchinson
In reply to this post by Marc Weber
On 2/9/06, Marc Weber Marc Weber <[hidden email]> wrote:
> Hi. I want to write a little haskell program executing about 4 programs
> passing data via pipes. As my python script seems to be slower than a
> bash script I want to try a ghc executable now.
> It should invoke different parts of a text to speech chain. This way I
> have one interface then.

There is also the HsShellScript library, which has many utilities for
using Haskell as a shell programming language, including good support
for piping. You can find it here:

http://www.volker-wysk.de/hsshellscript/

--
Kurt
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Emulating bash pipe/ process lib

Donn Cave-2
In reply to this post by Marc Weber
On Thu, 9 Feb 2006, Marc Weber Marc Weber wrote:
...

> So my simple test looks like this:
>
>
> module Main where
> import System.IO
> import System.Posix.IO
>
> main = do
>   (fdIn,fdOut) <- createPipe
>   let (iohIn, iohOut) = (fdToHandle fdIn, fdToHandle fdOut)
>   hIn <- iohIn
>   hOut <- iohOut
>   hPutStr hIn "test"
>   line <- hGetLine hOut
>   print line -- should now print test having been piped through my pipe
>
> but I get the error:
> pipe2: <file descriptor: 3>: hPutStr: illegal operation (handle is not
> open for writing)

That's right.  "The first component is the fd to read from, the second
is the write end."  You're writing to the read end.

> And in current CVS docs in base.System.Process.hs it is said that
> createPipe is no longer exported ?

Maybe they're just going to export functions like this under their
common POSIX names ("pipe", "dup2"), and bring them more into line
with standard behavior (execve takes argv[0..n], not argv[1..n].)

"Slow" devices like pipes, sockets etc. get along fine with Handles
or whatever buffered I/O - as long as you have only one going at a time.
Multiple input sources - like, say you want to read a process' output
(unit 1) and diagnostic output (unit 2) separately, and either one has
the potential to fill up the pipe and block while you're waiting for
input on the other pipe - buffers at least complicate the dispatching
mechanics you need for this, if not make it impossible.  For my money,
it makes more sense to learn how deal with pipes directly at the file
descriptor level, and at worst if the committees find it too unsightly
you may have to write FFI interfaces to get pipe() back.

        Donn Cave, [hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Emulating bash pipe/ process lib

Bulat Ziganshin
Hello Donn,

Thursday, February 09, 2006, 8:58:27 PM, you wrote:

DC> "Slow" devices like pipes, sockets etc. get along fine with Handles
DC> or whatever buffered I/O - as long as you have only one going at a time.
DC> Multiple input sources - like, say you want to read a process' output
DC> (unit 1) and diagnostic output (unit 2) separately, and either one has
DC> the potential to fill up the pipe and block while you're waiting for
DC> input on the other pipe - buffers at least complicate the dispatching
DC> mechanics you need for this, if not make it impossible.

are you tried LineBuffering and NoBuffering? seem that it is developed
exactly for this case (plus for terminals)


--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Emulating bash pipe/ process lib

Donn Cave-2
On Thu, 9 Feb 2006, Bulat Ziganshin wrote:

> Thursday, February 09, 2006, 8:58:27 PM, you wrote:
> DC> "Slow" devices like pipes, sockets etc. get along fine with Handles
> DC> or whatever buffered I/O - as long as you have only one going at a time.
> DC> Multiple input sources - like, say you want to read a process' output
> DC> (unit 1) and diagnostic output (unit 2) separately, and either one has
> DC> the potential to fill up the pipe and block while you're waiting for
> DC> input on the other pipe - buffers at least complicate the dispatching
> DC> mechanics you need for this, if not make it impossible.
>
> are you tried LineBuffering and NoBuffering? seem that it is developed
> exactly for this case (plus for terminals)

That's part of the idea, it helps keep the data out of buffers where
select or whatever can't see it.

But then you need functions with semantics that really support unbuffered
I/O.  When select tells you that the device is readable, you don't know
that there is one full line, or how many bytes there are, so hGetLine
doesn't apply here, nor would the Haskell equivalent of fread(3) if there
were one.  The only thing left is hGetChar - one char, then select, then
another.  (And multi-byte chars cause us to worry about this.)

Since POSIX read(2) already supports exactly the functions you need for
unbuffered I/O, it's simpler, easier and more efficient to leave the whole
business right there at the file descriptor level.  I'm sure you can make
a non-buffering buffer layer work on top of the file descriptor, but what
makes it worth the trouble?

        Donn Cave, [hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[3]: Emulating bash pipe/ process lib

Bulat Ziganshin
Hello Donn,

Friday, February 10, 2006, 12:47:42 AM, you wrote:

>> DC> "Slow" devices like pipes, sockets etc. get along fine with Handles
>> DC> or whatever buffered I/O - as long as you have only one going at a time.
>> DC> Multiple input sources - like, say you want to read a process' output
>> DC> (unit 1) and diagnostic output (unit 2) separately, and either one has
>> DC> the potential to fill up the pipe and block while you're waiting for
>> DC> input on the other pipe - buffers at least complicate the dispatching
>> DC> mechanics you need for this, if not make it impossible.
>>
>> are you tried LineBuffering and NoBuffering? seem that it is developed
>> exactly for this case (plus for terminals)

DC> That's part of the idea, it helps keep the data out of buffers where
DC> select or whatever can't see it.

DC> But then you need functions with semantics that really support unbuffered
DC> I/O.  When select tells you that the device is readable, you don't know
DC> that there is one full line, or how many bytes there are, so hGetLine
DC> doesn't apply here, nor would the Haskell equivalent of fread(3) if there
DC> were one.  The only thing left is hGetChar - one char, then select, then
DC> another.  (And multi-byte chars cause us to worry about this.)

when i think how to implementat LineBuffering, i decided that it is
the only possible way - read byte a time and see for a '\n'. i don't
know how System.IO implemented but i think that it should do the same

DC> Since POSIX read(2) already supports exactly the functions you need for
DC> unbuffered I/O, it's simpler, easier and more efficient to leave the whole
DC> business right there at the file descriptor level.

can you please describe that read(2) does that is better than reading
char-at-a-time?

DC> I'm sure you can make
DC> a non-buffering buffer layer work on top of the file descriptor, but what
DC> makes it worth the trouble?

if you don't have I/O library that implements what you need, it is
indeed simpler to use lower I/O directly. if you have I/O library that
does that you need, it is easier to write:

(hIn, hOut) <- createUnixPipe
vPutStrLn hOut "hello"
s <- vGetLine hIn

i'm writing such lib now, so i'm interested to know what i need to
do so that it will work ok.


--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Emulating bash pipe/ process lib

Simon Marlow-5
Bulat Ziganshin wrote:

> Hello Donn,
>
> Friday, February 10, 2006, 12:47:42 AM, you wrote:
>
>
>>>DC> "Slow" devices like pipes, sockets etc. get along fine with Handles
>>>DC> or whatever buffered I/O - as long as you have only one going at a time.
>>>DC> Multiple input sources - like, say you want to read a process' output
>>>DC> (unit 1) and diagnostic output (unit 2) separately, and either one has
>>>DC> the potential to fill up the pipe and block while you're waiting for
>>>DC> input on the other pipe - buffers at least complicate the dispatching
>>>DC> mechanics you need for this, if not make it impossible.
>>>
>>>are you tried LineBuffering and NoBuffering? seem that it is developed
>>>exactly for this case (plus for terminals)
>
>
> DC> That's part of the idea, it helps keep the data out of buffers where
> DC> select or whatever can't see it.
>
> DC> But then you need functions with semantics that really support unbuffered
> DC> I/O.  When select tells you that the device is readable, you don't know
> DC> that there is one full line, or how many bytes there are, so hGetLine
> DC> doesn't apply here, nor would the Haskell equivalent of fread(3) if there
> DC> were one.  The only thing left is hGetChar - one char, then select, then
> DC> another.  (And multi-byte chars cause us to worry about this.)
>
> when i think how to implementat LineBuffering, i decided that it is
> the only possible way - read byte a time and see for a '\n'. i don't
> know how System.IO implemented but i think that it should do the same

Read as much as you can into the buffer without blocking.  Line
buffering on input is actually implemented exactly the same as block
buffering.

You might argue that strictly speaking this isn't line buffering, since
you can get data from the Handle before the end of line is available.
That's true, but I'd argue this is more useful.  In fact, we changed
block buffering on input handles so that the input buffer doesn't have
to be completely full before data can be returned, which is also not
strict block buffering, but seems more useful.

I suppose conceivably you might want to force a read buffer to be
completely full so that you could guarantee to read it all without
blocking, but in that case you might as well use hGetBuf & peekArray.
Similarly you might want to ensure the buffer contains a complete line
before starting to read it, but can use hGetLine anyway.

Cheers,
        Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Emulating bash pipe/ process lib

Bulat Ziganshin
Hello Simon,

Friday, February 10, 2006, 2:53:25 PM, you wrote:

i'm not very interested to do something fascinating in this area. it
seems that it is enough to do

1) non-blocking read of the entire buffer on input
2) flush buffer at each '\n' at output

that should be enough to implement LineBuffering for everyone except
purists? and for the NoBuffering the same except for flushing after
each output operation?

>>>>DC> "Slow" devices like pipes, sockets etc. get along fine with Handles
>>>>DC> or whatever buffered I/O - as long as you have only one going at a time.
>>>>DC> Multiple input sources - like, say you want to read a process' output
>>>>DC> (unit 1) and diagnostic output (unit 2) separately, and either one has
>>>>DC> the potential to fill up the pipe and block while you're waiting for
>>>>DC> input on the other pipe - buffers at least complicate the dispatching
>>>>DC> mechanics you need for this, if not make it impossible.
>>>>
>>>>are you tried LineBuffering and NoBuffering? seem that it is developed
>>>>exactly for this case (plus for terminals)
>>
>>
>> DC> That's part of the idea, it helps keep the data out of buffers where
>> DC> select or whatever can't see it.
>>
>> DC> But then you need functions with semantics that really support unbuffered
>> DC> I/O.  When select tells you that the device is readable, you don't know
>> DC> that there is one full line, or how many bytes there are, so hGetLine
>> DC> doesn't apply here, nor would the Haskell equivalent of fread(3) if there
>> DC> were one.  The only thing left is hGetChar - one char, then select, then
>> DC> another.  (And multi-byte chars cause us to worry about this.)
>>
>> when i think how to implementat LineBuffering, i decided that it is
>> the only possible way - read byte a time and see for a '\n'. i don't
>> know how System.IO implemented but i think that it should do the same

SM> Read as much as you can into the buffer without blocking.  Line
SM> buffering on input is actually implemented exactly the same as block
SM> buffering.

SM> You might argue that strictly speaking this isn't line buffering, since
SM> you can get data from the Handle before the end of line is available.
SM> That's true, but I'd argue this is more useful.  In fact, we changed
SM> block buffering on input handles so that the input buffer doesn't have
SM> to be completely full before data can be returned, which is also not
SM> strict block buffering, but seems more useful.

SM> I suppose conceivably you might want to force a read buffer to be
SM> completely full so that you could guarantee to read it all without
SM> blocking, but in that case you might as well use hGetBuf & peekArray.
SM> Similarly you might want to ensure the buffer contains a complete line
SM> before starting to read it, but can use hGetLine anyway.

--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[3]: Emulating bash pipe/ process lib

Donn Cave-2
In reply to this post by Bulat Ziganshin
On Fri, 10 Feb 2006, Bulat Ziganshin wrote:
...
> when i think how to implementat LineBuffering, i decided that it is
> the only possible way - read byte a time and see for a '\n'. i don't
> know how System.IO implemented but i think that it should do the same

Don't know - I see that Simon M followed up with an explanation that
I think confirms my impression that LineBuffering is indistinguishable
from BlockBuffering, for input.  I assume it's only there for the sake
of output, where it does make a difference.

Only NoBuffering is interoperable with select.

> DC> Since POSIX read(2) already supports exactly the functions you need for
> DC> unbuffered I/O, it's simpler, easier and more efficient to leave the whole
> DC> business right there at the file descriptor level.
>
> can you please describe that read(2) does that is better than reading
> char-at-a-time?

It returns whatever available data, as long as it's more than 0 bytes
and less than the caller-supplied limit.  This is the only read operation
that works with select (including char-at-a-time, as a special case
where the caller-supplied limit is 1.)

> DC> I'm sure you can make
> DC> a non-buffering buffer layer work on top of the file descriptor, but what
> DC> makes it worth the trouble?
>
> if you don't have I/O library that implements what you need, it is
> indeed simpler to use lower I/O directly. if you have I/O library that
> does that you need, it is easier to write:
>
> (hIn, hOut) <- createUnixPipe
> vPutStrLn hOut "hello"
> s <- vGetLine hIn
>
> i'm writing such lib now, so i'm interested to know what i need to
> do so that it will work ok.

It won't!  I mean, we can use it the same way as the ordinary Handle in
the original example, but we know in principle, if you call vGetLine,
it may block regardless of whether select reports input data, because
select can't tell you whether there's a full line of input.

So you don't have anything to worry about here - this is not your problem.
I only wanted to point out that for select-based I/O event multiplexing,
we will continue to need file descriptors and system level POSIX I/O,
and that the need for this can occur in such ordinary, mundane applications
as reading stdout and stderr in parallel.

        Donn Cave, [hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Emulating bash pipe/ process lib

Simon Marlow-5
In reply to this post by Bulat Ziganshin
Bulat Ziganshin wrote:

> Friday, February 10, 2006, 2:53:25 PM, you wrote:
>
> i'm not very interested to do something fascinating in this area. it
> seems that it is enough to do
>
> 1) non-blocking read of the entire buffer on input
> 2) flush buffer at each '\n' at output
>
> that should be enough to implement LineBuffering for everyone except
> purists? and for the NoBuffering the same except for flushing after
> each output operation?

Yes, exactly.  This is almost what GHC's System.IO currently does,
except that for NoBuffering we have a fixed buffer size of 1 byte.  It
would be safe to have a larger buffer size for NoBuffering read handles,
but I didn't recognise that when I wrote it.

Cheers,
        Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: Emulating bash pipe/ process lib

Bulat Ziganshin-2
Hello Simon,

Tuesday, February 21, 2006, 4:05:57 PM, you wrote:

>> i'm not very interested to do something fascinating in this area. it
>> seems that it is enough to do
>>
>> 1) non-blocking read of the entire buffer on input
>> 2) flush buffer at each '\n' at output
>>
>> that should be enough to implement LineBuffering for everyone except
>> purists? and for the NoBuffering the same except for flushing after
>> each output operation?

SM> Yes, exactly.  This is almost what GHC's System.IO currently does,
SM> except that for NoBuffering we have a fixed buffer size of 1 byte.  It
SM> would be safe to have a larger buffer size for NoBuffering read handles,
SM> but I didn't recognise that when I wrote it.

btw, this makes NoBuffering mode unusable for some tasks. Donn Cave
wrote about this here:

...
> when i think how to implementat LineBuffering, i decided that it is
> the only possible way - read byte a time and see for a '\n'. i don't
> know how System.IO implemented but i think that it should do the same

Don't know - I see that Simon M followed up with an explanation that
I think confirms my impression that LineBuffering is indistinguishable
from BlockBuffering, for input.  I assume it's only there for the sake
of output, where it does make a difference.

Only NoBuffering is interoperable with select.

> DC> Since POSIX read(2) already supports exactly the functions you need for
> DC> unbuffered I/O, it's simpler, easier and more efficient to leave the whole
> DC> business right there at the file descriptor level.
>
> can you please describe that read(2) does that is better than reading
> char-at-a-time?

It returns whatever available data, as long as it's more than 0 bytes
and less than the caller-supplied limit.  This is the only read operation
that works with select (including char-at-a-time, as a special case
where the caller-supplied limit is 1.)

> DC> I'm sure you can make
> DC> a non-buffering buffer layer work on top of the file descriptor, but what
> DC> makes it worth the trouble?
>
> if you don't have I/O library that implements what you need, it is
> indeed simpler to use lower I/O directly. if you have I/O library that
> does that you need, it is easier to write:
>
> (hIn, hOut) <- createUnixPipe
> vPutStrLn hOut "hello"
> s <- vGetLine hIn
>
> i'm writing such lib now, so i'm interested to know what i need to
> do so that it will work ok.

It won't!  I mean, we can use it the same way as the ordinary Handle in
the original example, but we know in principle, if you call vGetLine,
it may block regardless of whether select reports input data, because
select can't tell you whether there's a full line of input.

So you don't have anything to worry about here - this is not your problem.
I only wanted to point out that for select-based I/O event multiplexing,
we will continue to need file descriptors and system level POSIX I/O,
and that the need for this can occur in such ordinary, mundane applications
as reading stdout and stderr in parallel.

        Donn Cave, [hidden email]




--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Emulating bash pipe/ process lib

Simon Marlow-5
Bulat Ziganshin wrote:

> Tuesday, February 21, 2006, 4:05:57 PM, you wrote:
>
>
>>>i'm not very interested to do something fascinating in this area. it
>>>seems that it is enough to do
>>>
>>>1) non-blocking read of the entire buffer on input
>>>2) flush buffer at each '\n' at output
>>>
>>>that should be enough to implement LineBuffering for everyone except
>>>purists? and for the NoBuffering the same except for flushing after
>>>each output operation?
>
>
> SM> Yes, exactly.  This is almost what GHC's System.IO currently does,
> SM> except that for NoBuffering we have a fixed buffer size of 1 byte.  It
> SM> would be safe to have a larger buffer size for NoBuffering read handles,
> SM> but I didn't recognise that when I wrote it.
>
> btw, this makes NoBuffering mode unusable for some tasks. Donn Cave
> wrote about this here:

Someone is confused here, probably me.  Could you elaborate on what
you're referring to by "this", and "unusable for some tasks"?

Cheers,
        Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe