Handling multiple fds with GHC

Handling multiple fds with GHC

Markus Ongyerth
Hi,

Over the last few days I have tried to get an I/O event system running with GHC, i.e. to trigger an IO action when there is data to read from an fd.
I looked at a few different implementations, but all of them have some downside.

 * using select package
   - This uses the select syscall. select is rather limited (fds must be below FD_SETSIZE, typically 1024)

 * using GHC.Event
   - GHC.Event is broken in 7.10.1 (unless unsafeCoerce and a hacky trick are used)
   - GHC.Event is GHC internal according to hackage
   - Both network libraries I looked at (network (Network.Socket) and socket (System.Socket)) crash the application with GHC.Event
   - with 7.8+ I didn't see a way to create your own EventManager, so it only works with -threaded

 * using forkIO and threadWaitRead for each fd in a loop
   - needs some kind of custom control structure around it
   - uses a separate thread for each fd
   - might become pretty awkward to handle multiple events

 * using poll package
   - blocks in a safe foreign call
   - needs some kind of wrapper


From the above list, GHC.Event isn't usable (for me) right now. It would require some work for my use case.
The select option is usable, but suffers from the same problems as poll plus the limitation mentioned above, so it is strictly worse.

This leaves me with two options: poll and forkIO + blocking.

Those are based on two completely different approaches to event handling.

poll can be used in a rather classic event handling system with a main loop that blocks until an event occurs (or a timeout triggers) and handles the event in the loop.
forkIO + blocking is closer to registering an action that should be triggered later by an event.
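
For illustration, the forkIO + blocking approach could be sketched roughly as follows. This is only a minimal sketch, assuming the unix package for createPipe and fdToHandle; watchFd and the two pipes are hypothetical names made up for the demo. Each watcher thread blocks in threadWaitRead and reports its fd on one shared Chan, which plays the role of the custom control structure.

```haskell
import Control.Concurrent (forkIO, threadWaitRead)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import System.IO (hFlush, hPutStrLn)
import System.Posix.IO (createPipe, fdToHandle)
import System.Posix.Types (Fd)

-- | Hypothetical helper: fork one lightweight thread that blocks until
-- the fd is readable and then reports the fd on the shared channel.
watchFd :: Chan Fd -> Fd -> IO ()
watchFd events fd = do
  _ <- forkIO $ do
    threadWaitRead fd
    writeChan events fd
  return ()

main :: IO ()
main = do
  events <- newChan
  -- Two pipes stand in for the real event sources.
  (r1, _w1) <- createPipe
  (r2, w2) <- createPipe
  mapM_ (watchFd events) [r1, r2]
  -- Make only the second pipe readable.
  h2 <- fdToHandle w2
  hPutStrLn h2 "ping" >> hFlush h2
  -- The "main loop" blocks here until some watcher reports an event.
  ready <- readChan events
  putStrLn (if ready == r2 then "second pipe is readable"
                           else "first pipe is readable")
```

The watcher in this sketch is one-shot: the fd stays readable until someone reads from it, so a handler would have to consume the data before calling watchFd again.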

My main questions right now are:
1. How bad is it for the (non-threaded) runtime to be blocking in a foreign call most of the time?
2. How significant will the overhead be for the forkIO version?
3. Is there a *good* way to use something like threadWaitRead that also allows waking up on other events?
4. Is there a better way to handle multiple fds that may get readable data at any time, in Haskell/with GHC right now?

Thanks in advance,
Ongy

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users

Re: Handling multiple fds with GHC

David Turner-2
Hi,

Why the non-threaded runtime, out of interest?

Threads forked with forkIO are pretty lightweight, and although things look like blocking calls from the Haskell point of view, as I understand it under the hood it's all done with events of one form or another. Thus even with the non-threaded runtime you will see forkIO-threads behaving as if they're running concurrently. In particular, you can have two threads blocked trying to read from two different Handles and each will be awoken just when there's data to read, and the rest of the runtime will carry on even while they're blocked. Try it!

If you're dealing with FDs that you've acquired from elsewhere, the function unix:System.Posix.IO.ByteString.fdToHandle can be used to import them and then they work like normal Handles in terms of blocking operations etc.
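
A small experiment along these lines might look like the sketch below. It assumes the unix package; the two pipes merely stand in for FDs acquired from elsewhere, and System.Posix.IO.fdToHandle is used, which should behave the same as the ByteString variant mentioned above.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.IO (BufferMode (LineBuffering), hGetLine, hPutStrLn, hSetBuffering)
import System.Posix.IO (createPipe, fdToHandle)

main :: IO ()
main = do
  -- Two pipes stand in for fds "acquired from elsewhere".
  (rFd1, wFd1) <- createPipe
  (rFd2, wFd2) <- createPipe
  [r1, w1, r2, w2] <- mapM fdToHandle [rFd1, wFd1, rFd2, wFd2]
  -- Line buffering so that each hPutStrLn is flushed immediately.
  mapM_ (`hSetBuffering` LineBuffering) [w1, w2]

  results <- newEmptyMVar
  -- Each reader blocks in hGetLine without stalling the other threads.
  _ <- forkIO (hGetLine r1 >>= \l -> putMVar results ("first: " ++ l))
  _ <- forkIO (hGetLine r2 >>= \l -> putMVar results ("second: " ++ l))

  -- Write to the second pipe first, so the second reader wakes up first.
  hPutStrLn w2 "hello"
  takeMVar results >>= putStrLn
  hPutStrLn w1 "world"
  takeMVar results >>= putStrLn
```

The main thread keeps running while both readers are blocked, with or without -threaded.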

Whenever I've had to deal with waking up for one of a number of reasons (not all of which are FDs) I've found the simplicity of STM is hard to beat. Something like:

atomically ((Left <$> waitForFirstThing) <|> (Right <$> waitForSecondThing))

where waitForFirstThing and waitForSecondThing are blocked waiting for something interesting to occur in a TVar that they're watching. It's so simple that I reckon it's worth doing it like that and only trying something more complicated if it turns out from experimentation that this has too much overhead for you - "make it right" precedes "make it fast".
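
To make the one-liner above concrete, here is a minimal sketch, assuming the stm package. The helper waitFor and the two TVars are hypothetical names chosen for the example; the blocking comes from retry, and (<|>) on STM tries the left alternative first and falls back to the right one.

```haskell
import Control.Applicative ((<|>))
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (STM, TVar, atomically, newTVarIO, readTVar, retry, writeTVar)

-- | Hypothetical helper: block (via retry) until the TVar holds a
-- value, then take the value out again.
waitFor :: TVar (Maybe a) -> STM a
waitFor var = do
  contents <- readTVar var
  case contents of
    Nothing -> retry
    Just x  -> writeTVar var Nothing >> return x

main :: IO ()
main = do
  firstThing  <- newTVarIO (Nothing :: Maybe Int)
  secondThing <- newTVarIO (Nothing :: Maybe String)
  -- Some other thread makes the second thing happen.
  _ <- forkIO $ atomically (writeTVar secondThing (Just "timer fired"))
  -- Wake up as soon as either of the two things has happened.
  event <- atomically ((Left <$> waitFor firstThing) <|> (Right <$> waitFor secondThing))
  case event of
    Left n    -> putStrLn ("first: " ++ show n)
    Right msg -> putStrLn ("second: " ++ msg)
```

A TVar (Maybe a) used this way is essentially a TMVar, which the stm package already provides.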

Hope that helps,

David




On 7 October 2015 at 08:49, Markus Ongyerth <[hidden email]> wrote:
> [...]




Re: Handling multiple fds with GHC

Markus Ongyerth
2015-10-07 18:30 GMT+02:00 David Turner <[hidden email]>:
> Hi,
>
> Why the non-threaded runtime, out of interest?

Mostly because I am used to the poll/select method I mentioned and that
one works without any threading.
I don't really mind using the threaded runtime though, it's more habit.

> Threads forked with forkIO are pretty lightweight, and although things look
> like blocking calls from the Haskell point of view, as I understand it under
> the hood it's all done with events of one form or another. Thus even with
> the non-threaded runtime you will see forkIO-threads behaving as if they're
> running concurrently. In particular, you have two threads blocked trying to
> read from two different Handles and each will be awoken just when there's
> data to read, and the rest of the runtime will carry on even while they're
> blocked. Try it!

Yeah, I know and I tried that.
As far as I can see, that's actually why things break with GHC.Event.
The Event system tries to register the Fd while it was registered by me
and encounters an EEXIST from epoll.

> If you're dealing with FDs that you've acquired from elsewhere, the function
> unix:System.Posix.IO.ByteString.fdToHandle can be used to import them and
> then they work like normal Handles in terms of blocking operations etc.
>
> Whenever I've had to deal with waking up for one of a number of reasons (not
> all of which are FDs) I've found the simplicity of STM is hard to beat.
> Something like:
>
> atomically ((Left <$> waitForFirstThing) <|> (Right <$> waitForSecondThing))

Looks like I should look up STM. Does this scale easily?
I don't really need huge amounts, but I don't have any knowledge about the
number of Fds I will have.

> where waitForFirstThing and waitForSecondThing are blocked waiting for
> something interesting to occur in a TVar that they're watching. It's so
> simple that I reckon it's worth doing it like that and only trying something
> more complicated if it turns out from experimentation that this has too much
> overhead for you - "make it right" precedes "make it fast".
>
> Hope that helps,
>
> David

Thanks for the help,

Ongy

Re: Handling multiple fds with GHC

Gregory Collins-3

On Wed, Oct 7, 2015 at 10:16 AM, Markus Ongyerth <[hidden email]> wrote:
> Mostly because i am used to the poll/select method I mentioned and that
> one works without any threading.
> I don't really mind using the threaded runtime though, it's more habit.

The stock I/O manager in the threaded runtime uses epoll() on Linux out of the box. When you read from or write to a Handle (e.g. with hGetLine or hPutStr) and the operation would block, you ultimately get a call to threadWaitRead or threadWaitWrite; these functions register interest in the given file descriptor, and the IO manager / GHC runtime scheduler will wake up your thread (GHC uses "green" threads) when the file descriptor becomes readable or writable, respectively.

G
--
Gregory Collins <[hidden email]>


Re: Handling multiple fds with GHC

David Turner-2

On 7 October 2015 at 18:16, Markus Ongyerth <[hidden email]> wrote:
> 2015-10-07 18:30 GMT+02:00 David Turner <[hidden email]>:
> > Hi,
> >
> > Why the non-threaded runtime, out of interest?
>
> Mostly because i am used to the poll/select method I mentioned and that
> one works without any threading.
> I don't really mind using the threaded runtime though, it's more habit.
>
> > Threads forked with forkIO are pretty lightweight, and although things look
> > like blocking calls from the Haskell point of view, as I understand it under
> > the hood it's all done with events of one form or another. Thus even with
> > the non-threaded runtime you will see forkIO-threads behaving as if they're
> > running concurrently. In particular, you have two threads blocked trying to
> > read from two different Handles and each will be awoken just when there's
> > data to read, and the rest of the runtime will carry on even while they're
> > blocked. Try it!
>
> Yeah, I know and I tried that.
> As far as I can see, that's actually why things break with GHC.Event.
> The Event system tries to register the Fd while it was registered by me
> and encounters an EEXIST from epoll.


Ah, ok, so you can either do your epolling through the Haskell runtime or with your bare hands but you can't do both on a single FD.

> > If you're dealing with FDs that you've acquired from elsewhere, the function
> > unix:System.Posix.IO.ByteString.fdToHandle can be used to import them and
> > then they work like normal Handles in terms of blocking operations etc.
> >
> > Whenever I've had to deal with waking up for one of a number of reasons (not
> > all of which are FDs) I've found the simplicity of STM is hard to beat.
> > Something like:
> >
> > atomically ((Left <$> waitForFirstThing) <|> (Right <$> waitForSecondThing))
>
> Looks like I should look up STM. Does this scale easily?
> I don't really need huge amounts, but I don't have any knowledge about the
> number of Fds I will have.

Waiting on arbitrarily many things is pretty much as simple (as long as they all have the same type so you can put them in a list):

atomically (asum listOfWaitingThings)

In terms of code complexity that scales just fine! I'm afraid I've no real idea what the performance characteristics of such a device would be without trying it out in your use case. Whenever I've been doing this kind of thing I've always found myself IO-bound rather than CPU-bound so I've never found myself worrying too much about the efficiency of the code itself.
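
As a rough sketch of the asum version, assuming the stm package: the TMVar mailboxes and the index tagging are illustrative choices, not something prescribed above. asum folds the list with (<|>), so the transaction completes with the first alternative that does not block.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (STM, atomically, newEmptyTMVarIO, putTMVar, takeTMVar)
import Control.Monad (replicateM)
import Data.Foldable (asum)

main :: IO ()
main = do
  -- One mailbox per event source; the count of 5 is arbitrary.
  boxes <- replicateM 5 newEmptyTMVarIO
  -- Tag every waiter with its index so we know which source fired.
  let waiters :: [STM (Int, String)]
      waiters = [ (,) i <$> takeTMVar box | (i, box) <- zip [0 ..] boxes ]
  -- Some other thread delivers data to source number 3.
  _ <- forkIO $ atomically (putTMVar (boxes !! 3) "data")
  -- Block until any one of the sources has something for us.
  (source, payload) <- atomically (asum waiters)
  putStrLn ("source " ++ show source ++ ": " ++ payload)
```

With an empty list, asum yields the empty STM action, which simply blocks, so the list should contain at least one waiter.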

If you're used to doing select/poll things yourself then it may help to think of Haskell threads blocking on Handles as basically a way to do an epoll-based event loop on the underlying FDs but with a much nicer syntax and less mucking around with explicit continuations. Similarly, if you're used to dealing with task scheduling at a low level then it may help to think of STM transactions blocking as a way to muck around with the run queues in the scheduler but with a much nicer syntax and less mucking around with explicit continuations.



Best wishes,

David




Re: Handling multiple fds with GHC

Markus Ongyerth
2015-10-07 21:17 GMT+02:00 David Turner <[hidden email]>:

>
>
> On 7 October 2015 at 18:16, Markus Ongyerth <[hidden email]> wrote:
>>
>> 2015-10-07 18:30 GMT+02:00 David Turner <[hidden email]>:
>> > Hi,
>> >
>> > Why the non-threaded runtime, out of interest?
>>
>> Mostly because i am used to the poll/select method I mentioned and that
>> one works without any threading.
>> I don't really mind using the threaded runtime though, it's more habit.
>>
>> > Threads forked with forkIO are pretty lightweight, and although things
>> > look
>> > like blocking calls from the Haskell point of view, as I understand it
>> > under
>> > the hood it's all done with events of one form or another. Thus even
>> > with
>> > the non-threaded runtime you will see forkIO-threads behaving as if
>> > they're
>> > running concurrently. In particular, you have two threads blocked trying
>> > to
>> > read from two different Handles and each will be awoken just when
>> > there's
>> > data to read, and the rest of the runtime will carry on even while
>> > they're
>> > blocked. Try it!
>>
>> Yeah, I know and I tried that.
>> As far as I can see, that's actually why things break with GHC.Event.
>> The Event system tries to register the Fd while it was registered by me
>> and encounters an EEXIST from epoll.
>>
>
> Ah, ok, so you can either do your epolling through the Haskell runtime or
> with your bare hands but you can't do both on a single FD.

Ah, I didn't do it with bare hands, I did it with GHC.Event registerFd.
Running my own epoll might work (according to the epoll man page),
but I really don't want to do that.

>> > If you're dealing with FDs that you've acquired from elsewhere, the
>> > function
>> > unix:System.Posix.IO.ByteString.fdToHandle can be used to import them
>> > and
>> > then they work like normal Handles in terms of blocking operations etc.
>> >
>> > Whenever I've had to deal with waking up for one of a number of reasons
>> > (not
>> > all of which are FDs) I've found the simplicity of STM is hard to beat.
>> > Something like:
>> >
>> > atomically ((Left <$> waitForFirstThing) <|> (Right <$>
>> > waitForSecondThing))
>>
>> Looks like I should look up STM. Does this scale easily?
>> I don't really need huge amounts, but I don't have any knowledge about the
>> number of Fds I will have.
>
>
> Waiting on arbitrarily many things is pretty much as simple (as long as they
> all have the same type so you can put them in a list):
>
> atomically (asum listOfWaitingThings)

Oh, I didn't see asum, but "came up" with the same implementation.

> In terms of code complexity that scales just fine! I'm afraid I've no real
> idea what the performance characteristics of such a device would be without
> trying it out in your use case. Whenever I've been doing this kind of thing
> I've always found myself IO-bound rather than CPU-bound so I've never found
> myself worrying too much about the efficiency of the code itself.
>
> If you're used to doing select/poll things yourself then it may help to
> think of Haskell threads blocking on Handles as basically a way to do an
> epoll-based event loop on the underlying FDs but with a much nicer syntax
> and less mucking around with explicit continuations. Similarly, if you're
> used to dealing with task scheduling at a low level then it may help to
> think of STM transactions blocking as a way to muck around with the run
> queues in the scheduler but with a much nicer syntax and less mucking around
> with explicit continuations.
>

For my current project the speed does not really matter, but I tend to do some
research anyway, since I might get to a point where I need it.

The one thing I am not sure about right now is how to use threadWaitReadSTM.
Can I reuse the STM action? I have two Fds I can test it with right now; one of
them works, the other one doesn't seem to work for me. I looked into the source,
and to me it looks like the STM action should not be reused, since the content
of the TVar used internally will be set to True.
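
One way to sidestep the reuse question is to request a fresh STM action for every wait, roughly as in the sketch below. It assumes the stm and unix packages and that threadWaitReadSTM returns a wait action plus a deregistration action, as documented in base; waitOnce, Wakeup and the flag TVar are hypothetical names for the demo. It also shows how an fd wait can be combined with some other event in one transaction.

```haskell
import Control.Applicative ((<|>))
import Control.Concurrent (forkIO, threadWaitReadSTM)
import Control.Concurrent.STM (TVar, atomically, check, newTVarIO, readTVar, writeTVar)
import System.IO (hFlush, hPutStrLn)
import System.Posix.IO (createPipe, fdToHandle)
import System.Posix.Types (Fd)

data Wakeup = FdReadable | OtherEvent deriving (Eq, Show)

-- | Hypothetical helper: wait once for either readability of the fd
-- or the flag being set. A fresh registration is made on every call;
-- the STM action from an earlier call is never reused.
waitOnce :: Fd -> TVar Bool -> IO Wakeup
waitOnce fd flag = do
  (waitReadable, unregister) <- threadWaitReadSTM fd
  result <- atomically $
    (FdReadable <$ waitReadable) <|> (OtherEvent <$ (readTVar flag >>= check))
  -- Drop the registration in case the other event won the race.
  unregister
  return result

main :: IO ()
main = do
  (r, w) <- createPipe
  flag <- newTVarIO False
  -- First wait: the pipe is empty, so only the flag can fire.
  _ <- forkIO (atomically (writeTVar flag True))
  waitOnce r flag >>= print
  -- Reset the flag, make the pipe readable, and wait again.
  atomically (writeTVar flag False)
  h <- fdToHandle w
  hPutStrLn h "ping" >> hFlush h
  waitOnce r flag >>= print
```

As with threadWaitRead, the data still has to be read from the fd after the wakeup, otherwise the next wait returns immediately.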

Thanks for the help,

ongy