Patch/feature proposal: Provide access to the runStmt sandbox ThreadID

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Patch/feature proposal: Provide access to the runStmt sandbox ThreadID

Edsko de Vries-4
Hi all,

This proposal is related to http://hackage.haskell.org/trac/ghc/ticket/1381,
which Simon Marlow closed through commit
https://github.com/ghc/ghc/commit/02c4ab049adeb77b8ee0e3b98fbf0f3026eee453.

The problem, in a nutshell, is "how do we terminate a code snippet started
with runStmt"? Before Simon's patch the only way was to disable ghc's
sandboxing, so that the snippet would run in the same thread as the thread
that called runStmt in the first place, and then send an asynchronous
exception to that thread. This is the approach we used to take. It's a
little tricky to get right (have to make sure that these exceptions are
thrown only at the right times), but we thought we had it working okay.

Until, that is, we realized we had a very nasty problem: snippets were
being unexpected interrupted. To debug this, we introduced a
CustomUserInterruption data type to serve as the exception that we were
throwing. This had two purposes: first, we would be use that if we saw a
CustomUserInterrupt that it could only have come from one particular throw,
and second, the CustomUserInterrupt carried an integer which was
incremented on every throw so that we never threw the same exception twice.

What we realized is that snippets were being interrupted by *old*
exceptions; that is, exceptions that we had thrown to *previous* snippets
(and had been caught too). This should obviously never happen. Ian managed
to reproduce this behaviour in a completely different setting (
http://hackage.haskell.org/trac/ghc/ticket/5902#comment:5) and we think
that something similar (unsafePerformIO related) must be happening inside
ghc.

Moreover, a further conjecture is that this only happens when runStmt is
still compiling the snippet to be executed (as opposed to the snippet
actually executing) -- i.e., that the exception gets stuck in the bowels of
ghc somewhere. We don't have any hard evidence for this, other than the
problem has not appeared again with the proposed patch (but that might be
luck, as it depends on timing).

The patch as we currently have it is against 7.4.2, so pre Simon's change
for the sandbox behaviour -- but I don't think that Simon's changes affect
the above problem. The core of our patch is

-sandboxIO :: DynFlags -> MVar Status -> IO [HValue] -> IO Status
-sandboxIO dflags statusMVar thing =
+sandboxIO :: DynFlags -> MVar Status -> MVar (Maybe ThreadId) -> IO
[HValue] -> IO Status
+sandboxIO dflags statusMVar tidMVar thing =
    mask $ \restore -> -- fork starts blocked
-     let runIt = liftM Complete $ try (restore $ rethrow dflags thing)
+     let thing' = gbracket (myThreadId >>= putMVar tidMVar . Just)
+                           (\() -> modifyMVar_ tidMVar (\_ -> return
Nothing))
+                           (\() -> thing)
+         runIt  = liftM Complete $ try (restore $ rethrow dflags thing')
      in if dopt Opt_GhciSandbox dflags
         then do tid <- forkIO $ do res <- runIt
                                    putMVar statusMVar res -- empty: can't
block

That is, sandboxIO takes an additional MVar (Maybe ThreadId):

   1. Initially this MVar should be empty. The MVar gets initialized to
   Just the Thread ID of the sandbox when the user code starts running. This
   means that if an secondary thread attempts to read the MVar (in order to
   kill the snippet), that secondary thread will block until the user code
   starts running -- it will not interrupt ghc when compiling the snippet.
   2. When the user code exists the MVar is updated to be Nothing. This
   means that if the auxiliary thread reads the MVar and finds Nothing it
   knows that the snippet has already terminated.
   3. When the auxiliary thread finds Just a thread ID (it must use
   withMVar rather than readMVar to avoid a race condition) it can safely
   throw the asynchronous exception.

The remainder of the patch just propagates these changes up so that runStmt
takes this MVar as an argument, too. Probably when integrating the patch
into ghc it would be better to leave runStmt along and provide a runStmt'
that takes the additional argument.

Again, any and all feedback on the above would be appreciated.

Edsko
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130605/4021e42f/attachment-0001.htm>

Reply | Threaded
Open this post in threaded view
|

Patch/feature proposal: Provide access to the runStmt sandbox ThreadID

Simon Marlow-7
On 05/06/13 14:01, Edsko de Vries wrote:

> Hi all,
>
> This proposal is related to
> http://hackage.haskell.org/trac/ghc/ticket/1381, which Simon Marlow
> closed through commit
> https://github.com/ghc/ghc/commit/02c4ab049adeb77b8ee0e3b98fbf0f3026eee453.
>
> The problem, in a nutshell, is "how do we terminate a code snippet
> started with runStmt"? Before Simon's patch the only way was to disable
> ghc's sandboxing, so that the snippet would run in the same thread as
> the thread that called runStmt in the first place, and then send an
> asynchronous exception to that thread. This is the approach we used to
> take. It's a little tricky to get right (have to make sure that these
> exceptions are thrown only at the right times), but we thought we had it
> working okay.
>
> Until, that is, we realized we had a very nasty problem: snippets were
> being unexpected interrupted. To debug this, we introduced a
> CustomUserInterruption data type to serve as the exception that we were
> throwing. This had two purposes: first, we would be use that if we saw a
> CustomUserInterrupt that it could only have come from one particular
> throw, and second, the CustomUserInterrupt carried an integer which was
> incremented on every throw so that we never threw the same exception twice.

Yup, it's a nasty problem indeed.

> What we realized is that snippets were being interrupted by *old*
> exceptions; that is, exceptions that we had thrown to *previous*
> snippets (and had been caught too). This should obviously never happen.
> Ian managed to reproduce this behaviour in a completely different
> setting (http://hackage.haskell.org/trac/ghc/ticket/5902#comment:5) and
> we think that something similar (unsafePerformIO related) must be
> happening inside ghc.
>
> Moreover, a further conjecture is that this only happens when runStmt is
> still compiling the snippet to be executed (as opposed to the snippet
> actually executing) -- i.e., that the exception gets stuck in the bowels
> of ghc somewhere. We don't have any hard evidence for this, other than
> the problem has not appeared again with the proposed patch (but that
> might be luck, as it depends on timing).

I suspect that there may be thunks inside the compiled expression
containing unsafePerformIOs left over by the typechecker (I'd be
interested to know how this happens, though).

> The patch as we currently have it is against 7.4.2, so pre Simon's
> change for the sandbox behaviour -- but I don't think that Simon's
> changes affect the above problem. The core of our patch is
>
> -sandboxIO :: DynFlags -> MVar Status -> IO [HValue] -> IO Status
> -sandboxIO dflags statusMVar thing =
> +sandboxIO :: DynFlags -> MVar Status -> MVar (Maybe ThreadId) -> IO
> [HValue] -> IO Status
> +sandboxIO dflags statusMVar tidMVar thing =
>      mask $ \restore -> -- fork starts blocked
> -     let runIt = liftM Complete $ try (restore $ rethrow dflags thing)
> +     let thing' = gbracket (myThreadId >>= putMVar tidMVar . Just)
> +                           (\() -> modifyMVar_ tidMVar (\_ -> return
> Nothing))
> +                           (\() -> thing)
> +         runIt  = liftM Complete $ try (restore $ rethrow dflags thing')
>        in if dopt Opt_GhciSandbox dflags
>           then do tid <- forkIO $ do res <- runIt
>                                      putMVar statusMVar res -- empty:
> can't block
>
> That is, sandboxIO takes an additional MVar (Maybe ThreadId):
>
>  1. Initially this MVar should be empty. The MVar gets initialized to
>     Just the Thread ID of the sandbox when the user code starts running.
>     This means that if an secondary thread attempts to read the MVar (in
>     order to kill the snippet), that secondary thread will block until
>     the user code starts running -- it will not interrupt ghc when
>     compiling the snippet.
>  2. When the user code exists the MVar is updated to be Nothing. This
>     means that if the auxiliary thread reads the MVar and finds Nothing
>     it knows that the snippet has already terminated.
>  3. When the auxiliary thread finds Just a thread ID (it must use
>     withMVar rather than readMVar to avoid a race condition) it can
>     safely throw the asynchronous exception.

I'm not sure this will work, because I'm not sure there is a defined
boundary between "compiling" and "running", i.e. there might be thunks
hiding inside that contain the exception.

I believe the right way to fix this is to find all the places in the
compiler that catch and rethrow exceptions inside unsafePerformIO, and
make them use throwTo when the exception is asynchronous.  I've done
this in the IO library and it works - see Note [async] in
GHC.IO.Handle.Internals in the base package.  I don't think there will
be many places that need to change - the main one looks to be the tryM
in forkM_maybe in TcRnMonad.  The main thing to worry about is what
happens when the throwTo *returns*, because that's what it does when the
thunk is poked on again after receiving the async exception the first time.

Cheers,
        Simon


> The remainder of the patch just propagates these changes up so that
> runStmt takes this MVar as an argument, too. Probably when integrating
> the patch into ghc it would be better to leave runStmt along and provide
> a runStmt' that takes the additional argument.
>
> Again, any and all feedback on the above would be appreciated.
>
> Edsko
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>



Reply | Threaded
Open this post in threaded view
|

Patch/feature proposal: Provide access to the runStmt sandbox ThreadID

Edsko de Vries-4
Hi,

I believe the right way to fix this is to find all the places in the
> compiler that catch and rethrow exceptions inside unsafePerformIO, and make
> them use throwTo when the exception is asynchronous.


Ok, I have looked at this in detail and I'm not sure how feasible this is.
I have filed a bug, with a test case, and a detailed explanation of my
investigation so far: http://hackage.haskell.org/trac/ghc/ticket/8006.

Edsko
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130621/af56d92f/attachment-0001.htm>