RE: Project postmortem


RE: Project postmortem

Simon Marlow
On 21 November 2005 10:16, Joel Reymont wrote:

> Is Wolfgang still around?
>
> Would you guys be willing to guide me through this? I could then
> possibly become the next Mac OSX expert :-).
>
> I have the disassembler dumps, etc. I do not know how to approach
> this problem. I read up a bit on the GHC internals, STG, code
> generation, etc.

If anyone is interested, this turned out to be a bug in the Network.BSD
module, namely that getHostByName isn't thread safe because it is based
on the C library function gethostbyname(), which returns data in a
single static area.

Workarounds are:

  - do your own mutual exclusion locking around getHostByName and any
    function that calls it (eg. connectTo).

  - use Network.Alt (http://www.cs.helsinki.fi/u/ekarttun/network-alt/),
    which has a thread-safe implementation of getHostByName.

  - wait for 6.4.2, which will contain a fix for this bug (we don't have
    a fix committed yet, Einar Karttunen has kindly offered to look into
    it).
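
The first workaround can be sketched roughly as follows. This is only an illustration under stated assumptions: lookupHost is a hypothetical stand-in for Network.BSD.getHostByName (so the example runs with base alone), and withResolverLock is a made-up helper name. The idea is that one global MVar serialises every lookup, so the call and the reading of its result cannot be interleaved with another thread's lookup.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Hypothetical stand-in for Network.BSD.getHostByName; it just
-- fabricates a string so the sketch does not need the network package.
lookupHost :: String -> IO String
lookupHost name = return ("address-of-" ++ name)

-- Run an action while holding the resolver lock.  withMVar releases
-- the lock again even if the action throws an exception.
withResolverLock :: MVar () -> IO a -> IO a
withResolverLock lock action = withMVar lock (\_ -> action)

main :: IO ()
main = do
  lock <- newMVar ()
  done <- newEmptyMVar
  let worker host = forkIO $ do
        -- Only one thread at a time can be inside the lookup.
        addr <- withResolverLock lock (lookupHost host)
        putMVar done (host, addr)
  mapM_ worker ["a.example", "b.example"]
  -- Collect one result per worker; arrival order is not fixed.
  results <- mapM (const (takeMVar done)) [1 :: Int, 2]
  mapM_ print results
```

In a real program the same lock would also have to wrap any function that calls getHostByName internally (eg. connectTo), otherwise those calls bypass the exclusion.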

Cheers,
        Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Project postmortem

Tomasz Zielonka
On Fri, Dec 02, 2005 at 11:20:54AM -0000, Simon Marlow wrote:

> If anyone is interested, this turned out to be a bug in the Network.BSD
> module, namely that getHostByName isn't thread safe because it is based
> on the C library function gethostbyname(), which returns data in a
> single static area.
>
> Workarounds are:
>
>   - do your own mutual exclusion locking around getHostByName and any
>     function that calls it (eg. connectTo).
>
>   - use Network.Alt (http://www.cs.helsinki.fi/u/ekarttun/network-alt/),
>     which has a thread-safe implementation of getHostByName.
>
>   - wait for 6.4.2, which will contain a fix for this bug (we don't have
>     a fix committed yet, Einar Karttunen has kindly offered to look into
>     it).

Do I understand correctly that another workaround is
    - don't compile your programs with -threaded
?

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Re: Project postmortem

Joel Reymont
I thought that if -threaded is not used then all the blocking IO is  
assigned a separate OS thread.

On Dec 2, 2005, at 12:10 PM, Tomasz Zielonka wrote:

> Do I understand correctly that another workaround is
>     - don't compile your programs with -threaded
> ?


--
http://wagerlabs.com/






RE: Project postmortem

Simon Marlow
In reply to this post by Simon Marlow
On 02 December 2005 12:11, Tomasz Zielonka wrote:

> On Fri, Dec 02, 2005 at 11:20:54AM -0000, Simon Marlow wrote:
>> If anyone is interested, this turned out to be a bug in the
>> Network.BSD module, namely that getHostByName isn't thread safe
>> because it is based on the C library function gethostbyname(), which
>> returns data in a single static area.
>>
>> Workarounds are:
>>
>>   - do your own mutual exclusion locking around getHostByName and any
>>     function that calls it (eg. connectTo).
>>
>>   - use Network.Alt
>>     (http://www.cs.helsinki.fi/u/ekarttun/network-alt/), which has a
>>     thread-safe implementation of getHostByName.
>>
>>   - wait for 6.4.2, which will contain a fix for this bug (we don't
>>     have a fix committed yet, Einar Karttunen has kindly offered to
>>     look into it).
>
> Do I understand correctly that another workaround is
>     - don't compile your programs with -threaded
> ?

No, the bug isn't related to -threaded.  It still occurs without
-threaded.

Cheers,
        Simon
_______________________________________________
Glasgow-haskell-bugs mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Re: Project postmortem

Tomasz Zielonka
On Fri, Dec 02, 2005 at 12:39:25PM -0000, Simon Marlow wrote:
> > Do I understand correctly that another workaround is
> >     - don't compile your programs with -threaded
> > ?
>
> No, the bug isn't related to -threaded.  It still occurs without
> -threaded.

Let's check that now I understand - so the sequence

    call gethostbyname
    read the returned hostent

is written in Haskell, and many such sequences can be interleaved when
using Concurrent Haskell?

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

RE: Project postmortem

Simon Marlow
In reply to this post by Simon Marlow
On 02 December 2005 12:49, Tomasz Zielonka wrote:

> On Fri, Dec 02, 2005 at 12:39:25PM -0000, Simon Marlow wrote:
>>> Do I understand correctly that another workaround is
>>>     - don't compile your programs with -threaded
>>> ?
>>
>> No, the bug isn't related to -threaded.  It still occurs without
>> -threaded.
>
> Let's check that now I understand - so the sequence
>
>     call gethostbyname
>     read the returned hostent
>
> is written in Haskell, and many such sequences can be interleaved when
> using Concurrent Haskell?

Yes, exactly.
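
To make the interleaving concrete, here is a small simulation. It is a sketch, not the Network.BSD code: an IORef plays the role of the single static area, unsafeLookup is a hypothetical name, and the explicit yield stands in for a context switch that the scheduler could make at that point on its own. No OS threads are involved, which is why leaving out -threaded does not help.

```haskell
module Main where

import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar
import Data.IORef

-- Simulated non-thread-safe lookup.  The IORef models the static
-- buffer that gethostbyname() returns its result in.
unsafeLookup :: IORef String -> String -> IO String
unsafeLookup static name = do
  writeIORef static name   -- "call gethostbyname"
  yield                    -- another Haskell thread may run here
  readIORef static         -- "read the returned hostent"

main :: IO ()
main = do
  static <- newIORef ""
  done <- newEmptyMVar
  let worker name = forkIO $ do
        got <- unsafeLookup static name
        putMVar done (name, got)
  mapM_ worker ["alpha", "beta"]
  results <- mapM (const (takeMVar done)) [1 :: Int, 2]
  -- A line such as "alpha -> beta" means one thread read the
  -- other thread's result out of the shared area.
  mapM_ (\(asked, got) -> putStrLn (asked ++ " -> " ++ got)) results
```

Whether a mismatch actually shows up on a given run depends on scheduling, so the program may print matching or mismatching pairs.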

Cheers,
        Simon

RE: Project postmortem

Simon Marlow
In reply to this post by Simon Marlow
On 02 December 2005 12:25, Joel Reymont wrote:

> I thought that if -threaded is not used then all the blocking IO is
> assigned a separate OS thread.

No - the runtime is completely single-threaded without -threaded.
Blocking I/O is managed by the runtime.  With -threaded, blocking I/O is
managed by a Haskell thread.  The programmer shouldn't see any
difference in the behaviour of I/O.

Is the documentation for -threaded lacking?  I realise it's a bit terse,
but do you have any concrete suggestions for improving it?

Cheers,
        Simon

Threaded runtime (was Re: Project postmortem)

Joel Reymont
Simon,

On Dec 2, 2005, at 1:16 PM, Simon Marlow wrote:

> No - the runtime is completely single-threaded without -threaded.
> Blocking I/O is managed by the runtime.  With -threaded, blocking I/O is
> managed by a Haskell thread.  The programmer shouldn't see any
> difference in the behaviour of I/O.

I was going on this quote by Simon PJ:

--
It should be fine to have lots of threads, esp if most of them are
asleep.  The only thing to watch out for is that GHC's runtime system
will consume one *OS* thread for each *blocked* foreign call.  So if you
have 10k threads each making a separate call to the OS to read from 10k
sockets, and they all block, you'll use 10k OS threads, and that will
probably fail.
--

Is this correct and if so how does it mesh with what you said above?

> Is the documentation for -threaded lacking?  I realise it's a bit terse,
> but do you have any concrete suggestions for improving it?

Not at the moment but I'll think about it once I understand everything.
It could be worth summarizing every clarification in this thread.

        Thanks, Joel

--
http://wagerlabs.com/






Examining the Haskell stack (read at your own risk ; -))

Joel Reymont
In reply to this post by Simon Marlow
Simon,

You told me a bit about how to examine the Haskell stack by looking  
at R22 on the PowerPC and $ebx on Intel architectures. I looked at  
your .gdbinit but could not figure out which macros are to be used.

The example below is a bit contrived in that I'm freeing the SSL  
context twice, on purpose. I tried getting a disassembler dump using  
the contents of R22 without luck:

(gdb) info registers r22
r22            0x137a3cc        20423628
(gdb) disas 0x137a3cc
No function contains specified address.

This is my stack trace:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000019
0x00568fe0 in sk_pop_free ()
(gdb) where
#0  0x00568fe0 in sk_pop_free ()
#1  0x0059f128 in X509_VERIFY_PARAM_free ()
#2  0x003c4d5c in SSL_free ()
#3  0x000c3254 in r7cH_info ()
#4  0x000ccd04 in schedule (mainThread=0x13f3f18, initialCapability=0x578df0) at Schedule.c:932
#5  0x000cdcac in waitThread_ (m=0x1100360, initialCapability=0x0) at Schedule.c:2156
#6  0x000cdb90 in scheduleWaitThread (tso=0x13c0000, ret=0x0, initialCapability=0x0) at Schedule.c:2050
#7  0x0001ff0c in rts_evalLazyIO (p=0x1ce0c8, ret=0x0) at RtsAPI.c:459
#8  0x0000495c in main (argc=25, argv=0x578df0) at Main.c:104
(gdb) info registers r22
r22            0x137a3cc        20423628
(gdb) disas 0x137a3cc
No function contains specified address.

--
http://wagerlabs.com/






RE: Examining the Haskell stack (read at your own risk ; -))

Simon Marlow
On 02 December 2005 14:03, Joel Reymont wrote:

> You told me a bit about how to examine the Haskell stack by looking
> at R22 on the PowerPC and $ebx on Intel architectures. I looked at
> your .gdbinit but could not figure out which macros are to be used.
>
> The example below is a bit contrived in that I'm freeing the SSL
> context twice, on purpose. I tried getting a disassembler dump using
> the contents of R22 without luck:
>
> (gdb) info registers r22
> r22            0x137a3cc        20423628
> (gdb) disas 0x137a3cc
> No function contains specified address.
>
> This is my stack trace:
>
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> Reason: KERN_PROTECTION_FAILURE at address: 0x00000019
> 0x00568fe0 in sk_pop_free ()
> (gdb) where
> #0  0x00568fe0 in sk_pop_free ()
> #1  0x0059f128 in X509_VERIFY_PARAM_free ()
> #2  0x003c4d5c in SSL_free ()
> #3  0x000c3254 in r7cH_info ()
> #4  0x000ccd04 in schedule (mainThread=0x13f3f18, initialCapability=0x578df0) at Schedule.c:932
> #5  0x000cdcac in waitThread_ (m=0x1100360, initialCapability=0x0) at Schedule.c:2156
> #6  0x000cdb90 in scheduleWaitThread (tso=0x13c0000, ret=0x0, initialCapability=0x0) at Schedule.c:2050
> #7  0x0001ff0c in rts_evalLazyIO (p=0x1ce0c8, ret=0x0) at RtsAPI.c:459
> #8  0x0000495c in main (argc=25, argv=0x578df0) at Main.c:104
> (gdb) info registers r22
> r22            0x137a3cc        20423628
> (gdb) disas 0x137a3cc
> No function contains specified address.

It looks like your crash happened in the SSL library, and you have a
useful stack trace there.

r22 is a pointer to the stack, not a pointer to code, so you can't
disassemble it, you need to display memory (as I described in separate
mail).

Cheers,
        Simon


RE: Threaded runtime (was Re: Project postmortem)

Simon Marlow
In reply to this post by Joel Reymont
On 02 December 2005 13:32, Joel Reymont wrote:

> I was going on this quote by Simon PJ:
>
> --
> It should be fine to have lots of threads, esp if most of them are
> asleep.  The only thing to watch out for is that GHC's runtime system
> will consume one *OS* thread for each *blocked* foreign call.  So if
> you have 10k threads each making a separate call to the OS to read
> from 10k sockets, and they all block, you'll use 10k OS threads, and
> that will probably fail.
> --
>
> Is this correct and if so how does it mesh with what you said above?

It's correct, but not the whole story.  When you do a blocking I/O
operation, it is not implemented in terms of a blocking foreign call, so
it doesn't create an OS thread(*).  In -threaded mode, blocking I/O is
implemented by sending a request to the I/O manager thread, which
returns a response when I/O is available.  In non-threaded mode,
blocking I/O is implemented by returning to the runtime, which
occasionally checks for available I/O and wakes up the appropriate
threads.

Either way, as I said, the programmer doesn't see any difference.

(*) except on Windows, where everything is done differently and blocking
I/O currently gets a real OS thread.
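
A rough way to see the difference between blocked Haskell threads and blocked foreign calls: the sketch below parks many lightweight threads at once. It is an illustration only. threadDelay is used as a stand-in for a blocking socket read (both are serviced by the runtime rather than by a blocking foreign call), and runSleepers is a hypothetical helper name.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)

-- Fork n Haskell threads that all block for a while, then wait for
-- every one of them to report back.  Returns the number collected.
runSleepers :: Int -> IO Int
runSleepers n = do
  done <- newEmptyMVar
  forM_ [1 .. n] $ \_ -> forkIO $ do
    threadDelay 100000   -- blocks this Haskell thread only (0.1 s)
    putMVar done ()
  replicateM_ n (takeMVar done)
  return n

main :: IO ()
main = do
  finished <- runSleepers 10000
  putStrLn ("all " ++ show finished ++ " threads finished")
```

The same count of threads each sitting in a blocking "safe" foreign call would be the case Simon PJ's quote warns about, since each such call ties up an OS thread.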

Cheers,
        Simon

Re: Examining the Haskell stack (read at your own risk ; -))

Joel Reymont
In reply to this post by Simon Marlow

On Dec 2, 2005, at 2:08 PM, Simon Marlow wrote:

> It looks like your crash happened in the SSL library, and you have a
> useful stack trace there.

This is contrived in that I already know where the error is and it  
clearly points to SSL_free. I'm trying to figure out how I would have  
gotten to that call to getHostByName.

> r22 is a pointer to the stack, not a pointer to code, so you can't
> disassemble it, you need to display memory (as I described in separate
> mail).

Quoting you:

---
   gdb> p16 $r22

which prints 16 words of memory backwards (the way I like it) starting
at the address in $r22.  When displaying memory this way, gdb very
handily prints the symbol name for words that point into the program.
You can then pick things off the stack that look like return addresses
and disassemble them, if you want.
---

And p16 is defined in .gdbinit as:

define p16
pmem $arg0 16
end

Printing the 16 words gives me the printout below but where do I find  
my Haskell function? The code that causes the crash looks like this:

maybeFreeSSL :: MaybeSSL -> IO ()
maybeFreeSSL tmv =
     do putStrLn $ "maybeFreeSSL invoked"
        mssl <- atomically $ swapTMVar tmv Nothing
        case mssl of
          Nothing -> return ()
          Just (ssl, _, _) -> do sslFree ssl
                                 sslFree ssl

Is there a way to have maybeFreeSSL in the trace? I compiled the app  
with -debug but the libraries and the above maybeFreeSSL code was  
compiled without it.

        Thanks, Joel

P.S.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000019
0x00568fe0 in sk_pop_free ()
(gdb) p16 $r22
0x137a40c:      0x1ce63c <stg_NO_TREC_closure>
0x137a408:      0x0
0x137a404:      0x0
0x137a400:      0x18fd0 <stg_TREC_HEADER_info>
0x137a3fc:      0xd6dbc <stg_stop_thread_info>
0x137a3f8:      0xd6f38 <stg_noforceIO_info>
0x137a3f4:      0x13ef148
0x137a3f0:      0x0
0x137a3ec:      0x2605c <stg_catch_frame_info>
0x137a3e8:      0x25c80 <stg_unblockAsyncExceptionszh_ret_info>
0x137a3e4:      0x13ef150
0x137a3e0:      0x27030 <s3BN_info>
0x137a3dc:      0x1d22cc <GHCziConc_lvl7_closure>
0x137a3d8:      0x26f6c <s3BK_info>
0x137a3d4:      0x13f38f4
0x137a3d0:      0x243f4 <s7Fy_info>
0x137a3cc:      0xc3270 <s7FI_info>

--
http://wagerlabs.com/






RE: Examining the Haskell stack (read at your own risk ; -))

Simon Marlow
In reply to this post by Joel Reymont
On 02 December 2005 14:17, Joel Reymont wrote:

> On Dec 2, 2005, at 2:08 PM, Simon Marlow wrote:
>
>> It looks like your crash happened in the SSL library, and you have a
>> useful stack trace there.
>
> This is contrived in that I already know where the error is and it
> clearly points to SSL_free. I'm trying to figure out how I would have
> gotten to that call to getHostByName.
>
>> r22 is a pointer to the stack, not a pointer to code, so you can't
>> disassemble it, you need to display memory (as I described in
>> separate mail).
>
> Quoting you:
>
> ---
>    gdb> p16 $r22
>
> which prints 16 words of memory backwards (the way I like it) starting
> at the address in $r22.  When displaying memory this way, gdb very
> handily prints the symbol name for words that point into the program.
> You can then pick things off the stack that look like return addresses
> and disassemble them, if you want.
> ---
>
> And p16 is defined in .gdbinit as:
>
> define p16
> pmem $arg0 16
> end
>
> Printing the 16 words gives me the printout below but where do I find
> my Haskell function? The code that causes the crash looks like this:
>
> maybeFreeSSL :: MaybeSSL -> IO ()
> maybeFreeSSL tmv =
>      do putStrLn $ "maybeFreeSSL invoked"
>         mssl <- atomically $ swapTMVar tmv Nothing
>         case mssl of
>           Nothing -> return ()
>           Just (ssl, _, _) -> do sslFree ssl
>                                  sslFree ssl
>
> Is there a way to have maybeFreeSSL in the trace? I compiled the app
> with -debug but the libraries and the above maybeFreeSSL code was
> compiled without it.
>
> Thanks, Joel
>
> P.S.
>
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> Reason: KERN_PROTECTION_FAILURE at address: 0x00000019
> 0x00568fe0 in sk_pop_free ()
> (gdb) p16 $r22
> 0x137a40c:      0x1ce63c <stg_NO_TREC_closure>
> 0x137a408:      0x0
> 0x137a404:      0x0
> 0x137a400:      0x18fd0 <stg_TREC_HEADER_info>
> 0x137a3fc:      0xd6dbc <stg_stop_thread_info>
> 0x137a3f8:      0xd6f38 <stg_noforceIO_info>
> 0x137a3f4:      0x13ef148
> 0x137a3f0:      0x0
> 0x137a3ec:      0x2605c <stg_catch_frame_info>
> 0x137a3e8:      0x25c80 <stg_unblockAsyncExceptionszh_ret_info>
> 0x137a3e4:      0x13ef150
> 0x137a3e0:      0x27030 <s3BN_info>
> 0x137a3dc:      0x1d22cc <GHCziConc_lvl7_closure>
> 0x137a3d8:      0x26f6c <s3BK_info>
> 0x137a3d4:      0x13f38f4
> 0x137a3d0:      0x243f4 <s7Fy_info>
> 0x137a3cc:      0xc3270 <s7FI_info>

Ok, you want a crash course in reading the Haskell stack.

Each xxx_info symbol is a return address.  The other values are the
contents of stack frames: values saved for use at the return address.

A return address also has an associated info table (use pinfo in the
.gdbinit I sent you for displaying the info table, eg. pinfo s7FI_info).
Understanding info tables is beyond the scope of this message; please
see InfoTables.h in the GHC sources.

You probably want to know what each return address corresponds to:

  - the ones beginning "stg_" are in the runtime, you can see the code
    for these in the GHC sources (grep in ghc/rts).

  - the ones that look like "s7FI_info" are local symbols.  You might be
    able to find which module it comes from by grepping the .o files in
    your program and the .a library files.  If you find it in a library,
    you can use 'nm' on the library to get a better idea of what
    function the symbol is associated with.

    One problem is that because these are local symbols, you might find
    the same symbol in multiple places, and you have to make an educated
    guess as to which is the appropriate one (or use the disassembly to
    distinguish).

    To map this back to Haskell code, you need to recompile the original
    module and dump the intermediate code with -ddump-stg.
    Understanding the output of -ddump-stg is beyond the scope of this
    message :-)
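
One thing that may help when matching symbols to modules: GHC Z-encodes characters that are not legal in symbol names, so GHCziConc in the dump above is the module GHC.Conc, and the zh in stg_unblockAsyncExceptionszh_ret_info is a '#'. Below is a minimal decoder sketch. zDecode is a hypothetical helper, not part of GHC, and the tables cover only a subset of the encoding, so anything not listed is passed through unchanged.

```haskell
module Main where

-- Escapes introduced by a lower-case 'z' (subset only).
lowerTable :: [(Char, Char)]
lowerTable =
  [ ('i', '.'), ('h', '#'), ('u', '_'), ('z', 'z'), ('m', '-')
  , ('d', '$'), ('p', '+'), ('q', '\''), ('s', '/'), ('t', '*') ]

-- Escapes introduced by an upper-case 'Z' (subset only).
upperTable :: [(Char, Char)]
upperTable =
  [ ('Z', 'Z'), ('C', ':'), ('L', '('), ('R', ')'), ('M', '['), ('N', ']') ]

-- Decode the escapes listed above; unknown sequences are left as-is.
zDecode :: String -> String
zDecode ('z' : c : rest)
  | Just d <- lookup c lowerTable = d : zDecode rest
zDecode ('Z' : c : rest)
  | Just d <- lookup c upperTable = d : zDecode rest
zDecode (c : rest) = c : zDecode rest
zDecode [] = []

main :: IO ()
main = mapM_ (putStrLn . zDecode)
  [ "GHCziConc_lvl7_closure"
  , "stg_unblockAsyncExceptionszh_ret_info" ]
```

Running the decoder over the output of 'nm' is one way to make library symbols easier to scan by eye; the local sXXX_info names carry no module information and still need the grep approach described above.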

Cheers,
        Simon

Re: Examining the Haskell stack (read at your own risk ; -))

Joel Reymont
Thank you Simon! This is very helpful and will take me a while to  
digest.

On Dec 2, 2005, at 2:43 PM, Simon Marlow wrote:

> Ok, you want a crash course in reading the Haskell stack.
>
> Each xxx_info symbol is a return address.  The other values are the
> contents of stack frames: values saved for use at the return address.

--
http://wagerlabs.com/




