|
Folks,
I have done a lot of experiments over the past few weeks and came to a few interesting conclusions. First some background, then issues, solutions and conclusions.

I wrote a test harness for a poker server that understands the different binary packets and can send and receive them. The harness launches each "script" in a separate unbound thread that connects to the server via TCP and does its work.

The main goals of the project were: easy scripting, a very high number of connections from the harness (a few thousand) and running on Windows. I develop on Mac OSX but have a Windows machine for testing and to run the poker server.

Another key goal was to support the server encryption. SSL encryption is done in a weird way that requires attaching read/write OpenSSL BIOs to the SSL descriptor so that SSL encrypts to/from memory. Encrypted chunks are then taken from the BIOs and sent as payload in server packets.

Overall, I probably spent about 4 weeks writing the server and about 2 more weeks grappling with the various issues. The issues centered around 1) the program thrashing memory like no tomorrow, 2) intermittent crashes on Windows and 3) not being able to launch a high number of connections on Windows before crashing.

I significantly reduced the memory thrashing by switching to plain Haskell structures from nested lists of wxHaskell-style properties (attr := value). Intermittent crashes were harder to troubleshoot, especially given that things were running smoothly on Mac OSX.

Stack traces pointed into libcrypto (part of OpenSSL) and thus to the BIOs that I was allocating. I guessed that OpenSSL was maxing out some resources and closed the leak by explicitly freeing the SSL descriptor, which freed the associated BIO structures. Then things got weirder as my program started crashing in a different place entirely, with stack traces like this:

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0x3139322e
    0x0027c174 in s8j1_info ()
    (gdb) where
    #0 0x0027c174 in s8j1_info ()
    #1 0x0021c9f4 in StgRunIsImplementedInAssembler () at StgCRun.c:576
    #2 0x0021cdc4 in schedule (mainThread=0x1100360, initialCapability=0x308548) at Schedule.c:932
    #3 0x0021dd6c in waitThread_ (m=0x1100360, initialCapability=0x0) at Schedule.c:2156
    #4 0x0021dc50 in scheduleWaitThread (tso=0x13c0000, ret=0x0, initialCapability=0x0) at Schedule.c:2050
    #5 0x00219548 in rts_evalLazyIO (p=0x29b47c, ret=0x0) at RtsAPI.c:459
    #6 0x001e4768 in main (argc=2262116, argv=0x308548) at Main.c:104

I took waitThread_ as a clue and started digging deeper.

Whenever I connect to the server or send a command I wait for X seconds, and if not connected or the desired command is not received I throw an exception which fails the script. I implemented the timeout combinator a couple of different ways, including the one in the Asynchronous Exceptions paper, but it did not help. I think the issue has to do with killing threads that are using FFI. Although I'm killing threads that call the Haskell connectTo, hGetBuf, etc., I think it's still FFI.

I disposed of timeouts entirely, leaving connectTo as it is and using hWaitForInput on my socket handle to simulate timeouts. This improved things tremendously and I'm now able to run a few thousand unbound script threads on Windows with OpenSSL FFI and everything.

Memory usage is still higher than I would have liked and crashes in OpenSSL still happen when the number of threads/memory usage is really high, so there's still room for improvement. I should probably go back to using a foreign finalizer (SSL_free) on the SSL descriptors rather than freeing them explicitly, as the freeing does not happen if a script fails mid-way.

I'm quite satisfied with my first Haskell project. I love Haskell and will continue hacking away with it. This list is invaluable in the depth of offered help whereas #haskell (IRC) is invaluable when speed matters. I'm quite amazed at the things I have been able to do, the expressiveness of Haskell and the clean looks.

Clean looks can be deceptive, though, as they can hide code of amazing complexity. Fundeps, existential types, HList take a while to grasp. Also, I feel somewhat like a pioneer and I definitely got more than a fair share of arrows in my back.

I had GHC run out of memory during compilation (fixed by SPJ), had it quit midway during compilation with an error about generated extents being too large in assembler code. I had GHC crash at runtime with an error like "fromJust not returning Just, this could not be happening!". Yesterday's error topped them all:

    internal error: update_fwd: unknown/strange object 0
    Please report this as a bug to [hidden email],
    or http://www.sourceforge.net/projects/ghc/

I think I got this when using +RTS -C0 -c.

Overall, the experience with Haskell has been exhilarating and I'm already preparing to use it on my next projects like detecting collusion in poker as well as rake optimization (Dazzle paper very helpful here!). Still, I think that GHC can be a bit rough around the edges and I would think twice about writing high-performance network apps with it.

Thanks, Joel

P.S. The Glasgow Distributed Haskell (GdH) people are supposed to have a mailing list and I would love to share my findings with them but I could not find the mailing list itself.

--
http://wagerlabs.com/
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
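The post worries that explicit freeing is skipped when a script fails mid-way. Besides a foreign finalizer, one option is `bracket` from Control.Exception, which runs the release step whether the script returns or throws. The following is only a minimal sketch under stated assumptions: `Resource`, `acquire`, `release`, `withResource` and `runScript` are hypothetical names, and the counter stands in for the real SSL_new/SSL_free calls, which the post does not show. It also assumes a present-day base library.

```haskell
import Control.Exception (SomeException, bracket, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for an SSL descriptor. In a real harness this
-- would wrap the pointer returned by the OpenSSL binding.
newtype Resource = Resource Int

-- Counts live resources so the sketch can be checked without OpenSSL.
type LiveCount = IORef Int

-- Stand-in for allocation (creating the descriptor, attaching the BIOs).
acquire :: LiveCount -> IO Resource
acquire live = do
  modifyIORef' live (+ 1)
  Resource <$> readIORef live

-- Stand-in for SSL_free, which also frees the attached BIOs.
release :: LiveCount -> Resource -> IO ()
release live _ = modifyIORef' live (subtract 1)

-- Run a script with a resource; 'bracket' runs 'release' whether the
-- script returns normally or throws.
withResource :: LiveCount -> (Resource -> IO a) -> IO a
withResource live = bracket (acquire live) (release live)

-- Run a script and report failure as a value instead of killing the caller.
runScript :: LiveCount -> (Resource -> IO a) -> IO (Either SomeException a)
runScript live script = try (withResource live script)
```

With this shape, `runScript live (\_ -> ioError (userError "script failed"))` returns a `Left` and the live count is back to zero afterwards. Unlike a finalizer, the release happens at a known point rather than whenever the garbage collector gets to it.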
|
Hi Joel,
What would your impression be of building an application in Haskell versus Erlang from a practical point of view given your experiences with this project and the Erlang poker server?

My feelings having developed a little with Erlang and embarking on a Haskell project are that the learning curve is far steeper with Haskell but it is far more elegant and readable. I'm still climbing that curve though (IO makes me want to pull my hair out).

Thanks for writing up that post mortem. There's lots of good info in there, especially for a newbie like myself.

Cheers, Scott

On 18/11/2005, at 12:43 AM, Joel Reymont wrote:
> [...]
|
On Nov 17, 2005, at 10:59 PM, Scotty Weeks wrote:
> What would your impression be of building an application in Haskell
> versus Erlang from a practical point of view given your experiences
> with this project and the Erlang poker server?

In Erlang I would have been done much faster and with far less trouble. The scripting would have been a royal pain in the rear for the customer, though. But, again, I would have been done much faster, as network clients/servers are what Erlang excels at. That and concurrency.

Haskell... I'm still trying to figure out why reading from a Chan with getChanContents and then printing out the contents works, while doing the same with readChan in a loop blocks. Or why the app now crashes violently on Mac OSX but works without a hitch on Windows. And I still don't have a good timeout combinator.

I felt very excited this morning given the newly found love between my app and Windows, but the excitement lasted only until I realized that hWaitForInput blocks all other threads :-(.

> My feelings having developed a little with Erlang and embarking on
> a Haskell project are that the learning curve is far steeper with
> Haskell but it is far more elegant and readable. I'm still climbing
> that curve though (IO makes me want to pull my hair out).

Unless lightning strikes and tomorrow morning I figure out what's the deal with the spurious Mac OSX crashes, I think this might be my last network app in Haskell. I should really be spending time on the business end of the app instead of figuring out platform differences and the like.

Joel
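For the two open questions above (a readChan loop and a timeout combinator), here is a small sketch that combines them. It assumes a present-day base library that ships `System.Timeout`, which may not have been available with the GHC used in this thread. `readChanTimeout`, `drainChan` and `demo` are hypothetical names, and since the post does not show the loop that blocked, this does not diagnose it.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import System.Timeout (timeout)

-- Wait at most 'micros' microseconds for the next item on the channel.
-- 'readChan' blocks inside the Haskell runtime (on an MVar), so
-- 'timeout' is able to interrupt it.
readChanTimeout :: Int -> Chan a -> IO (Maybe a)
readChanTimeout micros chan = timeout micros (readChan chan)

-- Loop with readChan, collecting items until none arrives in time.
-- Unlike getChanContents, this terminates once the channel goes quiet.
drainChan :: Int -> Chan a -> IO [a]
drainChan micros chan = go []
  where
    go acc = do
      next <- readChanTimeout micros chan
      case next of
        Nothing -> return (reverse acc)
        Just x  -> go (x : acc)

-- Small usage example: write three items, then drain them with a
-- 0.1 second quiet period as the stopping condition.
demo :: IO [Int]
demo = do
  chan <- newChan
  mapM_ (writeChan chan) [1, 2, 3]
  drainChan 100000 chan
```

A caveat that matters for this thread: `timeout` works by delivering an asynchronous exception, so it interrupts blocking done by the runtime itself but generally not a thread that is stuck inside a blocking foreign call.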
|
In reply to this post by Joel Reymont
| Unless lightning strikes and tomorrow morning I figure out what's the
| deal with the spurious Mac OSX crashes, I think this might be my last
| network app in Haskell. I should really be spending time on the
| business end of the app instead of figuring out platform differences
| and the like.

Joel,

I think it's fantastic that you've been pushing on Haskell in the way you have. What I learn from your experience is that the *language* is pretty good for what you wanted to do (esp lightweight concurrency) but the *libraries* in the area of networking are lacking both functionality and (more particularly) robustness.

I hope you don't abandon Haskell altogether. Without steady, friendly pressure from applications-end folk like you, things won't improve. It's incredibly valuable feedback. But I can see that when you have to deliver something next week you can't wait around for someone to get around to fixing your problem. (They aren't paid either!) Maybe you can use Haskell for something less mission-critical, so that you can keep up the pressure?

Meanwhile, let me utter my customary encouragement to the Haskell community out there: please pitch in and help! Haskell will only break into real applications, of the kind Joel has been writing, if we can offer robust libraries, and that depends utterly on you. Don't wait for someone else to do it.

Simon
|
Hi,
so sad, so true...

At least Haskell ideas sneak into mainstream languages under disguise (LINQ anyone?). The C-Java-C# syntax that business "developers" and their bosses love so much is mandatory, so the result lacks the beauty we all know and appreciate, but it is kinda nice to see functional programming going mainstream at last. Maybe "Lambda" is the IT buzzword of the next decade :-).

Jan

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Simon Peyton-Jones
Sent: 18 November 2005 10:17
To: Joel Reymont; Scotty Weeks
Cc: Haskell Cafe
Subject: RE: [Haskell-cafe] Project postmortem

[...]
|
In reply to this post by Simon Peyton-Jones
On Nov 18, 2005, at 10:17 AM, Simon Peyton-Jones wrote:
> I hope you don't abandon Haskell altogether. Without steady, friendly
> pressure from applications-end folk like you, things won't improve.

Nah, I'm just having a very frustrating Friday. I think I need some direction in which to dig and a bit of patience over the weekend. For example:

What does this mean precisely? My take is that the GHC runtime is trying to call a C function; this much I gathered from the source code. It also seems that since I do not see another library at #0, the issue is within GHC. Is that the right take on it?

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0x3139322e
    0x0027c174 in s8j1_info ()
    (gdb) where
    #0 0x0027c174 in s8j1_info ()
    #1 0x0021c9f4 in StgRunIsImplementedInAssembler () at StgCRun.c:576
    #2 0x0021cdc4 in schedule (mainThread=0x1100360, initialCapability=0x308548) at Schedule.c:932
    #3 0x0021dd6c in waitThread_ (m=0x1100360, initialCapability=0x0) at Schedule.c:2156
    #4 0x0021dc50 in scheduleWaitThread (tso=0x13c0000, ret=0x0, initialCapability=0x0) at Schedule.c:2050
    #5 0x00219548 in rts_evalLazyIO (p=0x29b47c, ret=0x0) at RtsAPI.c:459
    #6 0x001e4768 in main (argc=2262116, argv=0x308548) at Main.c:104

> It's incredibly valuable feedback. But I can see that when you have to
> deliver something next week you can't wait around for some someone to
> get around to fixing your problem. (They aren't paid either!) Maybe
> you can use Haskell for something less mission-critical, so that you
> can keep up the pressure?

I can't change who I am, I just gotta push the envelope. I would not have stood the pain of doing this project in Erlang, for example, what with all the nested data structures, etc.

I'm not waiting for someone to fix my problem; I would gladly fix it myself if I understood where the problem is. It used to be fairly clear before, when the stack trace pointed to one of the OpenSSL libraries. In this particular case I don't even know how to start debugging this. Do I set a breakpoint in s8j1_info? But it's something else periodically, like s34n_info. Do I inspect the C code somehow? But how do I do that? How do I debug the GHC runtime?

Thanks, Joel
|
In reply to this post by Jan Stoklasa (gmail)
This would be a good new thread to discuss it ;-)
On Nov 18, 2005, at 10:42 AM, Jan Stoklasa (gmail) wrote:
> Hi,
> so sad, so true...
> At least haskell ideas sneak into mainstream languages under disguise
> (LINQ anyone?). C-Java-C# syntax that business "developers" and their
> bosses love so much is mandatory so the result lack the beauty we all
> know and appreciate, but it is kinda nice to see functional
> programming going mainstream at last. Maybe, "Lambda" is the IT
> buzzword of next decade :-).
|
In reply to this post by Joel Reymont
On 18 November 2005 10:48, Joel Reymont wrote:
> On Nov 18, 2005, at 10:17 AM, Simon Peyton-Jones wrote:
>
>> I hope you don't abandon Haskell altogether. Without steady,
>> friendly pressure from applications-end folk like you, things won't
>> improve.
>
> Nah, I'm just having a very frustrating Friday. I think I need some
> direction in which to dig and a bit of patience over the weekend. For
> example,
>
> What does this mean precisely? My take is that the GHC runtime is
> trying to call a C function. this much I gathered from the source
> code. It also seems that since I do not see another library at #0
> then the issue is within GHC. Is that the right take on it?

The stack trace doesn't mean much at all I'm afraid - GHC doesn't use the C stack, so any stack trace generated for a crash inside the Haskell code is mostly useless. It does tell you the block in which the crash happened (s8j1_info), and it tells you that the crash was in Haskell and not C. The rest of the frames on the stack are from the GHC runtime, and you'll pretty much always see these same frames on the stack for any crash inside Haskell code.

How we normally proceed for a crash like this is as follows: examine where the crash happened and determine whether it is a result of heap or stack corruption, and then attempt to trace backwards to find out where the corruption originated from. Tracing backwards means running the program from the beginning again, so it's essential to have a reproducible example. Without reproducibility, we have to use a combination of debugging printfs and staring really hard at the code, which is much more time consuming (and still requires being able to run the program to make it crash with debugging output turned on).

You can get debugging output by compiling your program with -debug, and then running it with some of the -D<something> options (use +RTS -? for a list, +RTS -Ds is a good one to start with).

Cheers, Simon
|
On Nov 18, 2005, at 1:55 PM, Simon Marlow wrote:
> You can get debugging output by compiling your program with -debug,
> and then running it with some of the -D<something> options (use
> +RTS -? for a list, +RTS -Ds is a good one to start with).

I'm still working on a repro case but here's what I get...

    +RTS -Ds
    ...
    scheduler: checking for threads blocked on I/O
    sched: -->> running thread 1103 ThreadRunGHC ...
    sched: --<< thread 1103 (ThreadRunGHC) stopped: is blocked on an MVar
    all threads:
    thread 1225 @ 0x1539000 is not blocked
    thread 1224 @ 0x1506aa4 is not blocked
    thread 1223 @ 0x15066a4 is not blocked
    ...
    scheduler: checking for threads blocked on I/O
    sched: -->> running thread 1107 ThreadRunGHC ...
    Segmentation fault

1107 is not blocked in the list of all threads. What options should I try next?

Thanks, Joel
|
In reply to this post by Joel Reymont
On 18 November 2005 14:42, Joel Reymont wrote:
> On Nov 18, 2005, at 1:55 PM, Simon Marlow wrote:
>
>> You can get debugging output by compiling your program with -debug,
>> and then running it with some of the -D<something> options (use +RTS
>> -? for a list, +RTS -Ds is a good one to start with).
>
> I'm still working on a repro case but here's what I get...
>
> +RTS -Ds
> ...
> scheduler: checking for threads blocked on I/O
> sched: -->> running thread 1103 ThreadRunGHC ...
> sched: --<< thread 1103 (ThreadRunGHC) stopped: is blocked on an MVar
> all threads:
> thread 1225 @ 0x1539000 is not blocked
> thread 1224 @ 0x1506aa4 is not blocked
> thread 1223 @ 0x15066a4 is not blocked
> ...
> scheduler: checking for threads blocked on I/O
> sched: -->> running thread 1107 ThreadRunGHC ...
> Segmentation fault
>
> 1107 is not blocked in the list of all threads. What options should I
> try next?

That doesn't tell us much unfortunately. Can you send a disassembly of the block in which the crash happened?

Is it always the same block, BTW? Does changing the heap size (+RTS -H<size>) have any effect?

Cheers, Simon
|
On Nov 18, 2005, at 2:47 PM, Simon Marlow wrote:
> That doesn't tell us much unfortunately. Can you send a disassembly of
> the block in which the crash happened?
>
> Is it always the same block, BTW? Does changing the heap size (+RTS
> -H<size>) have any effect?

I don't think changing the heap size has any effect. I tried a run with -H512m and the only difference was that it crashed at 0x00000005 with the same kernel protection failure. The address for s34n_info is the same, everything else the same, including stack trace and addresses and offsets in it.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
    0x0024ef88 in s34n_info ()
    (gdb) where
    #0 0x0024ef88 in s34n_info ()
    #1 0x00211eb4 in StgRunIsImplementedInAssembler () at StgCRun.c:576
    #2 0x0020f048 in schedule (mainThread=0x1100360, initialCapability=0x2fd508) at Schedule.c:932
    #3 0x0020fff0 in waitThread_ (m=0x1100360, initialCapability=0x0) at Schedule.c:2156
    #4 0x0020fed4 in scheduleWaitThread (tso=0x13c0000, ret=0x0, initialCapability=0x0) at Schedule.c:2050
    #5 0x0020cd70 in rts_evalLazyIO (p=0x29216c, ret=0x0) at RtsAPI.c:459
    #6 0x001d80fc in main (argc=2212180, argv=0x2fd508) at Main.c:104
    (gdb) disas 0x0024ef88
    Dump of assembler code for function s34n_info:
    0x0024ef70 <s34n_info+0>:   mr r10,r25
    0x0024ef74 <s34n_info+4>:   addi r9,r25,8
    0x0024ef78 <s34n_info+8>:   mr r25,r9
    0x0024ef7c <s34n_info+12>:  cmplw cr7,r9,r26
    0x0024ef80 <s34n_info+16>:  bgt- cr7,0x24efb4 <s34n_info+68>
    0x0024ef84 <s34n_info+20>:  lwz r2,4(r14)
    0x0024ef88 <s34n_info+24>:  lbzx r0,r2,r15
    0x0024ef8c <s34n_info+28>:  cmpwi cr7,r0,0
    0x0024ef90 <s34n_info+32>:  bne- cr7,0x24efc4 <s34n_info+84>
    0x0024ef94 <s34n_info+36>:  lis r2,42
    0x0024ef98 <s34n_info+40>:  lwz r2,20668(r2)
    0x0024ef9c <s34n_info+44>:  stw r2,4(r10)
    0x0024efa0 <s34n_info+48>:  stw r15,0(r9)
    0x0024efa4 <s34n_info+52>:  addi r14,r9,-4
    0x0024efa8 <s34n_info+56>:  lwz r29,0(r22)
    0x0024efac <s34n_info+60>:  mtctr r29
    0x0024efb0 <s34n_info+64>:  bctr
    0x0024efb4 <s34n_info+68>:  li r0,8
    0x0024efb8 <s34n_info+72>:  stw r0,108(r27)
    0x0024efbc <s34n_info+76>:  lwz r29,-4(r27)
    0x0024efc0 <s34n_info+80>:  b 0x24efac <s34n_info+60>
    0x0024efc4 <s34n_info+84>:  addi r15,r15,1
    0x0024efc8 <s34n_info+88>:  addi r25,r9,-8
    0x0024efcc <s34n_info+92>:  lis r29,37
    0x0024efd0 <s34n_info+96>:  addi r29,r29,-4240
    0x0024efd4 <s34n_info+100>: b 0x24efac <s34n_info+60>
    0x0024efd8 <s34n_info+104>: .long 0x21
    0x0024efdc <s34n_info+108>: .long 0x240000
    End of assembler dump.
|
In reply to this post by Simon Marlow
Folks,
This is not quite the error that I was expecting but they could be related, I'm not sure. In any case, you can retrieve the repro project thusly:

    darcs get http://test.wagerlabs.com/postmortem

You need OpenSSL to build these so don't forget to add -lssl -lcrypto to either ghc or ghci.

I would appreciate it if we could all collectively look at this, as things are either weird or I'm missing something obvious. I will apply any patches sent to me.

I run like this:

    ghci -fglasgow-exts -lssl -lcrypto
    :l Server
    main

    ghci -fglasgow-exts -lssl -lcrypto
    :l Client
    main

I get in the server window:

    interactive: unknown exception

    14:51:39: ThreadId 1: Accepted new connection: {handle: <socket: 5>}
    14:51:39: ThreadId 1: Verify locations: 1
    14:51:39: ThreadId 1: sslGetError: 2
    14:51:39: ThreadId 4: Starting SSL handshake...
    14:51:39: ThreadId 4: Reading from BIO...
    14:51:39: ThreadId 4: Waiting for BIO 0x01108670
    14:51:39: ThreadId 4: waitForBio: gotta wait a bit...

If you look at SSL.hs you will see that I'm calling threadDelay right after this message. No other messages are produced. This tells me that threadDelay is throwing an exception.

Why would it, though? And how can I tell what the exception is? If I comment out the threadDelay then I get the exception somewhere in the expect code after bytes are sent to the other side.

Overall, my intent is to get this to work for 1 thread and then try, say, 5 or 10 thousand.

Thanks, Joel
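On the question of how to tell what the exception is: one generic approach is to wrap the suspect step in `try` and print whatever comes back. This is only a sketch, assuming a present-day base library where Control.Exception exports `try` and `SomeException` (the exception API available to the thread's GHC differed). `traced` and `waitABit` are hypothetical names, and `waitABit` merely stands in for the wait inside SSL.hs, which is not reproduced here.

```haskell
import Control.Concurrent (myThreadId, threadDelay)
import Control.Exception (SomeException, try)

-- Run an action; if it throws, log the thread, the step label and the
-- exception text, then hand the exception back as a value.
traced :: String -> IO a -> IO (Either SomeException a)
traced label action = do
  result <- try action
  case result of
    Left e -> do
      tid <- myThreadId
      putStrLn (show tid ++ ": " ++ label ++ " threw: " ++ show e)
      return (Left e)
    Right x -> return (Right x)

-- Hypothetical stand-in for the delay that follows the
-- "gotta wait a bit" log message.
waitABit :: IO (Either SomeException ())
waitABit = traced "threadDelay" (threadDelay 1000)
```

Because `SomeException` also matches asynchronous exceptions, a wrapper like this would show an exception thrown at the thread from elsewhere (for instance by a timeout helper) as well as one raised by the wrapped call itself, which helps tell the two cases apart.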
|
In reply to this post by Simon Peyton-Jones
On Nov 18, 2005, at 2:17 AM, Simon Peyton-Jones wrote:
> I hope you don't abandon Haskell altogether. Without steady, friendly
> pressure from applications-end folk like you, things won't improve.
> It's incredibly valuable feedback. But I can see that when you have to
> deliver something next week you can't wait around for some someone to
> get around to fixing your problem. (They aren't paid either!) Maybe
> you can use Haskell for something less mission-critical, so that you
> can keep up the pressure?

Here is some feedback on a negative experience I had with Haskell recently (really about the only negative experience :)

I was playing with one of the Haskell OpenGL libraries (actually it's a refined FFI) over the summer and some things about it rubbed me the wrong way. I wanted to try fixing them but I really couldn't figure out how to get ahold of the code and start hacking. I found some candidates, but it seemed like old cvs repositories or something. I was confused, ran out of time and moved on. Why do I bring it up? If it had been obvious where to get an official copy of the library I could have tried sending in some patches to make things work the way I wanted. I'm a huge fan of darcs repositories, BTW.

Thanks, Jason
|
In reply to this post by Joel Reymont
test.wagerlabs.com seems really slow for me right now. I've mirrored the repo on my own machine (might not be 100% reliable, but should stay up nearly all of the time). The mirror address is http://vx.hn.org/postmortem/

- Cale

On 18/11/05, Joel Reymont <[hidden email]> wrote:
> [...]
|
In reply to this post by Jason Dagit
On Friday, 18 November 2005 17:16, Jason Dagit wrote:
> [...]
> I was playing with one of the Haskell OpenGL libraries (actually it's
> a refined FFI) over the summer and some things about it rubbed me the
> wrong way. I wanted to try fixing them but I really couldn't figure
> out how to get ahold of the code and start hacking. I found some
> candidates, but it seemed like old cvs repositories or something. I
> was confused, ran out of time and moved on. Why do I bring it up?
> If it had been obvious where to get an official copy of the library I
> could have tried sending in some patches to make things work the way
> I wanted. I'm a huge fan of darcs repositories, BTW.

Hmmm, as the OpenGL/GLUT/OpenAL/ALUT guy I have to admit that I should really, really update the web pages about those packages. But anyway: Asking on any Haskell mailing list (there is even one especially for the OpenGL/GLUT packages) normally gives you fast response times. Without even knowing that there is a problem, there is nothing I can fix. :-) And don't hesitate to ask questions about the usage of those packages, because this is valuable feedback, too.

Regarding the repository: The normal fptools repository is the "official" one for those packages. But IIRC, most GHC binary packages include OpenGL/GLUT support, so there is normally no urgent need for a home-made version. All packages are already cabalized, but I have to admit that I have never tried to build them on their own.

Cheers,
   S.
|
In reply to this post by Joel Reymont
The exception is actually from withTimeOut. Removing calls to that lets the handshake proceed.

The server is using a client handshake, though, so the handshake of client vs. client goes on indefinitely. I'm fixing the server side and once that is done will clean up SSL at the end of the handshake and launch a few thousand clients.

It's not a good repro case yet although I would love to know why withTimeOut is throwing that exception.

	Joel

On Nov 18, 2005, at 5:02 PM, Christian Maeder wrote:

> Sorry, I can only show you my output on
> Linux turing 2.6.11.4-21.9, but I don't know what's going on and
> will not have more time this week.
>
> Cheers Christian
>
> maeder@turing:/local/maeder/haskell/postmortem> ./server
> 17:55:14: ThreadId 1: Accepted new connection: {handle: <socket: 4>}
> 17:55:14: ThreadId 1: Verify locations: 1
> 17:55:14: ThreadId 1: sslGetError: 2
> 17:55:14: ThreadId 4: Starting SSL handshake...
> 17:55:14: ThreadId 4: Reading from BIO...
> 17:55:14: ThreadId 4: Waiting for BIO 0x080d10d8
> 17:55:14: ThreadId 4: waitForBio: gotta wait a bit...
> server: unknown exception

--
http://wagerlabs.com/
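For readers who have not seen a timeout combinator, here is a minimal sketch of the general idea. It is not the withTimeOut from the repro project, whose definition is not shown in this thread. It assumes the `timeout` function from `System.Timeout` in a current `base` (which did not exist when this was written), and `withTimeOutSketch` is a hypothetical name.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- | Hypothetical stand-in for a timeout combinator: run an action with
-- a limit given in seconds and raise an IOError if it does not finish.
withTimeOutSketch :: Int -> IO a -> IO a
withTimeOutSketch secs action = do
  -- 'timeout' takes microseconds and returns Nothing when the limit
  -- expires before the action completes.
  r <- timeout (secs * 1000000) action
  case r of
    Just x -> return x
    Nothing -> ioError (userError ("timed out after " ++ show secs ++ "s"))

main :: IO ()
main = do
  -- Finishes well inside the limit, so the result is passed through.
  x <- withTimeOutSketch 1 (return "fast")
  putStrLn x
  -- A 1 s sleep under a 10 ms limit is interrupted.
  r <- timeout 10000 (threadDelay 1000000)
  putStrLn (maybe "interrupted" (const "completed") r)
```

`timeout` works by throwing an asynchronous exception to the thread running the action, and such exceptions are generally not delivered while that thread is blocked inside a foreign call. That may be relevant to the FFI suspicion raised earlier in the thread, though nothing here confirms it as the cause.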
|
In reply to this post by Simon Marlow
I'm happy to report that the problem can be reproduced by running the code from my darcs repo at http://test.wagerlabs.com/postmortem. See the README file. I'm on Mac OSX 10.4.3.

The server just sits there, goes through the SSL handshake and... does nothing else. The clients go through the handshake with the server and do nothing else. The handshake goes through X number of times and then the client crashes.

On Nov 18, 2005, at 1:55 PM, Simon Marlow wrote:

> How we normally proceed for a crash like this is as follows: examine
> where the crash happened and determine whether it is a result of heap
> or stack corruption, and then attempt to trace backwards to find out
> where the corruption originated from. Tracing backwards means running
> the program from the beginning again, so it's essential to have a
> reproducible example. Without reproducibility, we have to use a
> combination of debugging printfs and staring really hard at the code,
> which is much more time consuming (and still requires being able to
> run the program to make it crash with debugging output turned on).

--
http://wagerlabs.com/
|
In reply to this post by Joel Reymont
If it's MacOS specific, we're not going to be much help at GHC HQ, because we don't have any (Macs that is). Wolfgang Thaller is the MacOS expert, but maybe there are others now?

Simon

| -----Original Message-----
| From: Joel Reymont [mailto:[hidden email]]
| Sent: 19 November 2005 00:57
| To: Simon Marlow
| Cc: Simon Peyton-Jones; Haskell Cafe
| Subject: Re: [Haskell-cafe] Project postmortem
|
| I'm happy to report that the problem can be reproduced by running the
| code from my darcs repo at http://test.wagerlabs.com/postmortem. See
| the README file. I'm on Mac OSX 10.4.3.
|
| The server just sits there, goes through the SSL handshake and...
| does nothing else. The clients go through the handshake with the
| server and do nothing else. The handshake goes through X number of
| times and then the client crashes.
|
| On Nov 18, 2005, at 1:55 PM, Simon Marlow wrote:
|
| > How we normally proceed for a crash like this is as follows: examine
| > where the crash happened and determine whether it is a result of heap
| > or stack corruption, and then attempt to trace backwards to find out
| > where the corruption originated from. Tracing backwards means running
| > the program from the beginning again, so it's essential to have a
| > reproducible example. Without reproducibility, we have to use a
| > combination of debugging printfs and staring really hard at the code,
| > which is much more time consuming (and still requires being able to
| > run the program to make it crash with debugging output turned on).
|
Is Wolfgang still around?

Would you guys be willing to guide me through this? I could then possibly become the next Mac OSX expert :-).

I have the disassembler dumps, etc. I do not know how to approach this problem. I read up a bit on the GHC internals, STG, code generation, etc.

	Thanks, Joel

P.S. Please feel free to take the email exchange offline, could be too boring for everyone else.

On Nov 21, 2005, at 9:35 AM, Simon Peyton-Jones wrote:

> If it's MacOS specific, we're not going to be much help at GHC HQ,
> because we don't have any (Macs that is). Wolfgang Thaller is the
> MacOS expert, but maybe there are others now?

--
http://wagerlabs.com/
|
In reply to this post by Simon Peyton-Jones
Simon,

What about the non-OSX issue of using a Chan to collect traces from thousands of threads?

It's not working very well for me when I use readChan in a loop (see the code). getChanContents works much better but then the logger thread is stuck forever and everything else that waits on it is stuck as well.

The output from logger (Util.hs) stops after a few lines and thus memory taken starts to grow because all the output sent to the chan is not being processed.

	Thanks, Joel

On Nov 21, 2005, at 9:35 AM, Simon Peyton-Jones wrote:

> If it's MacOS specific, we're not going to be much help at GHC HQ,
> because we don't have any (Macs that is). Wolfgang Thaller is the
> MacOS expert, but maybe there are others now?

--
http://wagerlabs.com/
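To make the readChan-in-a-loop pattern concrete, here is a small sketch of a logger thread that drains a Chan and can also be told to stop. It is not the logger from Util.hs, which is not shown in this thread. The `Maybe String` sentinel, the sink argument and the `done` MVar are assumptions introduced for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical logger: pass each message to the sink until a Nothing
-- sentinel arrives, then signal completion on the 'done' MVar.
logger :: (String -> IO ()) -> Chan (Maybe String) -> MVar () -> IO ()
logger sink chan done = loop
  where
    loop = do
      msg <- readChan chan -- blocks until a message is available
      case msg of
        Just line -> sink line >> loop
        Nothing -> putMVar done ()

main :: IO ()
main = do
  chan <- newChan
  done <- newEmptyMVar
  _ <- forkIO (logger putStrLn chan done)
  -- Fork 100 workers; each writes one trace line and reports it is finished.
  finished <- mapM
    (\n -> do
        fin <- newEmptyMVar
        _ <- forkIO (writeChan chan (Just ("worker " ++ show n)) >> putMVar fin ())
        return fin)
    [1 .. 100 :: Int]
  mapM_ takeMVar finished -- wait until every worker has written
  writeChan chan Nothing  -- then ask the logger to stop
  takeMVar done           -- and wait for it to drain the channel
```

The sentinel avoids the situation described above where a consumer built on getChanContents never terminates. Note that a Chan is unbounded, so if thousands of writers outpace the single logger, unread messages will accumulate in memory regardless of how the reading loop is written.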
