GHC 7.8.3 thread hang

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

GHC 7.8.3 thread hang

Michael Jones
I am trying to debug a lockup problem (and need help with debugging technique), where hang means a thread stops at a known place during evaluation, and other threads continue.

The code near the problem is like:

       ec <- return $ encode command
       l <- return $ BSL.length ec
       ss <- return $ BSC.unpack ec

It does not matter if I use let or return, or if the length is taken after unpack. I used return so I could use this code for tracing, with strictness to try to find the exact statement that is the problem:

       traceEventIO "sendCommand"
       ec <- return $ encode command
       traceEventIO $ "sendCommand: encoded"
       l <- ec `seq` return $ BSL.length ec
       traceEventIO $ "sendCommand: size " ++ (show l)
       ss <- ec `seq` return $ BSC.unpack ec

When this runs, the program executes this many times, but always hangs under a certain condition.

For good evaluations:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded
7f04173ff700: cap 0: sendCommand: size 4
7f04173ff700: cap 0: sendCommand: unpacked
7f04173ff700: cap 0: Sending command of size 4
7f04173ff700: cap 0: Sending command of size "\NUL\EOT"
7f04173ff700: cap 0: sendCommand: sent
7f04173ff700: cap 0: sendCommand: flushed

for bad evaluation:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded

The lockup occurs when length is taken.

The difference between the working and non-working case is as follows:

A wxHaskell callback stuffs some data in a TChan. A thread started at application startup is reading the TChan and calling the code that hangs. If it did not hang, it would send it by TCP to another process/computer.

In the working case the callback pops a dialog, and passes data from one TChan to another TChan.

In the failing case, the data is used to generate strings in a wxHaskell grid, then it is parsed, and a new data is made. The new data is a combination of old and new pieces of sub data. The shape of the date is identical, because I am not making any edits to the rows.

So when data that the callback sends to TChan is unmodified, no hang. But when the data is used to make text, put it in the gui, process it, and generate new data, it hangs.

As a test I modified the code so that the text is not put into the gui. The results are the same. This indicates it has something to do with creating strings and then data from strings and mixing old and new subdata. Strings are created with show. Data is created by pattern matching and generating numbers from strings. I should also point out that in the working case, the size of the resulting string is small, say 3. In the hang case, the resulting string would be long, say 5000-10000.

I assume there are no limits to the size of ByteStrings or fundemental issues with the RTS stack/heap that require special settings.

I am using the following revisions:

GHC 7.8.3
    base ==4.7.*,
    mtl ==2.2.1,
    containers == 0.5.5.1,
    transformers ==0.4.1.0,
    random == 1.0.1.1,
    wx == 0.91.0.0,
    wxcore == 0.91.0.0,
    wxdirect == 0.91.0.0,
    colour == 2.3.3,
    stm == 2.4.2,
    monad-loops == 0.4.2.1,
    time == 1.4.2,
    old-locale == 1.0.0.6,
    fast-logger == 2.2.3,
    network == 2.6.0.2,
    bytestring == 0.10.4.0,
    control-monad-loop == 0.1,
    binary == 0.7.2.2,

I know that nobody can have an answer based on this. But what I am hoping is either there is some known bug, or someone can guide me in narrowing it down. The event log does not have anything unusual in it. Other threads keep running, and I can exit the application normally. The thread does not throw an exception. It just hangs.

When I run the app, I just use +RTS -v

Perhaps there are some other options that might give more info?

— SNIPPET of log —

7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: sendCommand
7fe544cfea40: cap 0: sendCommand: encoded
7fe544cfea40: cap 0: sendCommand: size 3                               WORKS HERE
7fe544cfea40: cap 0: sendCommand: unpacked
7fe544cfea40: cap 0: Sending command of size 3
7fe544cfea40: cap 0: Sending command of size "\NUL\ETX"
7fe544cfea40: cap 0: sendCommand: sent
7fe544cfea40: cap 0: sendCommand: flushed
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0



7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: thread 45 stopped (yielding)
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: sendCommand
7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 45 (ThreadRunGHC)
7fe537eff700: cap 0: fetchTelemetryServer: unlock
7fe537eff700: cap 0: fetchTelemetryServer
7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: fetchTelemetryServer: pump seq



_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Carter Schonwald
what OS are you on?
what build options did you use?

On Tue, Nov 11, 2014 at 2:11 AM, Michael Jones <[hidden email]> wrote:
I am trying to debug a lockup problem (and need help with debugging technique), where hang means a thread stops at a known place during evaluation, and other threads continue.

The code near the problem is like:

       ec <- return $ encode command
       l <- return $ BSL.length ec
       ss <- return $ BSC.unpack ec

It does not matter if I use let or return, or if the length is taken after unpack. I used return so I could use this code for tracing, with strictness to try to find the exact statement that is the problem:

       traceEventIO "sendCommand"
       ec <- return $ encode command
       traceEventIO $ "sendCommand: encoded"
       l <- ec `seq` return $ BSL.length ec
       traceEventIO $ "sendCommand: size " ++ (show l)
       ss <- ec `seq` return $ BSC.unpack ec

When this runs, the program executes this many times, but always hangs under a certain condition.

For good evaluations:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded
7f04173ff700: cap 0: sendCommand: size 4
7f04173ff700: cap 0: sendCommand: unpacked
7f04173ff700: cap 0: Sending command of size 4
7f04173ff700: cap 0: Sending command of size "\NUL\EOT"
7f04173ff700: cap 0: sendCommand: sent
7f04173ff700: cap 0: sendCommand: flushed

for bad evaluation:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded

The lockup occurs when length is taken.

The difference between the working and non-working case is as follows:

A wxHaskell callback stuffs some data in a TChan. A thread started at application startup is reading the TChan and calling the code that hangs. If it did not hang, it would send it by TCP to another process/computer.

In the working case the callback pops a dialog, and passes data from one TChan to another TChan.

In the failing case, the data is used to generate strings in a wxHaskell grid, then it is parsed, and a new data is made. The new data is a combination of old and new pieces of sub data. The shape of the date is identical, because I am not making any edits to the rows.

So when data that the callback sends to TChan is unmodified, no hang. But when the data is used to make text, put it in the gui, process it, and generate new data, it hangs.

As a test I modified the code so that the text is not put into the gui. The results are the same. This indicates it has something to do with creating strings and then data from strings and mixing old and new subdata. Strings are created with show. Data is created by pattern matching and generating numbers from strings. I should also point out that in the working case, the size of the resulting string is small, say 3. In the hang case, the resulting string would be long, say 5000-10000.

I assume there are no limits to the size of ByteStrings or fundemental issues with the RTS stack/heap that require special settings.

I am using the following revisions:

GHC 7.8.3
    base ==4.7.*,
    mtl ==2.2.1,
    containers == 0.5.5.1,
    transformers ==0.4.1.0,
    random == 1.0.1.1,
    wx == 0.91.0.0,
    wxcore == 0.91.0.0,
    wxdirect == 0.91.0.0,
    colour == 2.3.3,
    stm == 2.4.2,
    monad-loops == 0.4.2.1,
    time == 1.4.2,
    old-locale == 1.0.0.6,
    fast-logger == 2.2.3,
    network == 2.6.0.2,
    bytestring == 0.10.4.0,
    control-monad-loop == 0.1,
    binary == 0.7.2.2,

I know that nobody can have an answer based on this. But what I am hoping is either there is some known bug, or someone can guide me in narrowing it down. The event log does not have anything unusual in it. Other threads keep running, and I can exit the application normally. The thread does not throw an exception. It just hangs.

When I run the app, I just use +RTS -v

Perhaps there are some other options that might give more info?

— SNIPPET of log —

7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: sendCommand
7fe544cfea40: cap 0: sendCommand: encoded
7fe544cfea40: cap 0: sendCommand: size 3                               WORKS HERE
7fe544cfea40: cap 0: sendCommand: unpacked
7fe544cfea40: cap 0: Sending command of size 3
7fe544cfea40: cap 0: Sending command of size "\NUL\ETX"
7fe544cfea40: cap 0: sendCommand: sent
7fe544cfea40: cap 0: sendCommand: flushed
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0



7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: thread 45 stopped (yielding)
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: sendCommand
7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 45 (ThreadRunGHC)
7fe537eff700: cap 0: fetchTelemetryServer: unlock
7fe537eff700: cap 0: fetchTelemetryServer
7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: fetchTelemetryServer: pump seq



_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Gregory Collins-3
In reply to this post by Michael Jones

On Mon, Nov 10, 2014 at 11:11 PM, Michael Jones <[hidden email]> wrote:
       ec <- return $ encode command
       traceEventIO $ "sendCommand: encoded"
       l <- ec `seq` return $ BSL.length ec

Your encode function probably loops on some inputs. When you call "return $ foo", foo is not forced; that doesn't happen in your example until BSL.length forces its input. If I'm right, changing the first line to "return $! encode command" will move the hang to before the call to traceEventIO.

-- 
Gregory Collins <[hidden email]>

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Michael Jones
In reply to this post by Carter Schonwald
OS: Ubuntu 14.X TLS
GHC-Options:   -rtsopts -threaded -debug -eventlog

Behaves the same with

GHC-Options:   -rtsopts -threaded -O2

Mike

On Nov 11, 2014, at 8:01 AM, Carter Schonwald <[hidden email]> wrote:

what OS are you on?
what build options did you use?

On Tue, Nov 11, 2014 at 2:11 AM, Michael Jones <[hidden email]> wrote:
I am trying to debug a lockup problem (and need help with debugging technique), where hang means a thread stops at a known place during evaluation, and other threads continue.

The code near the problem is like:

       ec <- return $ encode command
       l <- return $ BSL.length ec
       ss <- return $ BSC.unpack ec

It does not matter if I use let or return, or if the length is taken after unpack. I used return so I could use this code for tracing, with strictness to try to find the exact statement that is the problem:

       traceEventIO "sendCommand"
       ec <- return $ encode command
       traceEventIO $ "sendCommand: encoded"
       l <- ec `seq` return $ BSL.length ec
       traceEventIO $ "sendCommand: size " ++ (show l)
       ss <- ec `seq` return $ BSC.unpack ec

When this runs, the program executes this many times, but always hangs under a certain condition.

For good evaluations:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded
7f04173ff700: cap 0: sendCommand: size 4
7f04173ff700: cap 0: sendCommand: unpacked
7f04173ff700: cap 0: Sending command of size 4
7f04173ff700: cap 0: Sending command of size "\NUL\EOT"
7f04173ff700: cap 0: sendCommand: sent
7f04173ff700: cap 0: sendCommand: flushed

for bad evaluation:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded

The lockup occurs when length is taken.

The difference between the working and non-working case is as follows:

A wxHaskell callback stuffs some data in a TChan. A thread started at application startup is reading the TChan and calling the code that hangs. If it did not hang, it would send it by TCP to another process/computer.

In the working case the callback pops a dialog, and passes data from one TChan to another TChan.

In the failing case, the data is used to generate strings in a wxHaskell grid, then it is parsed, and a new data is made. The new data is a combination of old and new pieces of sub data. The shape of the date is identical, because I am not making any edits to the rows.

So when data that the callback sends to TChan is unmodified, no hang. But when the data is used to make text, put it in the gui, process it, and generate new data, it hangs.

As a test I modified the code so that the text is not put into the gui. The results are the same. This indicates it has something to do with creating strings and then data from strings and mixing old and new subdata. Strings are created with show. Data is created by pattern matching and generating numbers from strings. I should also point out that in the working case, the size of the resulting string is small, say 3. In the hang case, the resulting string would be long, say 5000-10000.

I assume there are no limits to the size of ByteStrings or fundemental issues with the RTS stack/heap that require special settings.

I am using the following revisions:

GHC 7.8.3
    base ==4.7.*,
    mtl ==2.2.1,
    containers == 0.5.5.1,
    transformers ==0.4.1.0,
    random == 1.0.1.1,
    wx == 0.91.0.0,
    wxcore == 0.91.0.0,
    wxdirect == 0.91.0.0,
    colour == 2.3.3,
    stm == 2.4.2,
    monad-loops == 0.4.2.1,
    time == 1.4.2,
    old-locale == 1.0.0.6,
    fast-logger == 2.2.3,
    network == 2.6.0.2,
    bytestring == 0.10.4.0,
    control-monad-loop == 0.1,
    binary == 0.7.2.2,

I know that nobody can have an answer based on this. But what I am hoping is either there is some known bug, or someone can guide me in narrowing it down. The event log does not have anything unusual in it. Other threads keep running, and I can exit the application normally. The thread does not throw an exception. It just hangs.

When I run the app, I just use +RTS -v

Perhaps there are some other options that might give more info?

— SNIPPET of log —

7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: sendCommand
7fe544cfea40: cap 0: sendCommand: encoded
7fe544cfea40: cap 0: sendCommand: size 3                               WORKS HERE
7fe544cfea40: cap 0: sendCommand: unpacked
7fe544cfea40: cap 0: Sending command of size 3
7fe544cfea40: cap 0: Sending command of size "\NUL\ETX"
7fe544cfea40: cap 0: sendCommand: sent
7fe544cfea40: cap 0: sendCommand: flushed
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0



7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: thread 45 stopped (yielding)
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: sendCommand
7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 45 (ThreadRunGHC)
7fe537eff700: cap 0: fetchTelemetryServer: unlock
7fe537eff700: cap 0: fetchTelemetryServer
7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: fetchTelemetryServer: pump seq



_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Michael Jones
In reply to this post by Gregory Collins-3
Gregory,

You are correct, it moved the problem.

I’ll study the code looking for loops, but...

one thing to note, there are two uses of the data for IO, and both would force evaluation of length/encode. One puts data in the GUI, one sends data by Ethernet. When only the GUI code is forcing IO, there is no hang. It is the Ethernet IO that causes the hang.

Because the GUI forcing does not hang, it leads me to believe there are no loops in the code.

Below is some description of overall application. -> means order of operation. T == Thread#. I note where the hang is, and what statement removal fixes it.

Mike

————


Main
T1 -> put TVar -> Start T2 -> Start T3

Async Module (TChan1/2/3 input/output wrapper around Eth based server in another process)
(This module takes a function that sends and recv data so that it is independent from communication mechanism)
(There are 2 output channels, one for response to commands in TCHan1, and one for a stream of telemetry)
T2 -> take TChan1 (config) -> 
      Serialize ->                      (Hangs here)
      Send Eth

T2 -> Read Eth -> 
      Deserialize -> 
      put TChan2 (data1)
      put TChan3 (data2)

fetchTelemetryServer (Consumes from Async Module T2 and puts data in sequence for callback)
T3 -> tryTake MVar2 (server lock) -> 
      take TChan3 (data) -> 
      put TSequence (data) -> 
      put MVar2 (server unlock)

showOptionDialogServer (Produces for Async Module T2)
(Changes the config in a dialog and send it to the server to modify telemetry definition)
wxH Menu Callback -> take MVar2 (server lock) ->
                     take MVar1 (gui lock) ->
                     take TVar (config) ->
                     convert config to strings
                     display in dialog
                     pull strings from dialog
                     build modified config 
                     put TVar (config) ->
                     put TChan1 (config) ->                    (Remove and no hang)
                     put MVar1 (gui unlock) -> 
                     put MVar2 (server lock)

performTelemetryServer
(Takes data from telemetry and prepares it for painting)
wxH Timer Callback -> take MVar1 (gui lock) ->
                      take TSequence (data) -> 
                      modify data
                      put Var (data2) -> 
                      repaint ->
                      put MVar1 (gui unlock)

onPaint
(Update the graphs in the GUI)
wx Paint Callback -> tryTake MVar1 (gui lock) -> 
                     get Var (data2) -> 
                     get TVar (config) ->
                     draw on gui -> 
                     put MVar1 (gui unlock)



On Nov 11, 2014, at 9:45 AM, Gregory Collins <[hidden email]> wrote:


On Mon, Nov 10, 2014 at 11:11 PM, Michael Jones <[hidden email]> wrote:
       ec <- return $ encode command
       traceEventIO $ "sendCommand: encoded"
       l <- ec `seq` return $ BSL.length ec

Your encode function probably loops on some inputs. When you call "return $ foo", foo is not forced; that doesn't happen in your example until BSL.length forces its input. If I'm right, changing the first line to "return $! encode command" will move the hang to before the call to traceEventIO.

-- 
Gregory Collins <[hidden email]>


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Brandon Allbery
In reply to this post by Michael Jones
On Tue, Nov 11, 2014 at 2:11 AM, Michael Jones <[hidden email]> wrote:
7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
7fe537eff700: cap 0: thread 7 stopped (heap overflow)

Is it just me, or is that second message significant?

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Michael Jones
Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.

Perhaps someone knows if this is normal or not.

Mike


On Nov 11, 2014, at 12:24 PM, Brandon Allbery <[hidden email]> wrote:

On Tue, Nov 11, 2014 at 2:11 AM, Michael Jones <[hidden email]> wrote:
7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
7fe537eff700: cap 0: thread 7 stopped (heap overflow)

Is it just me, or is that second message significant?

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Gregory Collins-3

On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <[hidden email]> wrote:
Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.

No, it means you're exhausting the heap (maybe the runtime stack for the thread running "encode"), probably because "encode" is infinite-looping. I think Occam's razor applies here, check that any recursion you're doing is actually reducing the recursive argument. Perhaps you could post the code (e.g. http://gist.github.com/)?

G


--
Gregory Collins <[hidden email]>

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Michael Jones
Gregory,

The options in the 7.8.3 user guide says in the -Msize option that by default the heap is unlimited. I have several applications, and they all have messages like:

7fddc7bcd700: cap 2: waking up thread 7 on cap 2
7fddc7bcd700: cap 2: thread 4 stopped (yielding)
7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
7fddcaad6740: cap 2: requesting parallel GC
7fddc5ffe700: cap 0: starting GC
7fddc57fd700: cap 1: starting GC
7fdda77fe700: cap 3: starting GC
7fddcaad6740: cap 2: starting GC

I assumed that when the heap ran out of space, it caused a GC, or it enlarged the heap. The programs that have these messages run for very long periods of time, and when I heap profile them, they use about 500KM to 1MB over long periods of time, and are quite stable.

As a test, I ran the hang application with profiling to see if memory jumps up before or after the hang.

What I notice is the app moves along using about 800KB, then there is a spike to 2MB at the hang. So I believe you, but I am confused about the RTS behavior and how I can have all these overflow messages in a normal application and how to tell the difference between these routine messages vs a real heap problem.

So, I dug deeper into the log. A normal execution for sending a command looks like:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: sendCommand: encoded
7f99e6ffd700: cap 0: sendCommand: size 4
7f99e6ffd700: cap 0: sendCommand: unpacked
7f99e6ffd700: cap 0: Sending command of size 4
7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
7f99e6ffd700: cap 0: sendCommand: sent
7f99e6ffd700: cap 0: sendCommand: flushed
7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock

The thread is run, overflows, GC, runs, then blocks on an MVAr.

For a the hang case:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (yielding)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (yielding)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
...
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: thread 45 stopped (yielding)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)

run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked on black hole

And that is the last activity for thread 7.

I found a few links about black holes and such, so I’ll go off and read those and try to learn what the RTS is doing and why it can hang on a black hole. But if you have some hits, let me know. 

Mike



On Nov 11, 2014, at 4:01 PM, Gregory Collins <[hidden email]> wrote:


On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <[hidden email]> wrote:
Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.

No, it means you're exhausting the heap (maybe the runtime stack for the thread running "encode"), probably because "encode" is infinite-looping. I think Occam's razor applies here, check that any recursion you're doing is actually reducing the recursive argument. Perhaps you could post the code (e.g. http://gist.github.com/)?

G


--
Gregory Collins <[hidden email]>


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

John Lato-2
The blocked on black hole message is very suspicious.  It means that thread 7 is blocked waiting for another thread to evaluate a thunk.  But in this case, it's thread 7 that created that thunk and is supposed to be doing the evaluating.  This is some evidence that Gregory's theory is correct and your encode function loops somewhere.

On Wed Nov 12 2014 at 11:25:30 AM Michael Jones <[hidden email]> wrote:
Gregory,

The options in the 7.8.3 user guide says in the -Msize option that by default the heap is unlimited. I have several applications, and they all have messages like:

7fddc7bcd700: cap 2: waking up thread 7 on cap 2
7fddc7bcd700: cap 2: thread 4 stopped (yielding)
7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
7fddcaad6740: cap 2: requesting parallel GC
7fddc5ffe700: cap 0: starting GC
7fddc57fd700: cap 1: starting GC
7fdda77fe700: cap 3: starting GC
7fddcaad6740: cap 2: starting GC

I assumed that when the heap ran out of space, it caused a GC, or it enlarged the heap. The programs that have these messages run for very long periods of time, and when I heap profile them, they use about 500KM to 1MB over long periods of time, and are quite stable.

As a test, I ran the hang application with profiling to see if memory jumps up before or after the hang.

What I notice is the app moves along using about 800KB, then there is a spike to 2MB at the hang. So I believe you, but I am confused about the RTS behavior and how I can have all these overflow messages in a normal application and how to tell the difference between these routine messages vs a real heap problem.

So, I dug deeper into the log. A normal execution for sending a command looks like:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: sendCommand: encoded
7f99e6ffd700: cap 0: sendCommand: size 4
7f99e6ffd700: cap 0: sendCommand: unpacked
7f99e6ffd700: cap 0: Sending command of size 4
7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
7f99e6ffd700: cap 0: sendCommand: sent
7f99e6ffd700: cap 0: sendCommand: flushed
7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock

The thread is run, overflows, GC, runs, then blocks on an MVAr.

For a the hang case:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (yielding)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (yielding)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
...
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: thread 45 stopped (yielding)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)

run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked on black hole

And that is the last activity for thread 7.

I found a few links about black holes and such, so I’ll go off and read those and try to learn what the RTS is doing and why it can hang on a black hole. But if you have some hits, let me know. 

Mike



On Nov 11, 2014, at 4:01 PM, Gregory Collins <[hidden email]> wrote:


On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <[hidden email]> wrote:
Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.

No, it means you're exhausting the heap (maybe the runtime stack for the thread running "encode"), probably because "encode" is infinite-looping. I think Occam's razor applies here, check that any recursion you're doing is actually reducing the recursive argument. Perhaps you could post the code (e.g. http://gist.github.com/)?

G


--
Gregory Collins <[hidden email]>

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Michael Jones
I am having difficulty imagining how there could be a loop that evaluates a black hole a second time by the same thread. The paper Haskell on a Shared-Memory Processor mentions that the runtime must not do this. My take on this description is that it should never happen. Even if code was recursive, it should not happen. Code might recurse forever and chew up memory until failure, but a thread creating a black hold should not trip up on its own black hole.

So I am not sure what I am looking for. There is no recursion in the encode.

Can you give some examples of what I should be looking for?

Mike


On Nov 11, 2014, at 8:37 PM, John Lato <[hidden email]> wrote:

The blocked on black hole message is very suspicious.  It means that thread 7 is blocked waiting for another thread to evaluate a thunk.  But in this case, it's thread 7 that created that thunk and is supposed to be doing the evaluating.  This is some evidence that Gregory's theory is correct and your encode function loops somewhere.

On Wed Nov 12 2014 at 11:25:30 AM Michael Jones <[hidden email]> wrote:
Gregory,

The options in the 7.8.3 user guide says in the -Msize option that by default the heap is unlimited. I have several applications, and they all have messages like:

7fddc7bcd700: cap 2: waking up thread 7 on cap 2
7fddc7bcd700: cap 2: thread 4 stopped (yielding)
7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
7fddcaad6740: cap 2: requesting parallel GC
7fddc5ffe700: cap 0: starting GC
7fddc57fd700: cap 1: starting GC
7fdda77fe700: cap 3: starting GC
7fddcaad6740: cap 2: starting GC

I assumed that when the heap ran out of space, it caused a GC, or it enlarged the heap. The programs that have these messages run for very long periods of time, and when I heap profile them, they use about 500KM to 1MB over long periods of time, and are quite stable.

As a test, I ran the hang application with profiling to see if memory jumps up before or after the hang.

What I notice is the app moves along using about 800KB, then there is a spike to 2MB at the hang. So I believe you, but I am confused about the RTS behavior and how I can have all these overflow messages in a normal application and how to tell the difference between these routine messages vs a real heap problem.

So, I dug deeper into the log. A normal execution for sending a command looks like:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: sendCommand: encoded
7f99e6ffd700: cap 0: sendCommand: size 4
7f99e6ffd700: cap 0: sendCommand: unpacked
7f99e6ffd700: cap 0: Sending command of size 4
7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
7f99e6ffd700: cap 0: sendCommand: sent
7f99e6ffd700: cap 0: sendCommand: flushed
7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock

The thread is run, overflows, GC, runs, then blocks on an MVAr.

For a the hang case:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (yielding)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (yielding)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
...
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: thread 45 stopped (yielding)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)

run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked on black hole

And that is the last activity for thread 7.

I found a few links about black holes and such, so I’ll go off and read those and try to learn what the RTS is doing and why it can hang on a black hole. But if you have some hits, let me know. 

Mike



On Nov 11, 2014, at 4:01 PM, Gregory Collins <[hidden email]> wrote:


On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <[hidden email]> wrote:
Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.

No, it means you're exhausting the heap (maybe the runtime stack for the thread running "encode"), probably because "encode" is infinite-looping. I think Occam's razor applies here, check that any recursion you're doing is actually reducing the recursive argument. Perhaps you could post the code (e.g. http://gist.github.com/)?

G


--
Gregory Collins <[hidden email]>

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: GHC 7.8.3 thread hang

Gregory Collins-3

On Wed, Nov 12, 2014 at 10:01 PM, Michael Jones <[hidden email]> wrote:
Can you give some examples of what I should be looking for?

Without seeing the relevant source code I don't see how to help you further.

Re: blackholes, are you (or any code you're calling) using the "par" combinator?

G
--
Gregory Collins <[hidden email]>

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users