Random crashes with memory corruption symptoms

Harendra Kumar
Hi,

While running a test-suite for the streaming library streamly I am encountering a crash which seems to happen at random places at different times. The common messages are:

* Segmentation fault: 11
*  internal error: scavenge_mark_stack: unimplemented/strange closure type 24792696 @ 0x4200a623e0
* internal error: update_fwd: unknown/strange object  223743520

and several other such messages. Prima facie, it looks like memory is getting corrupted or scribbled over somehow. My first suspicion was a problem in the streamly library code, but I have stripped the code down to the bare minimum, and it contains no C FFI code and no poking at memory pointers.

My next suspicion was the hspec/QuickCheck testing code used in this test. I checked the hspec code to make sure that none of the code involved uses C code or pointer poking. No luck there either; I am still trying to strip that code down further.

My suspicion is now moving more towards the GHC RTS. The issue shows up only when the following conditions are met:

* hspec "parallel" combinator is used to run tests in parallel
* streamly concurrent code is being tested which can create many threads
* The GHC heap size is restricted to a small size, ~32MB, using the "-M32M" RTS option.
* It is consistently seen with GHC 8.6.5 as well as GHC 8.8.1

It never occurs when the heap size is not restricted. I have seen random crashes before as well, with an "IO manager die" message, when using concurrent networking IO with streamly. Since that was not easily reproducible at the time, I stopped chasing it. But now it looks like that issue might be a manifestation of the same underlying problem.

My guess is that it could be something in the RTS concurrency/threading code. Let me know if the symptoms ring a bell or if you can point to something specific based on them. Also, what are the usual tools, methods, debugging aids, and flags for debugging such issues in GHC? If it is not a GHC issue, in what ways could application code induce such a problem?

Meanwhile, I am also trying to simplify the reproducing code further to remove other factors as much as possible. The current code is at https://github.com/composewell/streamly on the ghc-segfault branch. Run "$ while true; do cabal run properties || break; done" in the shell and if you are lucky it may crash soon. The test code is in "test/Prop.hs" - here https://github.com/composewell/streamly/blob/ghc-segfault/test/Prop.hs .
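
A slightly more elaborate variant of the loop above, as a sketch only: it stops at the first failing run, reports how many runs that took, and keeps the output of the failing run. The function name, the run limit, and the log file are illustrative assumptions and are not part of the streamly repository.

```shell
#!/bin/sh
# Sketch: rerun a command until it fails, keeping the output of the failing run.
# run_until_failure and its arguments are hypothetical names for illustration.
run_until_failure() {
    max_runs=$1; shift            # upper bound on the number of attempts
    log=$1; shift                 # file that receives the output of each run
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # The log is overwritten on every run, so after a failure it holds
        # only the output of the run that failed.
        if ! "$@" >"$log" 2>&1; then
            echo "failed on run $i; output kept in $log"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}

# Example use, assuming the "properties" target mentioned above:
#   run_until_failure 1000 crash.log cabal run properties
```

Because the crash is intermittent, the run count gives a rough idea of how often it triggers when comparing flag combinations.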

-harendra

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Random crashes with memory corruption symptoms

Matthew Pickering
The way to debug these kinds of issues is to use gdb.

The best clue in your message seems to be "The GHC heap size is
restricted to a small size ~32MB using "-M32M" rts option.".

Good luck!

Matt

On Sun, Feb 2, 2020 at 10:26 PM Harendra Kumar <[hidden email]> wrote:

> [...]

Re: Random crashes with memory corruption symptoms

Ömer Sinan Ağacan
In reply to this post by Harendra Kumar
You should try with 8.8.2 which fixes a bug in the compacting GC (#17088).

When debugging it's a good idea to use the latest minor release of your GHC
version (8.8.2 in your case), as minor releases fix bugs and usually do not
introduce new ones as they don't ship new features.

If the problem still exists, unless you're interested in GHC hacking I think
the most productive use of your time would be to make the reproducer smaller, and
collect as much data as possible, like which flags trigger/hide the bug.

Some of the things you could check:

- Build your program with `-dcore-lint -dstg-lint -dcmm-lint` and see if it
  builds.
- Build your program with `-debug` and run it, see if it crashes.
- Build your program with `-debug` and run it with `+RTS -DS` and see if the
  error message changes.
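
For a single-file reproducer, the three checks above could look roughly like the commands printed by this sketch. Several things here are assumptions rather than anything stated in the thread: the file name, the `-threaded` and `-rtsopts` flags, and reusing the `-M32M` heap limit. For a cabal project the same flags would go into `ghc-options` instead. The function only prints the commands; it does not run them.

```shell
#!/bin/sh
# Sketch: print the build/run commands for the three checks.
# print_checks is a hypothetical helper; the source file name is a placeholder.
print_checks() {
    src=$1                        # e.g. Main.hs (assumed file name)
    exe=./${src%.hs}              # ghc names the executable after the source file
    # 1. Does it still build with the Core, STG and Cmm lint passes enabled?
    echo "ghc -threaded -rtsopts -dcore-lint -dstg-lint -dcmm-lint $src"
    # 2. Link against the debug RTS and run under the same heap limit.
    echo "ghc -threaded -rtsopts -debug $src"
    echo "$exe +RTS -M32M -RTS"
    # 3. Same debug binary, with RTS sanity checking (-DS) turned on.
    echo "$exe +RTS -M32M -DS -RTS"
}

print_checks Main.hs
```

Printing first makes it easy to adjust the flags to the actual project before running anything.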

But really, you should try 8.8.2 first. It's possible that this is
another manifestation of #17088.

Ömer

On Mon, 3 Feb 2020 at 01:26, Harendra Kumar <[hidden email]> wrote:

> [...]

Re: Random crashes with memory corruption symptoms

Harendra Kumar
Unfortunately, it is present in 8.8.2 as well.

On Mon, 3 Feb 2020 at 11:22, Ömer Sinan Ağacan <[hidden email]> wrote:
> [...]

Re: Random crashes with memory corruption symptoms

Ömer Sinan Ağacan
In that case it'd be good to move the discussion to GitLab. Could you file an
issue?

I was able to reproduce on GHC HEAD. With the debug runtime I consistently get this
assertion error:

    internal error: ASSERTION FAILED: file rts/Messages.c, line 95

        (GHC version 8.11.0.20200201 for x86_64_unknown_linux)
        Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

With the non-debug runtime it works fine maybe half of the time; in the other
runs I get a panic in the GC.

Ömer

On Mon, 3 Feb 2020 at 10:01, Harendra Kumar <[hidden email]> wrote:

> [...]

Re: Random crashes with memory corruption symptoms

Harendra Kumar
Ok, I will file an issue. I just wanted to rule out any application-level issues first. Did you try it on the code I sent or did you use some other test case? Is there a test-suite in GHC that stresses the threaded runtime?

-harendra

On Mon, 3 Feb 2020 at 14:09, Ömer Sinan Ağacan <[hidden email]> wrote:
> [...]

Re: Random crashes with memory corruption symptoms

Ömer Sinan Ağacan
> Did you try it on the code I sent or did you use some other test case?

I tried your code.

It's still possible that this is an application bug, of course (maybe in one of
the dependencies, if not in your application).

> Is there a test-suite in GHC that stresses the threaded runtime?

Some of the tests in the test suite use the threaded runtime. Other than those,
not really.

Ömer

On Mon, 3 Feb 2020 at 12:29, Harendra Kumar <[hidden email]> wrote:

> [...]