Anyone else failing to validate on 'linker_unload'?

Ryan Newton
That test builds an executable named 'linker_unload' which segfaults for
me.  Valgrind says this:


    ==42800== Invalid read of size 8
    ==42800==    at 0x66945F: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==  Address 0x5bfdd20 is 80 bytes inside a block of size 120 free'd
    ==42800==    at 0x4C273F0: free (vg_replace_malloc.c:446)
    ==42800==    by 0x66945E: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
    ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)

This happened the same way across a couple of independent checkouts.

  -Ryan

Edward Z. Yang
Yes, this one is failing for me too. Probably related to the
recent object unload patch for http://ghc.haskell.org/trac/ghc/ticket/8039

Excerpts from Ryan Newton's message of Fri Aug 30 21:51:24 -0700 2013:

> That test builds an executable named 'linker_unload' which segfaults for
> me.  Valgrind says this:

Edward Z. Yang
However, as far as I can tell, it is not 100% reproducible.
In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63,
this test does not fail.

Edward

Excerpts from Edward Z. Yang's message of Fri Aug 30 21:55:29 -0700 2013:

> Yes, this one is failing for me too. Probably related to the
> recent object unload patch for http://ghc.haskell.org/trac/ghc/ticket/8039

Austin Seipp
I have also not seen this test fail on amd64/Linux since Simon
committed it. From the valgrind output, it looks like your machine is
32-bit, correct, Ryan? Edward told me yesterday on IRC he saw this fail
on 64-bit Linux, so I'm a little confused.

Can you please try this?

$ cd testsuite/tests/rts
$ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
$ valgrind ./linker_unload

This will link you with a debug copy of the RTS, so Valgrind/GDB can
relate errors back to the relevant source code. Perhaps this will help
shed light on your problem.


On Sun, Sep 1, 2013 at 9:39 PM, Edward Z. Yang <ezyang at mit.edu> wrote:

> However, as far as I can tell, it is not 100% reproducible.
> In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63,
> this test does not fail.
>
> Edward



--
Regards,
Austin - PGP: 4096R/0x91384671




Ryan Newton
Hi Austin,

Should have said -- this is 64-bit RHEL 6 (my academic department's
standardized configuration).

 $ uname -a
     Linux  2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Weirdly, it seems to behave differently when run by "make" and by
hand.  When I run the make command you provided, it segfaults with exit
code 2:

cd . && $MAKE -s --no-print-directory linker_unload    </dev/null >linker_unload.run.stdout 2>linker_unload.run.stderr
Wrong exit code (expected 0 , actual 2 )
Stdout:
Stderr:
make[1]: *** [linker_unload] Segmentation fault (core dumped)
*** unexpected failure for linker_unload(normal)
Unexpected results from:
TEST="linker_unload"

But then when I run it by hand with "./linker_unload" or "valgrind
./linker_unload" I get an unknown symbol error with exit code 1:

==70613==
linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
linker_unload: resolveObjs failed
==70613==
==70613== HEAP SUMMARY:


   -Ryan


On Sun, Sep 1, 2013 at 10:46 PM, Austin Seipp <aseipp at pobox.com> wrote:

> I have also not seen this test fail on amd64/Linux since Simon
> committed it. From the valgrind output, it looks like your machine is
> 32-bit, correct, Ryan? Edward told me yesterday on IRC he saw this fail
> on 64-bit Linux, so I'm a little confused.

Edward Z. Yang
Excerpts from Ryan Newton's message of Sun Sep 01 19:54:34 -0700 2013:
> But then when I run it by hand with "./linker_unload" or "valgrind
> ./linker_unload" I get an unknown symbol error with exit code 1:

Well, that's because that's not what make is running:

    ./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)

Try removing the -s flag.

Edward




Austin Seipp
In reply to this post by Ryan Newton
Oops, should have said this: if you check out the Makefile for
testsuite/tests/rts, at the very bottom you'll see the
linker_unload target. When run, the executable needs some arguments so
it knows what to try to load:

---
./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
---

So you also need to provide the right arguments. Sorry about that!

On Sun, Sep 1, 2013 at 9:54 PM, Ryan Newton <rrnewton at gmail.com> wrote:

> Hi Austin,
>
> Should have said -- this is 64-bit RHEL 6 (my academic department's
> standardized configuration).



--
Regards,
Austin - PGP: 4096R/0x91384671




Ryan Newton
Ah, yes, I see.  Well, giving it the proper arguments when running via
valgrind puts me back at an "Invalid read" segfault.  I confirmed that the
linker_unload executable itself is 64-bit:

$ file linker_unload
linker_unload: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

==72103== Command: ./linker_unload /home/beehive/ryan_scratch/ghc-working/libraries/base/dist-install/build/libHSbase-4.7.0.0.a /home/beehive/ryan_scratch/ghc-working/libraries/ghc-prim/dist-install/build/libHSghc-prim-0.3.1.0.a /home/beehive/ryan_scratch/ghc-working/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-0.5.1.0.a
==72103==
==72103== Invalid read of size 8
==72103==    at 0x479F9F: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==  Address 0xf45ed70 is 80 bytes inside a block of size 120 free'd
==72103==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==72103==    by 0x479F9E: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
==72103==




On Sun, Sep 1, 2013 at 11:01 PM, Austin Seipp <aseipp at pobox.com> wrote:

> Oops, should have said this: if you check out the Makefile for
> testsuite/tests/rts, at the very bottom you'll see the
> linker_unload target. When run, the executable needs some arguments so
> it knows what to try to load:
>
> ---
> ./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
> ---
>
> So you also need to provide the right arguments. Sorry about that!

Austin Seipp
I think I see the problem, but maybe I'm just tired and shooting in the dark.

The only place checkUnload really calls free iteratively is the loop in
CheckUnload.c (I say 'iteratively' because the fact that you're
touching blocks inside already-freed blocks makes me
suspicious). The relevant code is:

---------------------------------------------------------------------------
  // Look through the unloadable objects, and any object that is still
  // marked as unreferenced can be physically unloaded, because we
  // have no references to it.
  prev = NULL;
  for (oc = unloaded_objects; oc; prev = oc, oc = oc->next) {
      if (oc->referenced == 0) {
          if (prev == NULL) {
              unloaded_objects = oc->next;
          } else {
              prev->next = oc->next;
          }
          IF_DEBUG(linker, debugBelch("Unloading object file %s\n",
                                      oc->fileName));
          freeObjectCode(oc);
      } else {
          IF_DEBUG(linker, debugBelch("Object file still in use: %s\n",
                                      oc->fileName));
      }
  }
---------------------------------------------------------------------------

Note that we iterate over oc->next in order to check every unloadable
object. If the object can be unloaded, we call freeObjectCode:

---------------------------------------------------------------------------
void freeObjectCode (ObjectCode *oc)
{
    ....
    stgFree(oc->fileName);
    stgFree(oc->archiveMemberName);
    stgFree(oc);
}
---------------------------------------------------------------------------

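So it would seem we free the object we point to and then, in the
loop's increment step, read oc->next from the block we just freed
(prev ends up pointing at it too). That would be a use-after-free,
consistent with the "Invalid read of size 8" inside a free'd block that
Valgrind reports, and it could lead to very weird behavior.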
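
For illustration only, here is a minimal sketch of one way such a loop
could avoid touching a freed node. It is not the RTS code and not
necessarily the fix that should go into CheckUnload.c: Node, free_node,
push and sweep_unreferenced are hypothetical stand-ins for ObjectCode,
freeObjectCode and the loop above. The idea is to read oc->next before
the node may be freed, and to advance prev only past nodes that stay on
the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ObjectCode: only the fields the loop touches. */
typedef struct Node {
    int referenced;
    struct Node *next;
} Node;

/* Hypothetical stand-in for freeObjectCode. */
static void free_node(Node *n)
{
    free(n);
}

/* Helper for building a list: prepend a node and return the new head. */
static Node *push(Node *head, int referenced)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        abort();
    }
    n->referenced = referenced;
    n->next = head;
    return n;
}

/* Unlink and free every unreferenced node; return how many were freed.
 * 'next' is saved before the node may be freed, and 'prev' only
 * advances past nodes that are kept. */
static int sweep_unreferenced(Node **head)
{
    Node *prev = NULL;
    Node *next = NULL;
    int freed = 0;

    assert(head != NULL);
    for (Node *oc = *head; oc != NULL; oc = next) {
        next = oc->next;            /* read before oc may be freed */
        if (oc->referenced == 0) {
            if (prev == NULL) {
                *head = next;       /* removing the current head */
            } else {
                prev->next = next;  /* unlink from the middle or tail */
            }
            free_node(oc);
            freed++;
        } else {
            prev = oc;              /* node stays, so it becomes prev */
        }
    }
    return freed;
}
```

Calling sweep_unreferenced(&list) on a list built with push leaves only
the nodes whose referenced field is non-zero, without ever reading a
node after it has been passed to free_node.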

Ryan, can you do one final thing? When you run that program, be sure
to specify `+RTS -Dl` (must be linked with -debug.) This will enable
all the debug output where the linker is concerned. There will be a
few hundred lines just for initialization (based on my machine.) If my
theory is correct, you'll probably see stuff like 'Unloading object
file ...' right as the invalid read/segfault occurs.


On Sun, Sep 1, 2013 at 11:28 PM, Ryan Newton <rrnewton at gmail.com> wrote:

> Ah, yes, I see.  Well, giving it the proper arguments when running via
> valgrind puts me back at an "Invalid read" segfault.  I confirmed that the
> linker_unload executable itself is 64-bit:
>
> $ file linker_unload
> linker_unload: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
>
> ==72103== Command: ./linker_unload
> /home/beehive/ryan_scratch/ghc-working/libraries/base/dist-install/build/libHSbase-4.7.0.0.a
> /home/beehive/ryan_scratch/ghc-working/libraries/ghc-prim/dist-install/build/libHSghc-prim-0.3.1.0.a
> /home/beehive/ryan_scratch/ghc-working/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-0.5.1.0.a
> ==72103==
> ==72103== Invalid read of size 8
> ==72103==    at 0x479F9F: checkUnload (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4689DA: GarbageCollect (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4621F0: scheduleDoGC (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x462314: performGC_ (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x403341: main (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==  Address 0xf45ed70 is 80 bytes inside a block of size 120 free'd
> ==72103==    at 0x4A063F0: free (vg_replace_malloc.c:446)
> ==72103==    by 0x479F9E: checkUnload (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4689DA: GarbageCollect (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4621F0: scheduleDoGC (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x462314: performGC_ (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x403341: main (in
> /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==
>
>
>
>
> On Sun, Sep 1, 2013 at 11:01 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>
>> Oops, should have said this: if you checkout the Makefile for
>> testsuite/tests/rts - at the very bottom - you'll see the
>> linker_unload target. When run, the executable needs some arguments so
>> it knows what to try and load:
>>
>> ---
>> ./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
>> ---
>>
>> So you also need to provide the right arguments. Sorry about that!
>>
>> On Sun, Sep 1, 2013 at 9:54 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>> > Hi Austin,
>> >
>> > Should have said -- this is 64-bit RHEL 6 (my academic department's
>> > standardized configuration).
>> >
>> >  $ uname -a
>> >      Linux  2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT
>> > 2013
>> > x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Weirdly it seems to have a different behavior when run by "make" and by
>> > hand.  When I run the make command you provided it segfaults with error
>> > code 2:
>> >
>> > cd . && $MAKE -s --no-print-directory linker_unload    </dev/null >linker_unload.run.stdout 2>linker_unload.run.stderr
>> > Wrong exit code (expected 0 , actual 2 )
>> > Stdout:
>> > Stderr:
>> > make[1]: *** [linker_unload] Segmentation fault (core dumped)
>> > *** unexpected failure for linker_unload(normal)
>> > Unexpected results from:
>> > TEST="linker_unload"
>> >
>> > But then when I run it by hand with "./linker_unload" or "valgrind
>> > ./linker_unload" I get an unknown symbol error with exit code 1:
>> >
>> > ==70613==
>> > linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
>> > linker_unload: resolveObjs failed
>> > ==70613==
>> > ==70613== HEAP SUMMARY:
>> >
>> >
>> >    -Ryan
>> >
>> >
>> > On Sun, Sep 1, 2013 at 10:46 PM, Austin Seipp <aseipp at pobox.com> wrote:
>> >>
>> >> I have also not seen this test fail on amd64/Linux since Simon
>> >> committed it. From the valgrind output, it looks like your machine is
>> >> 32bit, correct Ryan? Edward told me yesterday on IRC he saw this fail
>> >> on 64bit Linux, so I'm a little confused.
>> >>
>> >> Can you please try this?
>> >>
>> >> $ cd testsuite/tests/rts
>> >> $ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
>> >> $ valgrind ./linker_unload
>> >>
>> >> This will link you with a debug copy of the RTS, so Valgrind/GDB can
>> >> relate errors back to the relevant source code. Perhaps this will help
>> >> shed light on your problem.
>> >>
>> >>
>> >> On Sun, Sep 1, 2013 at 9:39 PM, Edward Z. Yang <ezyang at mit.edu> wrote:
>> >> > However, as far as I can tell, it is not 100% reproducible.
>> >> > In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63,
>> >> > this test does not fail.
>> >> >
>> >> > Edward
>> >> >
>> >> > Excerpts from Edward Z. Yang's message of Fri Aug 30 21:55:29 -0700
>> >> > 2013:
>> >> >> Yes, this one is failing for me too. Probably related to the
>> >> >> recent object unload patch for
>> >> >> http://ghc.haskell.org/trac/ghc/ticket/8039
>> >> >>
>> >> >> Excerpts from Ryan Newton's message of Fri Aug 30 21:51:24 -0700
>> >> >> 2013:
>> >> >> > That test builds an executable named 'linker_unload' which segfaults
>> >> >> > for me.  Valgrind says this:
>> >> >> >
>> >> >> >     ==42800== Invalid read of size 8
>> >> >> >     ==42800==    at 0x66945F: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==  Address 0x5bfdd20 is 80 bytes inside a block of size 120 free'd
>> >> >> >     ==42800==    at 0x4C273F0: free (vg_replace_malloc.c:446)
>> >> >> >     ==42800==    by 0x66945E: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >     ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >
>> >> >> > This went the same across a couple different independent checkouts.
>> >> >> >
>> >> >> >   -Ryan
>> >> >
>> >> > _______________________________________________
>> >> > ghc-devs mailing list
>> >> > ghc-devs at haskell.org
>> >> > http://www.haskell.org/mailman/listinfo/ghc-devs
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Austin - PGP: 4096R/0x91384671
>> >>
>> >> _______________________________________________
>> >> ghc-devs mailing list
>> >> ghc-devs at haskell.org
>> >> http://www.haskell.org/mailman/listinfo/ghc-devs
>> >
>> >
>>
>>
>>
>> --
>> Regards,
>> Austin - PGP: 4096R/0x91384671
>
>



--
Regards,
Austin - PGP: 4096R/0x91384671




Anyone else failing to validate on 'linker_unload'?

Ryan Newton
> Ryan, can you do one final thing? When you run that program, be sure
> to specify `+RTS -Dl` (must be linked with -debug.) This will enable
> all the debug output where the linker is concerned. There will be a
> few hundred lines just for initialization (based on my machine.) If my
> theory is correct, you'll probably see stuff like 'Unloading object
> file ...' right as the invalid read/segfault occurs.


Hi Austin,

I did this, and it produced a 97MB text file of debug output, the tail end
of which was:

initLinker: idempotent return
lookupSymbol: value of stg_gc_unpt_r1 is 0x485570
`stg_gc_unpt_r1' resolves to 0x485570Reloc: P = 0x40b510f3   S = 0x485570   A = 0xfffffffffffffffc
relocations for section 3 using symtab 8
Rel entry   0 is raw( (nil) 0x800000001  (nil))   lookupSymbol: looking up base_ControlziApplicative_zdfApplicativeIO3_info
initLinker: start
initLinker: idempotent return
lookupSymbol: value of base_ControlziApplicative_zdfApplicativeIO3_info is 0x40b51058
`base_ControlziApplicative_zdfApplicativeIO3_info' resolves to 0x40b51058Reloc: P = 0x40b51100   S = 0x40b51058   A = (nil)
resolveObjs: done
lookupSymbol: looking up f
initLinker: start
initLinker: idempotent return
lookupSymbol: value of f is 0x440330c0
initLinker: start
initLinker: idempotent return
unloadObj: Test.o
Checking whether to unload Test.o
Unloading object file Test.o

And that's when it segfaulted (not using valgrind).  If it is of any use,
here is the full output, which fortunately compresses down to 4.4MB:


http://www.cs.indiana.edu/~rrnewton/temp/linker_unload_debug_output.txt.bz2

Best,
  -Ryan

P.S. Here is the equivalent output from the same thing being run under
valgrind:

initLinker: idempotent return
lookupSymbol: value of base_ControlziApplicative_zdfApplicativeIO3_info is
0x4c15058
`base_ControlziApplicative_zdfApplicativeIO3_info' resolves to
0x4c15058Reloc: P = 0x4c15100   S = 0x4c15058   A = (nil)
resolveObjs: done
lookupSymbol: looking up f
initLinker: start
initLinker: idempotent return
lookupSymbol: value of f is 0x4c0f0c0
initLinker: start
initLinker: idempotent return
unloadObj: Test.o
Checking whether to unload Test.o
Unloading object file Test.o
==9030== Invalid read of size 8
==9030==    at 0x492502: checkUnload (CheckUnload.c:286)
==9030==    by 0x476580: GarbageCollect (GC.c:666)
==9030==    by 0x46ADCD: scheduleDoGC (Schedule.c:1652)
==9030==    by 0x46B976: performGC_ (Schedule.c:2551)
==9030==    by 0x46B9AE: performMajorGC (Schedule.c:2565)
==9030==    by 0x4043E1: main (in
/home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload2)
==9030==  Address 0x95c4580 is 80 bytes inside a block of size 120 free'd
==9030==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==9030==    by 0x4656D5: stgFree (RtsUtils.c:107)
==9030==    by 0x45DDF4: freeObjectCode (Linker.c:2087)
==9030==    by 0x4924CF: checkUnload (CheckUnload.c:295)
==9030==    by 0x476580: GarbageCollect (GC.c:666)
==9030==    by 0x46ADCD: scheduleDoGC (Schedule.c:1652)
==9030==    by 0x46B976: performGC_ (Schedule.c:2551)
==9030==    by 0x46B9AE: performMajorGC (Schedule.c:2565)
==9030==    by 0x4043E1: main (in
/home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload2)
==9030==


Anyone else failing to validate on 'linker_unload'?

Simon Marlow-7
In reply to this post by Austin Seipp-4
That's the bug.  Fix coming!

Simon

On 02/09/13 05:46, Austin Seipp wrote:

> I think I see the problem, but maybe I'm just tired and shooting in the dark.
>
> The only place checkUnload calls free iteratively is the loop in
> CheckUnload.c (I say 'iteratively' because the fact that you're
> touching blocks inside already-freed blocks makes me
> suspicious.) The relevant code is:
>
> ---------------------------------------------------------------------------
>    // Look through the unloadable objects, and any object that is still
>    // marked as unreferenced can be physically unloaded, because we
>    // have no references to it.
>    prev = NULL;
>    for (oc = unloaded_objects; oc; prev = oc, oc = oc->next) {
>        if (oc->referenced == 0) {
>            if (prev == NULL) {
>                unloaded_objects = oc->next;
>            } else {
>                prev->next = oc->next;
>            }
>            IF_DEBUG(linker, debugBelch("Unloading object file %s\n",
>                                        oc->fileName));
>            freeObjectCode(oc);
>        } else {
>            IF_DEBUG(linker, debugBelch("Object file still in use: %s\n",
>                                        oc->fileName));
>        }
>    }
> ---------------------------------------------------------------------------
>
> Note that we iterate over oc->next in order to check every unloadable
> object. If the object can be unloaded, we call freeObjectCode:
>
> ---------------------------------------------------------------------------
> void freeObjectCode (ObjectCode *oc)
> {
>      ....
>      stgFree(oc->fileName);
>      stgFree(oc->archiveMemberName);
>      stgFree(oc);
> }
> ---------------------------------------------------------------------------
>
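> So it would seem we free the object we point to during each traversal:
> after freeObjectCode(oc) has freed oc, the loop's update step still
> reads oc->next (and sets prev = oc). That is a use-after-free, which
> is consistent with the invalid read Valgrind reports.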
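
A minimal sketch of the traversal issue described in the quoted analysis; it is not the actual GHC patch. It assumes a hypothetical `Node` type standing in for `ObjectCode`, and the helpers `push`, `sweep`, and `length` are made up for illustration. In the quoted loop the update step `prev = oc, oc = oc->next` runs after `freeObjectCode(oc)`, so it reads freed memory. The sketch saves the successor before freeing and only advances `prev` past nodes that stay in the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ObjectCode: only the fields the loop touches. */
typedef struct Node {
    int referenced;
    struct Node *next;
} Node;

/* Prepend a node to the list and return the new head. */
Node *push(Node *head, int referenced)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->referenced = referenced;
    n->next = head;
    return n;
}

/* Unlink and free every unreferenced node; return the new head. */
Node *sweep(Node *head)
{
    Node *prev = NULL;
    Node *n, *next;

    for (n = head; n != NULL; n = next) {
        next = n->next;               /* read the link while n is still valid */
        if (n->referenced == 0) {
            if (prev == NULL) {
                head = next;          /* removing the current head */
            } else {
                prev->next = next;    /* prev is always a live node */
            }
            free(n);                  /* n must not be touched after this */
        } else {
            prev = n;                 /* advance prev only past kept nodes */
        }
    }
    return head;
}

/* Count the nodes remaining in the list. */
size_t length(const Node *head)
{
    size_t len = 0;
    for (; head != NULL; head = head->next) {
        len++;
    }
    return len;
}
```

Building a list with a few `push` calls and passing it to `sweep` leaves only the nodes whose `referenced` field is non-zero, and running that under Valgrind is a quick way to check that no freed node is read.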
>
> Ryan, can you do one final thing? When you run that program, be sure
> to specify `+RTS -Dl` (must be linked with -debug.) This will enable
> all the debug output where the linker is concerned. There will be a
> few hundred lines just for initialization (based on my machine.) If my
> theory is correct, you'll probably see stuff like 'Unloading object
> file ...' right as the invalid read/segfault occurs.