Bad news: apparent bug in casMutVar going back to 7.2

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Ryan Newton
I commented on the commit here:


https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b

The problem is that our "cas" routine in SMP.h is similar to the C compiler
intrinsic __sync_val_compare_and_swap, in that it returns the old value.
 But it seems we cannot use a comparison against that old value to
determine whether or not the CAS succeeded.  (I believe the CAS may fail
due to contention, but the old value may happen to look like our old value.)

Unfortunately, this didn't occur to me until it started causing bugs [1]
[2].  Fixing casMutVar# fixes these bugs.  However, the way I'm currently
fixing CAS in the "atomic-primops" package is by using
__sync_bool_compare_and_swap:


https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200

What is the best fix for GHC itself?   Would it be ok for GHC to include a
C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise we need
another big ifdbef'd function like "cas" in SMP.h that has the
architecture-specific inline asm across all architectures.  I can write the
x86 one, but I'm not eager to try the others.

Best,
   -Ryan

[1] https://github.com/iu-parfunc/lvars/issues/70
[2] https://github.com/rrnewton/haskell-lockfree/issues/15
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/ad2bf67e/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Ryan Newton
Then again... I'm having trouble seeing how the spec on page 3-149 of the
Intel manual would allow the behavior I'm seeing:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Nevertheless, this is exactly the behavior we're seeing with the current
Haskell primops.  Two threads simultaneously performing the same CAS(p,a,b)
can both think that they succeeded.





On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com> wrote:

> I commented on the commit here:
>
>
> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>
> The problem is that our "cas" routine in SMP.h is similar to the C
> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
> value.  But it seems we cannot use a comparison against that old value to
> determine whether or not the CAS succeeded.  (I believe the CAS may fail
> due to contention, but the old value may happen to look like our old value.)
>
> Unfortunately, this didn't occur to me until it started causing bugs [1]
> [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm currently
> fixing CAS in the "atomic-primops" package is by using
> __sync_bool_compare_and_swap:
>
>
> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>
> What is the best fix for GHC itself?   Would it be ok for GHC to include a
> C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise we need
> another big ifdbef'd function like "cas" in SMP.h that has the
> architecture-specific inline asm across all architectures.  I can write the
> x86 one, but I'm not eager to try the others.
>
> Best,
>    -Ryan
>
> [1] https://github.com/iu-parfunc/lvars/issues/70
> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/931c1730/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
Hey Ryan,
looking at this closely
Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be the
culprit?

Could the issue be that we've not had a good stress test that would create
values that are equal on the 32bit range, but differ on the 64bit range,
and you're hitting that?

Could you try seeing if doing that change fixes things up?
(I may be completely wrong, but just throwing this out as a naive "obvious"
guess)


On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com> wrote:

> Then again... I'm having trouble seeing how the spec on page 3-149 of the
> Intel manual would allow the behavior I'm seeing:
>
>
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>
> Nevertheless, this is exactly the behavior we're seeing with the current
> Haskell primops.  Two threads simultaneously performing the same CAS(p,a,b)
> can both think that they succeeded.
>
>
>
>
>
> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> I commented on the commit here:
>>
>>
>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>
>> The problem is that our "cas" routine in SMP.h is similar to the C
>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>> value.  But it seems we cannot use a comparison against that old value to
>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>> due to contention, but the old value may happen to look like our old value.)
>>
>> Unfortunately, this didn't occur to me until it started causing bugs [1]
>> [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm currently
>> fixing CAS in the "atomic-primops" package is by using
>> __sync_bool_compare_and_swap:
>>
>>
>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>
>> What is the best fix for GHC itself?   Would it be ok for GHC to include
>> a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise we need
>> another big ifdbef'd function like "cas" in SMP.h that has the
>> architecture-specific inline asm across all architectures.  I can write the
>> x86 one, but I'm not eager to try the others.
>>
>> Best,
>>    -Ryan
>>
>> [1] https://github.com/iu-parfunc/lvars/issues/70
>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>
>>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/ee620af5/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
ok, i can confirm that on my 64bit mac, both clang and gcc use cmpxchgl
rather than cmpxchg
i'll whip up a strawman patch on head that can be cherrypicked / tested out
by ryan et al


On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> Hey Ryan,
> looking at this closely
> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be the
> culprit?
>
> Could the issue be that we've not had a good stress test that would create
> values that are equal on the 32bit range, but differ on the 64bit range,
> and you're hitting that?
>
> Could you try seeing if doing that change fixes things up?
> (I may be completely wrong, but just throwing this out as a naive
> "obvious" guess)
>
>
> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> Then again... I'm having trouble seeing how the spec on page 3-149 of the
>> Intel manual would allow the behavior I'm seeing:
>>
>>
>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>> Nevertheless, this is exactly the behavior we're seeing with the current
>> Haskell primops.  Two threads simultaneously performing the same CAS(p,a,b)
>> can both think that they succeeded.
>>
>>
>>
>>
>>
>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> I commented on the commit here:
>>>
>>>
>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>
>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>> value.  But it seems we cannot use a comparison against that old value to
>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>> due to contention, but the old value may happen to look like our old value.)
>>>
>>> Unfortunately, this didn't occur to me until it started causing bugs [1]
>>> [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm currently
>>> fixing CAS in the "atomic-primops" package is by using
>>> __sync_bool_compare_and_swap:
>>>
>>>
>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>
>>> What is the best fix for GHC itself?   Would it be ok for GHC to include
>>> a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise we need
>>> another big ifdbef'd function like "cas" in SMP.h that has the
>>> architecture-specific inline asm across all architectures.  I can write the
>>> x86 one, but I'm not eager to try the others.
>>>
>>> Best,
>>>    -Ryan
>>>
>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>
>>>
>>
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/2069551d/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
woops, i mean cmpxchgq


On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> ok, i can confirm that on my 64bit mac, both clang and gcc use cmpxchgl
> rather than cmpxchg
> i'll whip up a strawman patch on head that can be cherrypicked / tested
> out by ryan et al
>
>
> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> Hey Ryan,
>> looking at this closely
>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be the
>> culprit?
>>
>> Could the issue be that we've not had a good stress test that would
>> create values that are equal on the 32bit range, but differ on the 64bit
>> range, and you're hitting that?
>>
>> Could you try seeing if doing that change fixes things up?
>> (I may be completely wrong, but just throwing this out as a naive
>> "obvious" guess)
>>
>>
>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> Then again... I'm having trouble seeing how the spec on page 3-149 of
>>> the Intel manual would allow the behavior I'm seeing:
>>>
>>>
>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>
>>> Nevertheless, this is exactly the behavior we're seeing with the current
>>> Haskell primops.  Two threads simultaneously performing the same CAS(p,a,b)
>>> can both think that they succeeded.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>
>>>> I commented on the commit here:
>>>>
>>>>
>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>
>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>> value.  But it seems we cannot use a comparison against that old value to
>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>> due to contention, but the old value may happen to look like our old value.)
>>>>
>>>> Unfortunately, this didn't occur to me until it started causing bugs
>>>> [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>> __sync_bool_compare_and_swap:
>>>>
>>>>
>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>
>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>> architecture-specific inline asm across all architectures.  I can write the
>>>> x86 one, but I'm not eager to try the others.
>>>>
>>>> Best,
>>>>    -Ryan
>>>>
>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>
>>>>
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs at haskell.org
>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/89e9051a/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
i have a ticket for tracking this, though i'm thinking my initial attempt
at a patch generates the same object code as it did before.

@ryan, what CPU variant are you testing this on? is this on a NUMA machine
or something?


On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> woops, i mean cmpxchgq
>
>
> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> ok, i can confirm that on my 64bit mac, both clang and gcc use cmpxchgl
>> rather than cmpxchg
>> i'll whip up a strawman patch on head that can be cherrypicked / tested
>> out by ryan et al
>>
>>
>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> Hey Ryan,
>>> looking at this closely
>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be the
>>> culprit?
>>>
>>> Could the issue be that we've not had a good stress test that would
>>> create values that are equal on the 32bit range, but differ on the 64bit
>>> range, and you're hitting that?
>>>
>>> Could you try seeing if doing that change fixes things up?
>>> (I may be completely wrong, but just throwing this out as a naive
>>> "obvious" guess)
>>>
>>>
>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>
>>>> Then again... I'm having trouble seeing how the spec on page 3-149 of
>>>> the Intel manual would allow the behavior I'm seeing:
>>>>
>>>>
>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>
>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>> CAS(p,a,b) can both think that they succeeded.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>
>>>>> I commented on the commit here:
>>>>>
>>>>>
>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>
>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>> value.  But it seems we cannot use a comparison against that old value to
>>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>
>>>>> Unfortunately, this didn't occur to me until it started causing bugs
>>>>> [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>> __sync_bool_compare_and_swap:
>>>>>
>>>>>
>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>
>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>> x86 one, but I'm not eager to try the others.
>>>>>
>>>>> Best,
>>>>>    -Ryan
>>>>>
>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> ghc-devs mailing list
>>>> ghc-devs at haskell.org
>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/c6aaa00b/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket

when i'm more awake i'll experiment some more


On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> i have a ticket for tracking this, though i'm thinking my initial attempt
> at a patch generates the same object code as it did before.
>
> @ryan, what CPU variant are you testing this on? is this on a NUMA machine
> or something?
>
>
> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> woops, i mean cmpxchgq
>>
>>
>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> ok, i can confirm that on my 64bit mac, both clang and gcc use cmpxchgl
>>> rather than cmpxchg
>>> i'll whip up a strawman patch on head that can be cherrypicked / tested
>>> out by ryan et al
>>>
>>>
>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> Hey Ryan,
>>>> looking at this closely
>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be
>>>> the culprit?
>>>>
>>>> Could the issue be that we've not had a good stress test that would
>>>> create values that are equal on the 32bit range, but differ on the 64bit
>>>> range, and you're hitting that?
>>>>
>>>> Could you try seeing if doing that change fixes things up?
>>>> (I may be completely wrong, but just throwing this out as a naive
>>>> "obvious" guess)
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>
>>>>> Then again... I'm having trouble seeing how the spec on page 3-149 of
>>>>> the Intel manual would allow the behavior I'm seeing:
>>>>>
>>>>>
>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>
>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>
>>>>>> I commented on the commit here:
>>>>>>
>>>>>>
>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>
>>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>>> value.  But it seems we cannot use a comparison against that old value to
>>>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>
>>>>>> Unfortunately, this didn't occur to me until it started causing bugs
>>>>>> [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>> __sync_bool_compare_and_swap:
>>>>>>
>>>>>>
>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>
>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>
>>>>>> Best,
>>>>>>    -Ryan
>>>>>>
>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ghc-devs mailing list
>>>>> ghc-devs at haskell.org
>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/edac9fa4/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
Ryan, is your benchmark using CAS on pointers, or immediate words? trying
to get atomic primops to build on my 7.8 build on my mac


On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>
> when i'm more awake i'll experiment some more
>
>
> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> i have a ticket for tracking this, though i'm thinking my initial attempt
>> at a patch generates the same object code as it did before.
>>
>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>> machine or something?
>>
>>
>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> woops, i mean cmpxchgq
>>>
>>>
>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> ok, i can confirm that on my 64bit mac, both clang and gcc use cmpxchgl
>>>> rather than cmpxchg
>>>> i'll whip up a strawman patch on head that can be cherrypicked / tested
>>>> out by ryan et al
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> Hey Ryan,
>>>>> looking at this closely
>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be
>>>>> the culprit?
>>>>>
>>>>> Could the issue be that we've not had a good stress test that would
>>>>> create values that are equal on the 32bit range, but differ on the 64bit
>>>>> range, and you're hitting that?
>>>>>
>>>>> Could you try seeing if doing that change fixes things up?
>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>> "obvious" guess)
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>
>>>>>> Then again... I'm having trouble seeing how the spec on page 3-149 of
>>>>>> the Intel manual would allow the behavior I'm seeing:
>>>>>>
>>>>>>
>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>
>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>
>>>>>>> I commented on the commit here:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>
>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>>>> value.  But it seems we cannot use a comparison against that old value to
>>>>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>>
>>>>>>> Unfortunately, this didn't occur to me until it started causing bugs
>>>>>>> [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>
>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>>
>>>>>>> Best,
>>>>>>>    -Ryan
>>>>>>>
>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> ghc-devs mailing list
>>>>>> ghc-devs at haskell.org
>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/72626f90/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Ryan Newton
Hi Carter & others,

Carter, yes, this is CAS on pointers and in my next mail I'll try to come
up with some hypotheses as to why we may have (remaining) problems there.

But first, I have been assured that on x86 there is no failure mode in
which doing a comparison on the value read by CAS should not correctly
diagnose success or failure (same as directly reading the Zero Flag) [1].

And yet, there's this discrepancy, where the modified casMutVar that I
linked to does not have the failure.  As for reproducing the failure,
either of the two following tests will currently show problems:

   - Two threads try to casIORef False->True, both succeed
   - 120 threads try to read, increment, CAS until they succeed.  The total
   is often not 120 because multiple threads think the successfully
   incremented, say, 33->34.

Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:


*git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
*cd haskell-lockfree-queue/AtomicPrimops/*

*git checkout 1a1e7e55f6706f9e5754*

*cabal sandbox init*

*cabal install -f-withTH -fforeign ./ ./testing --enable-tests*

*./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
-t n_threads*

You may have to run the last line several times to see the failure.

Best,
  -Ryan

[1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF is
there just to avoid the extra comparison.

[2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I usually
use to build it is currently not validating (below error, commit
65d05d7334).  After I debug this gmp problem I'll confirm that the bug
under discussion applies on the 7.8 branch.

    ./sync-all checkout ghc-7.8
    sh validate
...
/usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation R_X86_64_32
against `__gmpz_sub' can not be used when making a shared object; recompile
with -fPIC
libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value




On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> Ryan, is your benchmark using CAS on pointers, or immediate words? trying
> to get atomic primops to build on my 7.8 build on my mac
>
>
> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>
>> when i'm more awake i'll experiment some more
>>
>>
>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> i have a ticket for tracking this, though i'm thinking my initial
>>> attempt at a patch generates the same object code as it did before.
>>>
>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>> machine or something?
>>>
>>>
>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> woops, i mean cmpxchgq
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>> use cmpxchgl rather than cmpxchg
>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>> tested out by ryan et al
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> Hey Ryan,
>>>>>> looking at this closely
>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be
>>>>>> the culprit?
>>>>>>
>>>>>> Could the issue be that we've not had a good stress test that would
>>>>>> create values that are equal on the 32bit range, but differ on the 64bit
>>>>>> range, and you're hitting that?
>>>>>>
>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>>> "obvious" guess)
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>
>>>>>>> Then again... I'm having trouble seeing how the spec on page 3-149
>>>>>>> of the Intel manual would allow the behavior I'm seeing:
>>>>>>>
>>>>>>>
>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>
>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>
>>>>>>>> I commented on the commit here:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>
>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>>>>> value.  But it seems we cannot use a comparison against that old value to
>>>>>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>>>
>>>>>>>> Unfortunately, this didn't occur to me until it started causing
>>>>>>>> bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>
>>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>    -Ryan
>>>>>>>>
>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ghc-devs mailing list
>>>>>>> ghc-devs at haskell.org
>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/715694dd/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first have
to get some pull requests in on atomic-primops before I can test it locally
:), expect those patches later today!

looks like gcc's inline ASM logic is pretty correct, after testing it a bit
locally, pardon my speculative jumping the gun.


On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <rrnewton at gmail.com> wrote:

> Hi Carter & others,
>
> Carter, yes, this is CAS on pointers and in my next mail I'll try to come
> up with some hypotheses as to why we may have (remaining) problems there.
>
> But first, I have been assured that on x86 there is no failure mode in
> which doing a comparison on the value read by CAS should not correctly
> diagnose success or failure (same as directly reading the Zero Flag) [1].
>
> And yet, there's this discrepancy, where the modified casMutVar that I
> linked to does not have the failure.  As for reproducing the failure,
> either of the two following tests will currently show problems:
>
>    - Two threads try to casIORef False->True, both succeed
>    - 120 threads try to read, increment, CAS until they succeed.  The
>    total is often not 120 because multiple threads think the successfully
>    incremented, say, 33->34.
>
> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:
>
>
> *git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
> *cd haskell-lockfree-queue/AtomicPrimops/*
>
> *git checkout 1a1e7e55f6706f9e5754*
>
> *cabal sandbox init*
>
> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests*
>
> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
> -t n_threads*
>
> You may have to run the last line several times to see the failure.
>
> Best,
>   -Ryan
>
> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF is
> there just to avoid the extra comparison.
>
> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I
> usually use to build it is currently not validating (below error, commit
> 65d05d7334).  After I debug this gmp problem I'll confirm that the bug
> under discussion applies on the 7.8 branch.
>
>     ./sync-all checkout ghc-7.8
>     sh validate
> ...
> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation R_X86_64_32
> against `__gmpz_sub' can not be used when making a shared object; recompile
> with -fPIC
> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value
>
>
>
>
> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> Ryan, is your benchmark using CAS on pointers, or immediate words? trying
>> to get atomic primops to build on my 7.8 build on my mac
>>
>>
>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>>
>>> when i'm more awake i'll experiment some more
>>>
>>>
>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> i have a ticket for tracking this, though i'm thinking my initial
>>>> attempt at a patch generates the same object code as it did before.
>>>>
>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>>> machine or something?
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> woops, i mean cmpxchgq
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>>> use cmpxchgl rather than cmpxchg
>>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>>> tested out by ryan et al
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>
>>>>>>> Hey Ryan,
>>>>>>> looking at this closely
>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that be
>>>>>>> the culprit?
>>>>>>>
>>>>>>> Could the issue be that we've not had a good stress test that would
>>>>>>> create values that are equal on the 32bit range, but differ on the 64bit
>>>>>>> range, and you're hitting that?
>>>>>>>
>>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>>>> "obvious" guess)
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>
>>>>>>>> Then again... I'm having trouble seeing how the spec on page 3-149
>>>>>>>> of the Intel manual would allow the behavior I'm seeing:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>>
>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>
>>>>>>>>> I commented on the commit here:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>>
>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>>>>>> value.  But it seems we cannot use a comparison against that old value to
>>>>>>>>> determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>>>>
>>>>>>>>> Unfortunately, this didn't occur to me until it started causing
>>>>>>>>> bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>>
>>>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>    -Ryan
>>>>>>>>>
>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ghc-devs mailing list
>>>>>>>> ghc-devs at haskell.org
>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/98b74a1c/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Ryan Newton
Ok, here's another experiment, on this commit:


https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc

Here, if I use GCC's __sync_val_compare_and_swap instead of GHC's version
of cas(), the problem also goes away.  I think these two implementations
should behave identically, and that they don't perhaps indicates that there
is something off about the inline asm, as Carter was suggesting:

*#if i386_HOST_ARCH || x86_64_HOST_ARCH*
*    __asm__ __volatile__ (*
*   "lock\ncmpxchg %3,%1"*
*          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
*          :"0" (o), "r" (n));*
*    return o;*

The x86 CAS instruction must put the "return value" in the accumulator
register, and indeed this constrains "o" to be allocated to the accumulator
register, while the new value "n" can be in any register.

So if there's a problem, I don't know what it is.  Except I'm not sure what
the ramifications of "o" being a function parameter AND having an "=a"
constraint on it are...

   -Ryan



On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald <
carter.schonwald at gmail.com> wrote:

> Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first
> have to get some pull requests in on atomic-primops before I can test it
> locally :), expect those patches later today!
>
> looks like gcc's inline ASM logic is pretty correct, after testing it a
> bit locally, pardon my speculative jumping the gun.
>
>
> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> Hi Carter & others,
>>
>> Carter, yes, this is CAS on pointers and in my next mail I'll try to come
>> up with some hypotheses as to why we may have (remaining) problems there.
>>
>> But first, I have been assured that on x86 there is no failure mode in
>> which doing a comparison on the value read by CAS should not correctly
>> diagnose success or failure (same as directly reading the Zero Flag) [1].
>>
>> And yet, there's this discrepancy, where the modified casMutVar that I
>> linked to does not have the failure.  As for reproducing the failure,
>> either of the two following tests will currently show problems:
>>
>>    - Two threads try to casIORef False->True, both succeed
>>    - 120 threads try to read, increment, CAS until they succeed.  The
>>    total is often not 120 because multiple threads think the successfully
>>    incremented, say, 33->34.
>>
>> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:
>>
>>
>> *git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
>> *cd haskell-lockfree-queue/AtomicPrimops/*
>>
>> *git checkout 1a1e7e55f6706f9e5754*
>>
>> *cabal sandbox init*
>>
>> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests*
>>
>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>> -t n_threads*
>>
>> You may have to run the last line several times to see the failure.
>>
>> Best,
>>   -Ryan
>>
>> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF is
>> there just to avoid the extra comparison.
>>
>> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I
>> usually use to build it is currently not validating (below error, commit
>> 65d05d7334).  After I debug this gmp problem I'll confirm that the bug
>> under discussion applies on the 7.8 branch.
>>
>>     ./sync-all checkout ghc-7.8
>>     sh validate
>> ...
>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation
>> R_X86_64_32 against `__gmpz_sub' can not be used when making a shared
>> object; recompile with -fPIC
>> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value
>>
>>
>>
>>
>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> Ryan, is your benchmark using CAS on pointers, or immediate words?
>>> trying to get atomic primops to build on my 7.8 build on my mac
>>>
>>>
>>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>>>
>>>> when i'm more awake i'll experiment some more
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> i have a ticket for tracking this, though i'm thinking my initial
>>>>> attempt at a patch generates the same object code as it did before.
>>>>>
>>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>>>> machine or something?
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> woops, i mean cmpxchgq
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>
>>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>>>> use cmpxchgl rather than cmpxchg
>>>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>>>> tested out by ryan et al
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hey Ryan,
>>>>>>>> looking at this closely
>>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that
>>>>>>>> be the culprit?
>>>>>>>>
>>>>>>>> Could the issue be that we've not had a good stress test that would
>>>>>>>> create values that are equal on the 32bit range, but differ on the 64bit
>>>>>>>> range, and you're hitting that?
>>>>>>>>
>>>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>>>>> "obvious" guess)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>
>>>>>>>>> Then again... I'm having trouble seeing how the spec on page 3-149
>>>>>>>>> of the Intel manual would allow the behavior I'm seeing:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>>>
>>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> I commented on the commit here:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>>>
>>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the
>>>>>>>>>> C compiler intrinsic __sync_val_compare_and_swap, in that it returns the
>>>>>>>>>> old value.  But it seems we cannot use a comparison against that old value
>>>>>>>>>> to determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>>>>>
>>>>>>>>>> Unfortunately, this didn't occur to me until it started causing
>>>>>>>>>> bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>>>
>>>>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>    -Ryan
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ghc-devs mailing list
>>>>>>>>> ghc-devs at haskell.org
>>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/57c9523a/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
hrmmmm I have a crazy idea

"Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64.
Else, clear ZF
and load r/m64 into RAX." is what the docs say for the cmpxchng instruction

so RAX is the old values,  (EAX in the  32bit case). And it looks like we
dont' set that explicitly when calling the asm .. CMPXCHG r/m64, r64

 hrmmm



On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com> wrote:

> Ok, here's another experiment, on this commit:
>
>
> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>
> Here, if I use GCC's __sync_val_compare_and_swap instead of GHC's version
> of cas(), the problem also goes away.  I think these two implementations
> should behave identically, and that they don't perhaps indicates that there
> is something off about the inline asm, as Carter was suggesting:
>
> *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
> *    __asm__ __volatile__ (*
> *   "lock\ncmpxchg %3,%1"*
> *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
> *          :"0" (o), "r" (n));*
> *    return o;*
>
> The x86 CAS instruction must put the "return value" in the accumulator
> register, and indeed this constrains "o" to be allocated to the accumulator
> register, while the new value "n" can be in any register.
>
> So if there's a problem, I don't know what it is.  Except I'm not sure
> what the ramifications of "o" being a function parameter AND having an "=a"
> constraint on it are...
>
>    -Ryan
>
>
>
> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first
>> have to get some pull requests in on atomic-primops before I can test it
>> locally :), expect those patches later today!
>>
>> looks like gcc's inline ASM logic is pretty correct, after testing it a
>> bit locally, pardon my speculative jumping the gun.
>>
>>
>> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> Hi Carter & others,
>>>
>>> Carter, yes, this is CAS on pointers and in my next mail I'll try to
>>> come up with some hypotheses as to why we may have (remaining) problems
>>> there.
>>>
>>> But first, I have been assured that on x86 there is no failure mode in
>>> which doing a comparison on the value read by CAS should not correctly
>>> diagnose success or failure (same as directly reading the Zero Flag) [1].
>>>
>>> And yet, there's this discrepancy, where the modified casMutVar that I
>>> linked to does not have the failure.  As for reproducing the failure,
>>> either of the two following tests will currently show problems:
>>>
>>>    - Two threads try to casIORef False->True, both succeed
>>>    - 120 threads try to read, increment, CAS until they succeed.  The
>>>    total is often not 120 because multiple threads think the successfully
>>>    incremented, say, 33->34.
>>>
>>> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:
>>>
>>>
>>> *git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
>>> *cd haskell-lockfree-queue/AtomicPrimops/*
>>>
>>> *git checkout 1a1e7e55f6706f9e5754*
>>>
>>> *cabal sandbox init*
>>>
>>> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests*
>>>
>>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>>> -t n_threads*
>>>
>>> You may have to run the last line several times to see the failure.
>>>
>>> Best,
>>>   -Ryan
>>>
>>> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF is
>>> there just to avoid the extra comparison.
>>>
>>> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I
>>> usually use to build it is currently not validating (below error, commit
>>> 65d05d7334).  After I debug this gmp problem I'll confirm that the bug
>>> under discussion applies on the 7.8 branch.
>>>
>>>     ./sync-all checkout ghc-7.8
>>>     sh validate
>>> ...
>>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation
>>> R_X86_64_32 against `__gmpz_sub' can not be used when making a shared
>>> object; recompile with -fPIC
>>> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value
>>>
>>>
>>>
>>>
>>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> Ryan, is your benchmark using CAS on pointers, or immediate words?
>>>> trying to get atomic primops to build on my 7.8 build on my mac
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>>>>
>>>>> when i'm more awake i'll experiment some more
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> i have a ticket for tracking this, though i'm thinking my initial
>>>>>> attempt at a patch generates the same object code as it did before.
>>>>>>
>>>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>>>>> machine or something?
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>
>>>>>>> woops, i mean cmpxchgq
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>
>>>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>>>>> use cmpxchgl rather than cmpxchg
>>>>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>>>>> tested out by ryan et al
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey Ryan,
>>>>>>>>> looking at this closely
>>>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that
>>>>>>>>> be the culprit?
>>>>>>>>>
>>>>>>>>> Could the issue be that we've not had a good stress test that
>>>>>>>>> would create values that are equal on the 32bit range, but differ on the
>>>>>>>>> 64bit range, and you're hitting that?
>>>>>>>>>
>>>>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>>>>>> "obvious" guess)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Then again... I'm having trouble seeing how the spec on page
>>>>>>>>>> 3-149 of the Intel manual would allow the behavior I'm seeing:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>>>>
>>>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> I commented on the commit here:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>>>>
>>>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the
>>>>>>>>>>> C compiler intrinsic __sync_val_compare_and_swap, in that it returns the
>>>>>>>>>>> old value.  But it seems we cannot use a comparison against that old value
>>>>>>>>>>> to determine whether or not the CAS succeeded.  (I believe the CAS may fail
>>>>>>>>>>> due to contention, but the old value may happen to look like our old value.)
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, this didn't occur to me until it started causing
>>>>>>>>>>> bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>>>>
>>>>>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC to
>>>>>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise
>>>>>>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the
>>>>>>>>>>> architecture-specific inline asm across all architectures.  I can write the
>>>>>>>>>>> x86 one, but I'm not eager to try the others.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>    -Ryan
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ghc-devs mailing list
>>>>>>>>>> ghc-devs at haskell.org
>>>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/b908e973/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
I got the test suite running on my (2 core) machine mac book air, with 7.8
i've run it several times, not seeing any failures


On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> hrmmmm I have a crazy idea
>
> "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64.
> Else, clear ZF
> and load r/m64 into RAX." is what the docs say for the cmpxchng
> instruction
>
> so RAX is the old values,  (EAX in the  32bit case). And it looks like we
> dont' set that explicitly when calling the asm .. CMPXCHG r/m64, r64
>
>  hrmmm
>
>
>
> On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>
>> Ok, here's another experiment, on this commit:
>>
>>
>> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>
>> Here, if I use GCC's __sync_val_compare_and_swap instead of GHC's version
>> of cas(), the problem also goes away.  I think these two implementations
>> should behave identically, and that they don't perhaps indicates that there
>> is something off about the inline asm, as Carter was suggesting:
>>
>> *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>> *    __asm__ __volatile__ (*
>> *   "lock\ncmpxchg %3,%1"*
>> *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>> *          :"0" (o), "r" (n));*
>> *    return o;*
>>
>> The x86 CAS instruction must put the "return value" in the accumulator
>> register, and indeed this constrains "o" to be allocated to the accumulator
>> register, while the new value "n" can be in any register.
>>
>> So if there's a problem, I don't know what it is.  Except I'm not sure
>> what the ramifications of "o" being a function parameter AND having an "=a"
>> constraint on it are...
>>
>>    -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first
>>> have to get some pull requests in on atomic-primops before I can test it
>>> locally :), expect those patches later today!
>>>
>>> looks like gcc's inline ASM logic is pretty correct, after testing it a
>>> bit locally, pardon my speculative jumping the gun.
>>>
>>>
>>> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>
>>>> Hi Carter & others,
>>>>
>>>> Carter, yes, this is CAS on pointers and in my next mail I'll try to
>>>> come up with some hypotheses as to why we may have (remaining) problems
>>>> there.
>>>>
>>>> But first, I have been assured that on x86 there is no failure mode in
>>>> which doing a comparison on the value read by CAS should not correctly
>>>> diagnose success or failure (same as directly reading the Zero Flag) [1].
>>>>
>>>> And yet, there's this discrepancy, where the modified casMutVar that I
>>>> linked to does not have the failure.  As for reproducing the failure,
>>>> either of the two following tests will currently show problems:
>>>>
>>>>    - Two threads try to casIORef False->True, both succeed
>>>>    - 120 threads try to read, increment, CAS until they succeed.  The
>>>>    total is often not 120 because multiple threads think the successfully
>>>>    incremented, say, 33->34.
>>>>
>>>> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:
>>>>
>>>>
>>>> *git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
>>>> *cd haskell-lockfree-queue/AtomicPrimops/*
>>>>
>>>> *git checkout 1a1e7e55f6706f9e5754*
>>>>
>>>> *cabal sandbox init*
>>>>
>>>> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests*
>>>>
>>>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>>>> -t n_threads*
>>>>
>>>> You may have to run the last line several times to see the failure.
>>>>
>>>> Best,
>>>>   -Ryan
>>>>
>>>> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF
>>>> is there just to avoid the extra comparison.
>>>>
>>>> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I
>>>> usually use to build it is currently not validating (below error, commit
>>>> 65d05d7334).  After I debug this gmp problem I'll confirm that the bug
>>>> under discussion applies on the 7.8 branch.
>>>>
>>>>     ./sync-all checkout ghc-7.8
>>>>     sh validate
>>>> ...
>>>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation
>>>> R_X86_64_32 against `__gmpz_sub' can not be used when making a shared
>>>> object; recompile with -fPIC
>>>> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <
>>>> carter.schonwald at gmail.com> wrote:
>>>>
>>>>> Ryan, is your benchmark using CAS on pointers, or immediate words?
>>>>> trying to get atomic primops to build on my 7.8 build on my mac
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>>>>>
>>>>>> when i'm more awake i'll experiment some more
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>
>>>>>>> i have a ticket for tracking this, though i'm thinking my initial
>>>>>>> attempt at a patch generates the same object code as it did before.
>>>>>>>
>>>>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>>>>>> machine or something?
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>
>>>>>>>> woops, i mean cmpxchgq
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>>>>>> use cmpxchgl rather than cmpxchg
>>>>>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>>>>>> tested out by ryan et al
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Ryan,
>>>>>>>>>> looking at this closely
>>>>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could that
>>>>>>>>>> be the culprit?
>>>>>>>>>>
>>>>>>>>>> Could the issue be that we've not had a good stress test that
>>>>>>>>>> would create values that are equal on the 32bit range, but differ on the
>>>>>>>>>> 64bit range, and you're hitting that?
>>>>>>>>>>
>>>>>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>>>>>> (I may be completely wrong, but just throwing this out as a naive
>>>>>>>>>> "obvious" guess)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> Then again... I'm having trouble seeing how the spec on page
>>>>>>>>>>> 3-149 of the Intel manual would allow the behavior I'm seeing:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>>>>>
>>>>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>>>>>>>>> current Haskell primops.  Two threads simultaneously performing the same
>>>>>>>>>>> CAS(p,a,b) can both think that they succeeded.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <rrnewton at gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I commented on the commit here:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>>>>>
>>>>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to
>>>>>>>>>>>> the C compiler intrinsic __sync_val_compare_and_swap, in that it returns
>>>>>>>>>>>> the old value.  But it seems we cannot use a comparison against that old
>>>>>>>>>>>> value to determine whether or not the CAS succeeded.  (I believe the CAS
>>>>>>>>>>>> may fail due to contention, but the old value may happen to look like our
>>>>>>>>>>>> old value.)
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, this didn't occur to me until it started causing
>>>>>>>>>>>> bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm
>>>>>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>>>>>
>>>>>>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC
>>>>>>>>>>>> to include a C compiler intrinsic like __sync_val_compare_and_swap?
>>>>>>>>>>>>  Otherwise we need another big ifdbef'd function like "cas" in SMP.h that
>>>>>>>>>>>> has the architecture-specific inline asm across all architectures.  I can
>>>>>>>>>>>> write the x86 one, but I'm not eager to try the others.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>    -Ryan
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> ghc-devs mailing list
>>>>>>>>>>> ghc-devs at haskell.org
>>>>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/f3a0af14/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Ryan Newton
Ok, my bad, sorry all.

This is NOT a problem that will crop up in 7.8.   Rather, it's just a
problem with the duplicated bits of GHC RTS functionality that were stuck
into the atomic-primops library.  It was a C preprocessor problem that was
causing the inline asm we were discussing in this thread to not actually be
called.

Still, I'd like to be reminded of the rational for all this conditional
inline asm rather than using the C compiler intrinsics!  Anyone?

Best,
  -Ryan



On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald <carter.schonwald at gmail.com
> wrote:

> I got the test suite running on my (2 core) machine mac book air, with 7.8
> i've run it several times, not seeing any failures
>
>
> On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> hrmmmm I have a crazy idea
>>
>> "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64.
>> Else, clear ZF
>> and load r/m64 into RAX." is what the docs say for the cmpxchng
>> instruction
>>
>> so RAX is the old values,  (EAX in the  32bit case). And it looks like we
>> dont' set that explicitly when calling the asm .. CMPXCHG r/m64, r64
>>
>>  hrmmm
>>
>>
>>
>> On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> Ok, here's another experiment, on this commit:
>>>
>>>
>>> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>>
>>> Here, if I use GCC's __sync_val_compare_and_swap instead of GHC's
>>> version of cas(), the problem also goes away.  I think these two
>>> implementations should behave identically, and that they don't perhaps
>>> indicates that there is something off about the inline asm, as Carter was
>>> suggesting:
>>>
>>> *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>>> *    __asm__ __volatile__ (*
>>> *   "lock\ncmpxchg %3,%1"*
>>> *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>>> *          :"0" (o), "r" (n));*
>>> *    return o;*
>>>
>>> The x86 CAS instruction must put the "return value" in the accumulator
>>> register, and indeed this constrains "o" to be allocated to the accumulator
>>> register, while the new value "n" can be in any register.
>>>
>>> So if there's a problem, I don't know what it is.  Except I'm not sure
>>> what the ramifications of "o" being a function parameter AND having an "=a"
>>> constraint on it are...
>>>
>>>    -Ryan
>>>
>>>
>>>
>>> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first
>>>> have to get some pull requests in on atomic-primops before I can test it
>>>> locally :), expect those patches later today!
>>>>
>>>> looks like gcc's inline ASM logic is pretty correct, after testing it a
>>>> bit locally, pardon my speculative jumping the gun.
>>>>
>>>>
>>>> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>>
>>>>> Hi Carter & others,
>>>>>
>>>>> Carter, yes, this is CAS on pointers and in my next mail I'll try to
>>>>> come up with some hypotheses as to why we may have (remaining) problems
>>>>> there.
>>>>>
>>>>> But first, I have been assured that on x86 there is no failure mode in
>>>>> which doing a comparison on the value read by CAS should not correctly
>>>>> diagnose success or failure (same as directly reading the Zero Flag) [1].
>>>>>
>>>>> And yet, there's this discrepancy, where the modified casMutVar that I
>>>>> linked to does not have the failure.  As for reproducing the failure,
>>>>> either of the two following tests will currently show problems:
>>>>>
>>>>>    - Two threads try to casIORef False->True, both succeed
>>>>>    - 120 threads try to read, increment, CAS until they succeed.  The
>>>>>    total is often not 120 because multiple threads think the successfully
>>>>>    incremented, say, 33->34.
>>>>>
>>>>> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux:
>>>>>
>>>>>
>>>>> *git clone git at github.com:rrnewton/haskell-lockfree-queue.git *
>>>>> *cd haskell-lockfree-queue/AtomicPrimops/*
>>>>>
>>>>> *git checkout 1a1e7e55f6706f9e5754*
>>>>>
>>>>> *cabal sandbox init*
>>>>>
>>>>> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests*
>>>>>
>>>>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>>>>> -t n_threads*
>>>>>
>>>>> You may have to run the last line several times to see the failure.
>>>>>
>>>>> Best,
>>>>>   -Ryan
>>>>>
>>>>> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF
>>>>> is there just to avoid the extra comparison.
>>>>>
>>>>> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I
>>>>> usually use to build it is currently not validating (below error, commit
>>>>> 65d05d7334).  After I debug this gmp problem I'll confirm that the bug
>>>>> under discussion applies on the 7.8 branch.
>>>>>
>>>>>     ./sync-all checkout ghc-7.8
>>>>>     sh validate
>>>>> ...
>>>>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation
>>>>> R_X86_64_32 against `__gmpz_sub' can not be used when making a shared
>>>>> object; recompile with -fPIC
>>>>> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad
>>>>> value
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald <
>>>>> carter.schonwald at gmail.com> wrote:
>>>>>
>>>>>> Ryan, is your benchmark using CAS on pointers, or immediate words?
>>>>>> trying to get atomic primops to build on my 7.8 build on my mac
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald <
>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>
>>>>>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket
>>>>>>>
>>>>>>> when i'm more awake i'll experiment some more
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald <
>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>
>>>>>>>> i have a ticket for tracking this, though i'm thinking my initial
>>>>>>>> attempt at a patch generates the same object code as it did before.
>>>>>>>>
>>>>>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA
>>>>>>>> machine or something?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald <
>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> woops, i mean cmpxchgq
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald <
>>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc
>>>>>>>>>> use cmpxchgl rather than cmpxchg
>>>>>>>>>> i'll whip up a strawman patch on head that can be cherrypicked /
>>>>>>>>>> tested out by ryan et al
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald <
>>>>>>>>>> carter.schonwald at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Ryan,
>>>>>>>>>>> looking at this closely
>>>>>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures?  Could
>>>>>>>>>>> that be the culprit?
>>>>>>>>>>>
>>>>>>>>>>> Could the issue be that we've not had a good stress test that
>>>>>>>>>>> would create values that are equal on the 32bit range, but differ on the
>>>>>>>>>>> 64bit range, and you're hitting that?
>>>>>>>>>>>
>>>>>>>>>>> Could you try seeing if doing that change fixes things up?
>>>>>>>>>>> (I may be completely wrong, but just throwing this out as a
>>>>>>>>>>> naive "obvious" guess)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <rrnewton at gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Then again... I'm having trouble seeing how the spec on page
>>>>>>>>>>>> 3-149 of the Intel manual would allow the behavior I'm seeing:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>>>>>>>>>
>>>>>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with
>>>>>>>>>>>> the current Haskell primops.  Two threads simultaneously performing the
>>>>>>>>>>>> same CAS(p,a,b) can both think that they succeeded.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <
>>>>>>>>>>>> rrnewton at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I commented on the commit here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to
>>>>>>>>>>>>> the C compiler intrinsic __sync_val_compare_and_swap, in that it returns
>>>>>>>>>>>>> the old value.  But it seems we cannot use a comparison against that old
>>>>>>>>>>>>> value to determine whether or not the CAS succeeded.  (I believe the CAS
>>>>>>>>>>>>> may fail due to contention, but the old value may happen to look like our
>>>>>>>>>>>>> old value.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, this didn't occur to me until it started
>>>>>>>>>>>>> causing bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the
>>>>>>>>>>>>> way I'm currently fixing CAS in the "atomic-primops" package is by using
>>>>>>>>>>>>> __sync_bool_compare_and_swap:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>>>>>>>>>
>>>>>>>>>>>>> What is the best fix for GHC itself?   Would it be ok for GHC
>>>>>>>>>>>>> to include a C compiler intrinsic like __sync_val_compare_and_swap?
>>>>>>>>>>>>>  Otherwise we need another big ifdbef'd function like "cas" in SMP.h that
>>>>>>>>>>>>> has the architecture-specific inline asm across all architectures.  I can
>>>>>>>>>>>>> write the x86 one, but I'm not eager to try the others.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>    -Ryan
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> ghc-devs mailing list
>>>>>>>>>>>> ghc-devs at haskell.org
>>>>>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140201/989d8abf/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Simon Marlow-7
So there's no problem in casMutVar#?

There's probably no good reason not to use the gcc intrinsics.  Either
they didn't exist when I wrote the inline asm, or they had been added to
gcc since I last read the docs.

Cheers,
Simon

On 01/02/2014 19:35, Ryan Newton wrote:

> Ok, my bad, sorry all.
>
> This is NOT a problem that will crop up in 7.8.   Rather, it's just a
> problem with the duplicated bits of GHC RTS functionality that were
> stuck into the atomic-primops library.  It was a C preprocessor problem
> that was causing the inline asm we were discussing in this thread to not
> actually be called.
>
> Still, I'd like to be reminded of the rational for all this conditional
> inline asm rather than using the C compiler intrinsics!  Anyone?
>
> Best,
>    -Ryan
>
>
>
> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
> <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>> wrote:
>
>     I got the test suite running on my (2 core) machine mac book air,
>     with 7.8
>     i've run it several times, not seeing any failures
>
>
>     On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
>     <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>> wrote:
>
>         hrmmmm I have a crazy idea
>
>         "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
>         into r/m64. Else, clear ZF
>         and load r/m64 into RAX." is what the docs say for the cmpxchng
>         instruction
>
>         so RAX is the old values,  (EAX in the  32bit case). And it
>         looks like we dont' set that explicitly when calling the asm ..
>         CMPXCHG r/m64, r64
>
>           hrmmm
>
>
>
>
>         On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com
>         <mailto:rrnewton at gmail.com>> wrote:
>
>             Ok, here's another experiment, on this commit:
>
>             https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>
>             Here, if I use GCC's __sync_val_compare_and_swap instead of
>             GHC's version of cas(), the problem also goes away.  I think
>             these two implementations should behave identically, and
>             that they don't perhaps indicates that there is something
>             off about the inline asm, as Carter was suggesting:
>
>             *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>             *    __asm__ __volatile__ (*
>             * "lock\ncmpxchg %3,%1"*
>             *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>             *          :"0" (o), "r" (n));*
>             *    return o;*
>
>             The x86 CAS instruction must put the "return value" in the
>             accumulator register, and indeed this constrains "o" to be
>             allocated to the accumulator register, while the new value
>             "n" can be in any register.
>
>             So if there's a problem, I don't know what it is.  Except
>             I'm not sure what the ramifications of "o" being a function
>             parameter AND having an "=a" constraint on it are...
>
>                 -Ryan
>
>
>
>             On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
>             <carter.schonwald at gmail.com
>             <mailto:carter.schonwald at gmail.com>> wrote:
>
>                 Hey Ryan, i've made the leap to using 7.8 on my machine,
>                 so i'll first have to get some pull requests in on
>                 atomic-primops before I can test it locally :), expect
>                 those patches later today!
>
>                 looks like gcc's inline ASM logic is pretty correct,
>                 after testing it a bit locally, pardon my speculative
>                 jumping the gun.
>
>
>                 On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
>                 <rrnewton at gmail.com <mailto:rrnewton at gmail.com>> wrote:
>
>                     Hi Carter & others,
>
>                     Carter, yes, this is CAS on pointers and in my next
>                     mail I'll try to come up with some hypotheses as to
>                     why we may have (remaining) problems there.
>
>                     But first, I have been assured that on x86 there is
>                     no failure mode in which doing a comparison on the
>                     value read by CAS should not correctly diagnose
>                     success or failure (same as directly reading the
>                     Zero Flag) [1].
>
>                     And yet, there's this discrepancy, where the
>                     modified casMutVar that I linked to does not have
>                     the failure.  As for reproducing the failure, either
>                     of the two following tests will currently show problems:
>
>                       * Two threads try to casIORef False->True, both
>                         succeed
>                       * 120 threads try to read, increment, CAS until
>                         they succeed.  The total is often not 120
>                         because multiple threads think the successfully
>                         incremented, say, 33->34.
>
>                     Here's a specific recipe for the latter test on GHC
>                     7.6.3 Mac or Linux:
>
>                     *git clone
>                     git at github.com:rrnewton/haskell-lockfree-queue.git
>                     *
>                     *cd haskell-lockfree-queue/AtomicPrimops/*
>                     *git checkout 1a1e7e55f6706f9e5754
>                     *
>                     *cabal sandbox init
>                     *
>                     *cabal install -f-withTH -fforeign ./ ./testing
>                     --enable-tests
>                     *
>                     *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>                     -t n_threads
>                     *
>
>                     You may have to run the last line several times to
>                     see the failure.
>
>                     Best,
>                        -Ryan
>
>                     [1] I guess the __sync_bool_compare_and_swap
>                     intrinsic which reads ZF is there just to avoid the
>                     extra comparison.
>
>                     [2] P.S. I'd like to try this on GHC head, but the
>                     RHEL 6 machine I usually use to build it is
>                     currently not validating (below error, commit
>                     65d05d7334).  After I debug this gmp problem I'll
>                     confirm that the bug under discussion applies on the
>                     7.8 branch.
>
>                          ./sync-all checkout ghc-7.8
>                          sh validate
>                     ...
>                     /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
>                     relocation R_X86_64_32 against `__gmpz_sub' can not
>                     be used when making a shared object; recompile with
>                     -fPIC
>                     libraries/integer-gmp/gmp/objs/aors.o: could not
>                     read symbols: Bad value
>
>
>
>
>                     On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
>                     <carter.schonwald at gmail.com
>                     <mailto:carter.schonwald at gmail.com>> wrote:
>
>                         Ryan, is your benchmark using CAS on pointers,
>                         or immediate words? trying to get atomic primops
>                         to build on my 7.8 build on my mac
>
>
>                         On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
>                         <carter.schonwald at gmail.com
>                         <mailto:carter.schonwald at gmail.com>> wrote:
>
>                             https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
>                             is the ticket
>
>                             when i'm more awake i'll experiment some more
>
>
>                             On Sat, Feb 1, 2014 at 2:33 AM, Carter
>                             Schonwald <carter.schonwald at gmail.com
>                             <mailto:carter.schonwald at gmail.com>> wrote:
>
>                                 i have a ticket for tracking this,
>                                 though i'm thinking my initial attempt
>                                 at a patch generates the same object
>                                 code as it did before.
>
>                                 @ryan, what CPU variant are you testing
>                                 this on? is this on a NUMA machine or
>                                 something?
>
>
>                                 On Sat, Feb 1, 2014 at 1:58 AM, Carter
>                                 Schonwald <carter.schonwald at gmail.com
>                                 <mailto:carter.schonwald at gmail.com>> wrote:
>
>                                     woops, i mean cmpxchgq
>
>
>                                     On Sat, Feb 1, 2014 at 1:36 AM,
>                                     Carter Schonwald
>                                     <carter.schonwald at gmail.com
>                                     <mailto:carter.schonwald at gmail.com>>
>                                     wrote:
>
>                                         ok, i can confirm that on my
>                                         64bit mac, both clang and gcc
>                                         use cmpxchgl rather than cmpxchg
>                                         i'll whip up a strawman patch on
>                                         head that can be cherrypicked /
>                                         tested out by ryan et al
>
>
>                                         On Sat, Feb 1, 2014 at 1:12 AM,
>                                         Carter Schonwald
>                                         <carter.schonwald at gmail.com
>                                         <mailto:carter.schonwald at gmail.com>>
>                                         wrote:
>
>                                             Hey Ryan,
>                                             looking at this closely
>                                             Why isn't CAS using
>                                             CMPXCHG8B on 64bit
>                                             architectures?  Could that
>                                             be the culprit?
>
>                                             Could the issue be that
>                                             we've not had a good stress
>                                             test that would create
>                                             values that are equal on the
>                                             32bit range, but differ on
>                                             the 64bit range, and you're
>                                             hitting that?
>
>                                             Could you try seeing if
>                                             doing that change fixes
>                                             things up?
>                                             (I may be completely wrong,
>                                             but just throwing this out
>                                             as a naive "obvious" guess)
>
>
>                                             On Sat, Feb 1, 2014 at 12:58
>                                             AM, Ryan Newton
>                                             <rrnewton at gmail.com
>                                             <mailto:rrnewton at gmail.com>>
>                                             wrote:
>
>                                                 Then again... I'm having
>                                                 trouble seeing how the
>                                                 spec on page 3-149 of
>                                                 the Intel manual would
>                                                 allow the behavior I'm
>                                                 seeing:
>
>                                                 http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>
>                                                 Nevertheless, this is
>                                                 exactly the behavior
>                                                 we're seeing with the
>                                                 current Haskell primops.
>                                                   Two threads
>                                                 simultaneously
>                                                 performing the same
>                                                 CAS(p,a,b) can both
>                                                 think that they succeeded.
>
>
>
>
>
>                                                 On Sat, Feb 1, 2014 at
>                                                 12:33 AM, Ryan Newton
>                                                 <rrnewton at gmail.com
>                                                 <mailto:rrnewton at gmail.com>>
>                                                 wrote:
>
>                                                     I commented on the
>                                                     commit here:
>
>                                                     https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>
>                                                     The problem is that
>                                                     our "cas" routine in
>                                                     SMP.h is similar to
>                                                     the C compiler
>                                                     intrinsic
>                                                     __sync_val_compare_and_swap,
>                                                     in that it returns
>                                                     the old value.  But
>                                                     it seems we cannot
>                                                     use a comparison
>                                                     against that old
>                                                     value to determine
>                                                     whether or not the
>                                                     CAS succeeded.  (I
>                                                     believe the CAS may
>                                                     fail due to
>                                                     contention, but the
>                                                     old value may happen
>                                                     to look like our old
>                                                     value.)
>
>                                                     Unfortunately, this
>                                                     didn't occur to me
>                                                     until it started
>                                                     causing bugs [1]
>                                                     [2].  Fixing
>                                                     casMutVar# fixes
>                                                     these bugs.
>                                                       However, the way
>                                                     I'm currently fixing
>                                                     CAS in the
>                                                     "atomic-primops"
>                                                     package is by using
>                                                     __sync_bool_compare_and_swap:
>
>                                                     https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>
>                                                     What is the best fix
>                                                     for GHC itself?
>                                                     Would it be ok for
>                                                     GHC to include a C
>                                                     compiler intrinsic
>                                                     like
>                                                     __sync_val_compare_and_swap?
>                                                       Otherwise we need
>                                                     another big ifdbef'd
>                                                     function like "cas"
>                                                     in SMP.h that has
>                                                     the
>                                                     architecture-specific inline
>                                                     asm across all
>                                                     architectures.  I
>                                                     can write the x86
>                                                     one, but I'm not
>                                                     eager to try the others.
>
>                                                     Best,
>                                                         -Ryan
>
>                                                     [1]
>                                                     https://github.com/iu-parfunc/lvars/issues/70
>                                                     [2]
>                                                     https://github.com/rrnewton/haskell-lockfree/issues/15
>
>
>
>                                                 _______________________________________________
>                                                 ghc-devs mailing list
>                                                 ghc-devs at haskell.org
>                                                 <mailto:ghc-devs at haskell.org>
>                                                 http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
>
>
>
>
>
>
>
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

pho-2
How about using libatomic_ops? [*1]

GCC atomic intrinsics are only provided by GCC >= 4.2. Clang seems to
have an equivalent set of intrinsics but in different names
[*2]. Basically we should avoid using any non-standard language
extensions where possible.


*1: https://github.com/ivmai/libatomic_ops/
*2: http://clang.llvm.org/docs/LanguageExtensions.html#langext-c11-atomic

Thanks,
PHO


From: Simon Marlow <marlowsd at gmail.com>
Subject: Re: Bad news: apparent bug in casMutVar going back to 7.2
Date: Sat, 01 Feb 2014 21:00:13 +0000

> So there's no problem in casMutVar#?
>
> There's probably no good reason not to use the gcc intrinsics.  Either
> they didn't exist when I wrote the inline asm, or they had been added
> to gcc since I last read the docs.
>
> Cheers,
> Simon
>
> On 01/02/2014 19:35, Ryan Newton wrote:
>> Ok, my bad, sorry all.
>>
>> This is NOT a problem that will crop up in 7.8.   Rather, it's just a
>> problem with the duplicated bits of GHC RTS functionality that were
>> stuck into the atomic-primops library.  It was a C preprocessor
>> problem
>> that was causing the inline asm we were discussing in this thread to
>> not
>> actually be called.
>>
>> Still, I'd like to be reminded of the rational for all this
>> conditional
>> inline asm rather than using the C compiler intrinsics!  Anyone?
>>
>> Best,
>>    -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
>> <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>>     I got the test suite running on my (2 core) machine mac book air,
>>     with 7.8
>>     i've run it several times, not seeing any failures
>>
>>
>>     On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
>>     <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>>     wrote:
>>
>>         hrmmmm I have a crazy idea
>>
>>         "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
>>         into r/m64. Else, clear ZF
>>         and load r/m64 into RAX." is what the docs say for the cmpxchng
>>         instruction
>>
>>         so RAX is the old values,  (EAX in the  32bit case). And it
>>         looks like we dont' set that explicitly when calling the asm ..
>>         CMPXCHG r/m64, r64
>>
>>           hrmmm
>>
>>
>>
>>
>>         On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com
>>         <mailto:rrnewton at gmail.com>> wrote:
>>
>>             Ok, here's another experiment, on this commit:
>>
>>             https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>
>>             Here, if I use GCC's __sync_val_compare_and_swap instead of
>>             GHC's version of cas(), the problem also goes away.  I think
>>             these two implementations should behave identically, and
>>             that they don't perhaps indicates that there is something
>>             off about the inline asm, as Carter was suggesting:
>>
>>             *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>>             *    __asm__ __volatile__ (*
>>             * "lock\ncmpxchg %3,%1"*
>>             *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>>             *          :"0" (o), "r" (n));*
>>             *    return o;*
>>
>>             The x86 CAS instruction must put the "return value" in the
>>             accumulator register, and indeed this constrains "o" to be
>>             allocated to the accumulator register, while the new value
>>             "n" can be in any register.
>>
>>             So if there's a problem, I don't know what it is.  Except
>>             I'm not sure what the ramifications of "o" being a function
>>             parameter AND having an "=a" constraint on it are...
>>
>>                 -Ryan
>>
>>
>>
>>             On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
>>             <carter.schonwald at gmail.com
>>             <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                 Hey Ryan, i've made the leap to using 7.8 on my machine,
>>                 so i'll first have to get some pull requests in on
>>                 atomic-primops before I can test it locally :), expect
>>                 those patches later today!
>>
>>                 looks like gcc's inline ASM logic is pretty correct,
>>                 after testing it a bit locally, pardon my speculative
>>                 jumping the gun.
>>
>>
>>                 On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
>>                 <rrnewton at gmail.com <mailto:rrnewton at gmail.com>> wrote:
>>
>>                     Hi Carter & others,
>>
>>                     Carter, yes, this is CAS on pointers and in my next
>>                     mail I'll try to come up with some hypotheses as to
>>                     why we may have (remaining) problems there.
>>
>>                     But first, I have been assured that on x86 there is
>>                     no failure mode in which doing a comparison on the
>>                     value read by CAS should not correctly diagnose
>>                     success or failure (same as directly reading the
>>                     Zero Flag) [1].
>>
>>                     And yet, there's this discrepancy, where the
>>                     modified casMutVar that I linked to does not have
>>                     the failure.  As for reproducing the failure, either
>>                     of the two following tests will currently show problems:
>>
>>                       * Two threads try to casIORef False->True, both
>>                         succeed
>>                       * 120 threads try to read, increment, CAS until
>>                         they succeed.  The total is often not 120
>>                         because multiple threads think the successfully
>>                         incremented, say, 33->34.
>>
>>                     Here's a specific recipe for the latter test on GHC
>>                     7.6.3 Mac or Linux:
>>
>>                     *git clone
>>                     git at github.com:rrnewton/haskell-lockfree-queue.git
>>                     *
>>                     *cd haskell-lockfree-queue/AtomicPrimops/*
>>                     *git checkout 1a1e7e55f6706f9e5754
>>                     *
>>                     *cabal sandbox init
>>                     *
>>                     *cabal install -f-withTH -fforeign ./ ./testing
>>                     --enable-tests
>>                     *
>>                     *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops
>>                     -t n_threads
>>                     *
>>
>>                     You may have to run the last line several times to
>>                     see the failure.
>>
>>                     Best,
>>                        -Ryan
>>
>>                     [1] I guess the __sync_bool_compare_and_swap
>>                     intrinsic which reads ZF is there just to avoid the
>>                     extra comparison.
>>
>>                     [2] P.S. I'd like to try this on GHC head, but the
>>                     RHEL 6 machine I usually use to build it is
>>                     currently not validating (below error, commit
>>                     65d05d7334).  After I debug this gmp problem I'll
>>                     confirm that the bug under discussion applies on the
>>                     7.8 branch.
>>
>>                          ./sync-all checkout ghc-7.8
>>                          sh validate
>>                     ...
>>                     /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
>>                     relocation R_X86_64_32 against `__gmpz_sub' can not
>>                     be used when making a shared object; recompile with
>>                     -fPIC
>>                     libraries/integer-gmp/gmp/objs/aors.o: could not
>>                     read symbols: Bad value
>>
>>
>>
>>
>>                     On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
>>                     <carter.schonwald at gmail.com
>>                     <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                         Ryan, is your benchmark using CAS on pointers,
>>                         or immediate words? trying to get atomic primops
>>                         to build on my 7.8 build on my mac
>>
>>
>>                         On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
>>                         <carter.schonwald at gmail.com
>>                         <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                             https://ghc.haskell.org/trac/ghc/ticket/8724#ticket
>>                             is the ticket
>>
>>                             when i'm more awake i'll experiment some more
>>
>>
>>                             On Sat, Feb 1, 2014 at 2:33 AM, Carter
>>                             Schonwald <carter.schonwald at gmail.com
>>                             <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                                 i have a ticket for tracking this,
>>                                 though i'm thinking my initial attempt
>>                                 at a patch generates the same object
>>                                 code as it did before.
>>
>>                                 @ryan, what CPU variant are you testing
>>                                 this on? is this on a NUMA machine or
>>                                 something?
>>
>>
>>                                 On Sat, Feb 1, 2014 at 1:58 AM, Carter
>>                                 Schonwald <carter.schonwald at gmail.com
>>                                 <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                                     woops, i mean cmpxchgq
>>
>>
>>                                     On Sat, Feb 1, 2014 at 1:36 AM,
>>                                     Carter Schonwald
>>                                     <carter.schonwald at gmail.com
>>                                     <mailto:carter.schonwald at gmail.com>>
>>                                     wrote:
>>
>>                                         ok, i can confirm that on my
>>                                         64bit mac, both clang and gcc
>>                                         use cmpxchgl rather than cmpxchg
>>                                         i'll whip up a strawman patch on
>>                                         head that can be cherrypicked /
>>                                         tested out by ryan et al
>>
>>
>>                                         On Sat, Feb 1, 2014 at 1:12 AM,
>>                                         Carter Schonwald
>>                                         <carter.schonwald at gmail.com
>>                                         <mailto:carter.schonwald at gmail.com>>
>>                                         wrote:
>>
>>                                             Hey Ryan,
>>                                             looking at this closely
>>                                             Why isn't CAS using
>>                                             CMPXCHG8B on 64bit
>>                                             architectures?  Could that
>>                                             be the culprit?
>>
>>                                             Could the issue be that
>>                                             we've not had a good stress
>>                                             test that would create
>>                                             values that are equal on the
>>                                             32bit range, but differ on
>>                                             the 64bit range, and you're
>>                                             hitting that?
>>
>>                                             Could you try seeing if
>>                                             doing that change fixes
>>                                             things up?
>>                                             (I may be completely wrong,
>>                                             but just throwing this out
>>                                             as a naive "obvious" guess)
>>
>>
>>                                             On Sat, Feb 1, 2014 at 12:58
>>                                             AM, Ryan Newton
>>                                             <rrnewton at gmail.com
>>                                             <mailto:rrnewton at gmail.com>>
>>                                             wrote:
>>
>>                                                 Then again... I'm having
>>                                                 trouble seeing how the
>>                                                 spec on page 3-149 of
>>                                                 the Intel manual would
>>                                                 allow the behavior I'm
>>                                                 seeing:
>>
>>                                                 http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>>                                                 Nevertheless, this is
>>                                                 exactly the behavior
>>                                                 we're seeing with the
>>                                                 current Haskell primops.
>>                                                   Two threads
>>                                                 simultaneously
>>                                                 performing the same
>>                                                 CAS(p,a,b) can both
>>                                                 think that they succeeded.
>>
>>
>>
>>
>>
>>                                                 On Sat, Feb 1, 2014 at
>>                                                 12:33 AM, Ryan Newton
>>                                                 <rrnewton at gmail.com
>>                                                 <mailto:rrnewton at gmail.com>>
>>                                                 wrote:
>>
>>                                                     I commented on the
>>                                                     commit here:
>>
>>                                                     https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>
>>                                                     The problem is that
>>                                                     our "cas" routine in
>>                                                     SMP.h is similar to
>>                                                     the C compiler
>>                                                     intrinsic
>>                                                     __sync_val_compare_and_swap,
>>                                                     in that it returns
>>                                                     the old value.  But
>>                                                     it seems we cannot
>>                                                     use a comparison
>>                                                     against that old
>>                                                     value to determine
>>                                                     whether or not the
>>                                                     CAS succeeded.  (I
>>                                                     believe the CAS may
>>                                                     fail due to
>>                                                     contention, but the
>>                                                     old value may happen
>>                                                     to look like our old
>>                                                     value.)
>>
>>                                                     Unfortunately, this
>>                                                     didn't occur to me
>>                                                     until it started
>>                                                     causing bugs [1]
>>                                                     [2].  Fixing
>>                                                     casMutVar# fixes
>>                                                     these bugs.
>>                                                       However, the way
>>                                                     I'm currently fixing
>>                                                     CAS in the
>>                                                     "atomic-primops"
>>                                                     package is by using
>>                                                     __sync_bool_compare_and_swap:
>>
>>                                                     https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>
>>                                                     What is the best fix
>>                                                     for GHC itself?
>>                                                     Would it be ok for
>>                                                     GHC to include a C
>>                                                     compiler intrinsic
>>                                                     like
>>                                                     __sync_val_compare_and_swap?
>>                                                       Otherwise we need
>>                                                     another big ifdbef'd
>>                                                     function like "cas"
>>                                                     in SMP.h that has
>>                                                     the
>>                                                     architecture-specific inline
>>                                                     asm across all
>>                                                     architectures.  I
>>                                                     can write the x86
>>                                                     one, but I'm not
>>                                                     eager to try the others.
>>
>>                                                     Best,
>>                                                         -Ryan
>>
>>                                                     [1]
>>                                                     https://github.com/iu-parfunc/lvars/issues/70
>>                                                     [2]
>>                                                     https://github.com/rrnewton/haskell-lockfree/issues/15
>>
>>
>>
>>                                                 _______________________________________________
>>                                                 ghc-devs mailing list
>>                                                 ghc-devs at haskell.org
>>                                                 <mailto:ghc-devs at haskell.org>
>>                                                 http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140206/90ef3815/attachment.sig>

Reply | Threaded
Open this post in threaded view
|

Bad news: apparent bug in casMutVar going back to 7.2

Carter Schonwald
In reply to this post by Simon Marlow-7
I'll add making that change to my inline primops task. (It does give nice
reference asm for that task. )
On Feb 1, 2014 1:00 PM, "Simon Marlow" <marlowsd at gmail.com> wrote:

> So there's no problem in casMutVar#?
>
> There's probably no good reason not to use the gcc intrinsics.  Either
> they didn't exist when I wrote the inline asm, or they had been added to
> gcc since I last read the docs.
>
> Cheers,
> Simon
>
> On 01/02/2014 19:35, Ryan Newton wrote:
>
>> Ok, my bad, sorry all.
>>
>> This is NOT a problem that will crop up in 7.8.   Rather, it's just a
>> problem with the duplicated bits of GHC RTS functionality that were
>> stuck into the atomic-primops library.  It was a C preprocessor problem
>> that was causing the inline asm we were discussing in this thread to not
>> actually be called.
>>
>> Still, I'd like to be reminded of the rational for all this conditional
>> inline asm rather than using the C compiler intrinsics!  Anyone?
>>
>> Best,
>>    -Ryan
>>
>>
>>
>> On Sat, Feb 1, 2014 at 2:17 PM, Carter Schonwald
>> <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>     I got the test suite running on my (2 core) machine mac book air,
>>     with 7.8
>>     i've run it several times, not seeing any failures
>>
>>
>>     On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald
>>     <carter.schonwald at gmail.com <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>>         hrmmmm I have a crazy idea
>>
>>         "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded
>>         into r/m64. Else, clear ZF
>>         and load r/m64 into RAX." is what the docs say for the cmpxchng
>>         instruction
>>
>>         so RAX is the old values,  (EAX in the  32bit case). And it
>>         looks like we dont' set that explicitly when calling the asm ..
>>         CMPXCHG r/m64, r64
>>
>>           hrmmm
>>
>>
>>
>>
>>         On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <rrnewton at gmail.com
>>         <mailto:rrnewton at gmail.com>> wrote:
>>
>>             Ok, here's another experiment, on this commit:
>>
>>             https://github.com/rrnewton/haskell-lockfree/commit/
>> 399bb19fa02eaf2f2eab5d02c4b608535362f9bc
>>
>>             Here, if I use GCC's __sync_val_compare_and_swap instead of
>>             GHC's version of cas(), the problem also goes away.  I think
>>             these two implementations should behave identically, and
>>             that they don't perhaps indicates that there is something
>>             off about the inline asm, as Carter was suggesting:
>>
>>             *#if i386_HOST_ARCH || x86_64_HOST_ARCH*
>>             *    __asm__ __volatile__ (*
>>             * "lock\ncmpxchg %3,%1"*
>>             *          :"=a"(o), "=m" (*(volatile unsigned int *)p) *
>>             *          :"0" (o), "r" (n));*
>>             *    return o;*
>>
>>             The x86 CAS instruction must put the "return value" in the
>>             accumulator register, and indeed this constrains "o" to be
>>             allocated to the accumulator register, while the new value
>>             "n" can be in any register.
>>
>>             So if there's a problem, I don't know what it is.  Except
>>             I'm not sure what the ramifications of "o" being a function
>>             parameter AND having an "=a" constraint on it are...
>>
>>                 -Ryan
>>
>>
>>
>>             On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald
>>             <carter.schonwald at gmail.com
>>             <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                 Hey Ryan, i've made the leap to using 7.8 on my machine,
>>                 so i'll first have to get some pull requests in on
>>                 atomic-primops before I can test it locally :), expect
>>                 those patches later today!
>>
>>                 looks like gcc's inline ASM logic is pretty correct,
>>                 after testing it a bit locally, pardon my speculative
>>                 jumping the gun.
>>
>>
>>                 On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton
>>                 <rrnewton at gmail.com <mailto:rrnewton at gmail.com>> wrote:
>>
>>                     Hi Carter & others,
>>
>>                     Carter, yes, this is CAS on pointers and in my next
>>                     mail I'll try to come up with some hypotheses as to
>>                     why we may have (remaining) problems there.
>>
>>                     But first, I have been assured that on x86 there is
>>                     no failure mode in which doing a comparison on the
>>                     value read by CAS should not correctly diagnose
>>                     success or failure (same as directly reading the
>>                     Zero Flag) [1].
>>
>>                     And yet, there's this discrepancy, where the
>>                     modified casMutVar that I linked to does not have
>>                     the failure.  As for reproducing the failure, either
>>                     of the two following tests will currently show
>> problems:
>>
>>                       * Two threads try to casIORef False->True, both
>>                         succeed
>>                       * 120 threads try to read, increment, CAS until
>>                         they succeed.  The total is often not 120
>>                         because multiple threads think the successfully
>>                         incremented, say, 33->34.
>>
>>                     Here's a specific recipe for the latter test on GHC
>>                     7.6.3 Mac or Linux:
>>
>>                     *git clone
>>                     git at github.com:rrnewton/haskell-lockfree-queue.git
>>                     *
>>                     *cd haskell-lockfree-queue/AtomicPrimops/*
>>                     *git checkout 1a1e7e55f6706f9e5754
>>                     *
>>                     *cabal sandbox init
>>                     *
>>                     *cabal install -f-withTH -fforeign ./ ./testing
>>                     --enable-tests
>>                     *
>>                     *./testing/dist/dist-sandbox-*
>> /build/test-atomic-primops/test-atomic-primops
>>                     -t n_threads
>>                     *
>>
>>                     You may have to run the last line several times to
>>                     see the failure.
>>
>>                     Best,
>>                        -Ryan
>>
>>                     [1] I guess the __sync_bool_compare_and_swap
>>                     intrinsic which reads ZF is there just to avoid the
>>                     extra comparison.
>>
>>                     [2] P.S. I'd like to try this on GHC head, but the
>>                     RHEL 6 machine I usually use to build it is
>>                     currently not validating (below error, commit
>>                     65d05d7334).  After I debug this gmp problem I'll
>>                     confirm that the bug under discussion applies on the
>>                     7.8 branch.
>>
>>                          ./sync-all checkout ghc-7.8
>>                          sh validate
>>                     ...
>>                     /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o:
>>                     relocation R_X86_64_32 against `__gmpz_sub' can not
>>                     be used when making a shared object; recompile with
>>                     -fPIC
>>                     libraries/integer-gmp/gmp/objs/aors.o: could not
>>                     read symbols: Bad value
>>
>>
>>
>>
>>                     On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald
>>                     <carter.schonwald at gmail.com
>>                     <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                         Ryan, is your benchmark using CAS on pointers,
>>                         or immediate words? trying to get atomic primops
>>                         to build on my 7.8 build on my mac
>>
>>
>>                         On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald
>>                         <carter.schonwald at gmail.com
>>                         <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                             https://ghc.haskell.org/trac/
>> ghc/ticket/8724#ticket
>>                             is the ticket
>>
>>                             when i'm more awake i'll experiment some more
>>
>>
>>                             On Sat, Feb 1, 2014 at 2:33 AM, Carter
>>                             Schonwald <carter.schonwald at gmail.com
>>                             <mailto:carter.schonwald at gmail.com>> wrote:
>>
>>                                 i have a ticket for tracking this,
>>                                 though i'm thinking my initial attempt
>>                                 at a patch generates the same object
>>                                 code as it did before.
>>
>>                                 @ryan, what CPU variant are you testing
>>                                 this on? is this on a NUMA machine or
>>                                 something?
>>
>>
>>                                 On Sat, Feb 1, 2014 at 1:58 AM, Carter
>>                                 Schonwald <carter.schonwald at gmail.com
>>                                 <mailto:carter.schonwald at gmail.com>>
>> wrote:
>>
>>                                     woops, i mean cmpxchgq
>>
>>
>>                                     On Sat, Feb 1, 2014 at 1:36 AM,
>>                                     Carter Schonwald
>>                                     <carter.schonwald at gmail.com
>>                                     <mailto:carter.schonwald at gmail.com>>
>>                                     wrote:
>>
>>                                         ok, i can confirm that on my
>>                                         64bit mac, both clang and gcc
>>                                         use cmpxchgl rather than cmpxchg
>>                                         i'll whip up a strawman patch on
>>                                         head that can be cherrypicked /
>>                                         tested out by ryan et al
>>
>>
>>                                         On Sat, Feb 1, 2014 at 1:12 AM,
>>                                         Carter Schonwald
>>                                         <carter.schonwald at gmail.com
>>                                         <mailto:carter.schonwald@
>> gmail.com>>
>>                                         wrote:
>>
>>                                             Hey Ryan,
>>                                             looking at this closely
>>                                             Why isn't CAS using
>>                                             CMPXCHG8B on 64bit
>>                                             architectures?  Could that
>>                                             be the culprit?
>>
>>                                             Could the issue be that
>>                                             we've not had a good stress
>>                                             test that would create
>>                                             values that are equal on the
>>                                             32bit range, but differ on
>>                                             the 64bit range, and you're
>>                                             hitting that?
>>
>>                                             Could you try seeing if
>>                                             doing that change fixes
>>                                             things up?
>>                                             (I may be completely wrong,
>>                                             but just throwing this out
>>                                             as a naive "obvious" guess)
>>
>>
>>                                             On Sat, Feb 1, 2014 at 12:58
>>                                             AM, Ryan Newton
>>                                             <rrnewton at gmail.com
>>                                             <mailto:rrnewton at gmail.com>>
>>                                             wrote:
>>
>>                                                 Then again... I'm having
>>                                                 trouble seeing how the
>>                                                 spec on page 3-149 of
>>                                                 the Intel manual would
>>                                                 allow the behavior I'm
>>                                                 seeing:
>>
>>
>> http://www.intel.com/content/dam/www/public/us/en/
>> documents/manuals/64-ia-32-architectures-software-
>> developer-manual-325462.pdf
>>
>>                                                 Nevertheless, this is
>>                                                 exactly the behavior
>>                                                 we're seeing with the
>>                                                 current Haskell primops.
>>                                                   Two threads
>>                                                 simultaneously
>>                                                 performing the same
>>                                                 CAS(p,a,b) can both
>>                                                 think that they succeeded.
>>
>>
>>
>>
>>
>>                                                 On Sat, Feb 1, 2014 at
>>                                                 12:33 AM, Ryan Newton
>>                                                 <rrnewton at gmail.com
>>                                                 <mailto:
>> rrnewton at gmail.com>>
>>                                                 wrote:
>>
>>                                                     I commented on the
>>                                                     commit here:
>>
>>
>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0
>> 626ea6f36b
>>
>>                                                     The problem is that
>>                                                     our "cas" routine in
>>                                                     SMP.h is similar to
>>                                                     the C compiler
>>                                                     intrinsic
>>
>> __sync_val_compare_and_swap,
>>                                                     in that it returns
>>                                                     the old value.  But
>>                                                     it seems we cannot
>>                                                     use a comparison
>>                                                     against that old
>>                                                     value to determine
>>                                                     whether or not the
>>                                                     CAS succeeded.  (I
>>                                                     believe the CAS may
>>                                                     fail due to
>>                                                     contention, but the
>>                                                     old value may happen
>>                                                     to look like our old
>>                                                     value.)
>>
>>                                                     Unfortunately, this
>>                                                     didn't occur to me
>>                                                     until it started
>>                                                     causing bugs [1]
>>                                                     [2].  Fixing
>>                                                     casMutVar# fixes
>>                                                     these bugs.
>>                                                       However, the way
>>                                                     I'm currently fixing
>>                                                     CAS in the
>>                                                     "atomic-primops"
>>                                                     package is by using
>>
>> __sync_bool_compare_and_swap:
>>
>>
>> https://github.com/rrnewton/haskell-lockfree/commit/
>> f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-
>> be3304b3ecdd8e1f9ed316cd844d711aR200
>>
>>                                                     What is the best fix
>>                                                     for GHC itself?
>>                                                     Would it be ok for
>>                                                     GHC to include a C
>>                                                     compiler intrinsic
>>                                                     like
>>
>> __sync_val_compare_and_swap?
>>                                                       Otherwise we need
>>                                                     another big ifdbef'd
>>                                                     function like "cas"
>>                                                     in SMP.h that has
>>                                                     the
>>                                                     architecture-specific
>> inline
>>                                                     asm across all
>>                                                     architectures.  I
>>                                                     can write the x86
>>                                                     one, but I'm not
>>                                                     eager to try the
>> others.
>>
>>                                                     Best,
>>                                                         -Ryan
>>
>>                                                     [1]
>>
>> https://github.com/iu-parfunc/lvars/issues/70
>>                                                     [2]
>>
>> https://github.com/rrnewton/haskell-lockfree/issues/15
>>
>>
>>
>>
>> _______________________________________________
>>                                                 ghc-devs mailing list
>>                                                 ghc-devs at haskell.org
>>                                                 <mailto:
>> ghc-devs at haskell.org>
>>                                                 http://www.haskell.org/
>> mailman/listinfo/ghc-devs
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140206/be097c43/attachment-0001.html>