Re: Optimizing locking with MVars


Re: Optimizing locking with MVars

Simon Marlow-5
This is interesting, thanks.

I propose to add INLINE pragmas to withMVar and friends.

Having an interface for simple locks sounds like a good idea to me.
Would you like to send a patch?

This won't affect Handle I/O unfortunately, because we need block to
protect against asynchronous exceptions.  I'm still not certain you
won't need that in the stream library, too: check any stateful code (eg.
buffering) and imagine what happens if an exception is raised at an
arbitrary point.

Cheers,
        Simon

Bulat Ziganshin wrote:

> The main reason for the slowness of the existing Handle-based I/O in GHC is
> the locking around each operation. It is especially bad for simple
> char-at-a-time I/O, where 99% of the time is spent on locking and unlocking.
>
> To be exact, on my CPU, hPutChar on a 100 MB file takes 150 seconds, while
> hGetChar on the same file is "only" 100 seconds. It seems that the former
> uses 3 locking operations and the latter 2, because my own vGetChar/vPutChar
> implementations both take 52 seconds, of which only about one second is real
> work and the rest is just `withMVar` overhead.
>
> Until now, I thought that this 0.5 µs (about 1000 primitive CPU
> operations) per withMVar was simply the time required to perform the
> takeMVar+putMVar operations. But yesterday I investigated the problem more
> deeply, and the results were surprising!
>
> First, I just made a local copy of `withMVar` and added an INLINE pragma to it:
>
> import Control.Exception as Exception
> {-# INLINE inlinedWithMVar #-}
> inlinedWithMVar :: MVar a -> (a -> IO b) -> IO b
> inlinedWithMVar m io =
>   block $ do
>     a <- takeMVar m
>     b <- Exception.catch (unblock (io a))
>             (\e -> do putMVar m a; throw e)
>     putMVar m a
>     return b
>
>
> Second, I developed my own simplified version of this procedure. Here I
> should say that my library uses an "MVar ()" field to hold the lock and a
> separate immutable field for the data being protected:
>
> data WithLocking h = WithLocking h !(MVar ())
>
> This allowed me to omit the block/unblock operations and write the
> following faster analogue of withMVar:
>
> lock (WithLocking h mvar) action =
>     Exception.catch
>         (do takeMVar mvar
>             result <- action h
>             putMVar mvar ()
>             return result)
>         (\e -> do tryPutMVar mvar (); throw e)
>
> And as a third variant, I tried an exception-unsafe version of `withMVar`:
>
> unsafeWithMVar :: MVar a -> (a -> IO b) -> IO b
> unsafeWithMVar m io = do
>     a <- takeMVar m
>     b <- io a
>     putMVar m a
>     return b
>
>
> And here are the results:
>
> withMVar        52 seconds
> inlinedWithMVar 38 seconds
> lock            20 seconds
> unsafeWithMVar  10 seconds
>
>
> So,
>
> 1) `withMVar` can be made significantly faster just by attaching an
> INLINE pragma to it. Until GHC includes this patch, you can make a local
> copy of this procedure (its implementation is compiler-independent) and
> add an INLINE pragma to that copy.
>
> 2) If an MVar is used only to protect some immutable data from
> simultaneous access, its use can be made significantly faster by using the
> above-mentioned WithLocking type constructor together with the 'lock'
> function. I hope this mechanism will find its way into future Haskell
> implementations; in particular it will be used in my own Streams library
> and in the new DiffArray implementation (which is part of the ArrayRef
> library).
>
> 3) For simple programs that don't catch exceptions anyway, this extra
> protection is simply meaningless; they can use 'unsafeWithMVar' to run as
> fast as possible - I mean in particular shootout-like benchmarks. It is
> also possible to build fast and safe routines by unlocking explicitly
> (with 'tryPutMVar') in higher-level exception handlers.
>
>
> And a more general conclusion: this case is a good demonstration of the
> significant performance loss caused by using higher-order functions. I
> think that more aggressive inlining of higher-order and polymorphic
> functions would significantly speed up GHC-compiled programs.
>
>
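
A minimal sketch of the "explicit unlocking in a higher-level handler" idea
from point (3) above (not part of the original message; hotLoop, step and the
loop count are invented for illustration, and Bulat's unsafeWithMVar is
restated for self-containment): the hot path uses the cheap, exception-unsafe
variant, and one outer handler releases the lock with tryPutMVar if anything
goes wrong.

import Control.Concurrent.MVar
import Control.Exception (onException)

-- Exception-unsafe critical section, as in point (3) above.
unsafeWithMVar :: MVar a -> (a -> IO b) -> IO b
unsafeWithMVar m io = do
    a <- takeMVar m
    b <- io a
    putMVar m a
    return b

-- A single higher-level handler restores the lock if any iteration is
-- aborted, so the many per-iteration lock/unlock cycles stay cheap.
hotLoop :: MVar () -> Int -> IO () -> IO ()
hotLoop lock n step =
    mapM_ (\_ -> unsafeWithMVar lock (\() -> step)) [1 .. n]
        `onException` tryPutMVar lock ()

If the exception strikes outside the critical section, tryPutMVar on the
already-full lock simply returns False and does no harm.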

Re: Re: Optimizing locking with MVars

John Meacham
On Wed, May 03, 2006 at 12:07:19PM +0100, Simon Marlow wrote:
> This won't affect Handle I/O unfortunately, because we need block to
> protect against asynchronous exceptions.  I'm still not certain you
> won't need that in the stream library, too: check any stateful code (eg.
> buffering) and imagine what happens if an exception is raised at an
> arbitrary point.

Is unlocking the lock really the right thing to do on an asynchronous
exception? A lock isn't a resource, it is a primitive needed to enforce
correctness of your program. You use them to protect critical sections
and chances are aborting a critical section at an arbitrary point would
leave your program in an incorrect state, just delaying your deadlock or
hiding the errors silently somewhere where they can bite you later.

hmmm... ever think asynchronous exceptions are more trouble than they
are worth...

        John

--
John Meacham - ⑆repetae.net⑆john⑈
Re[2]: Re: Optimizing locking with MVars

Bulat Ziganshin-2
Hello John,

Thursday, May 4, 2006, 12:33:54 AM, you wrote:

>> This won't affect Handle I/O unfortunately, because we need block to
>> protect against asynchronous exceptions.  I'm still not certain you
>> won't need that in the stream library, too: check any stateful code (eg.
>> buffering) and imagine what happens if an exception is raised at an
>> arbitrary point.

> Is unlocking the lock really the right thing to do on an asynchronous
> exception? A lock isn't a resource, it is a primitive needed to enforce
> correctness of your program. You use them to protect critical sections
> and chances are aborting a critical section at an arbitrary point would
> leave your program in an incorrect state, just delaying your deadlock or
> hiding the errors silently somewhere where they can bite you later.

After Simon's message I thought about this problem. I found several
situations where "restoring" a locked file would be useful:

- using stdout and other standard handles: we may need to print an error
message, or just continue working, even though an exception aborted our
previous write to stdout

- access to a database: even though an exception arrived during the
previous operation, we need to carry on and just hSeek to the position of
the next I/O operation



--
Best regards,
 Bulat                            mailto:[hidden email]

Re: Optimizing locking with MVars

Simon Marlow-5
In reply to this post by John Meacham
John Meacham wrote:

> On Wed, May 03, 2006 at 12:07:19PM +0100, Simon Marlow wrote:
>
>>This won't affect Handle I/O unfortunately, because we need block to
>>protect against asynchronous exceptions.  I'm still not certain you
>>won't need that in the stream library, too: check any stateful code (eg.
>>buffering) and imagine what happens if an exception is raised at an
>>arbitrary point.
>
>
> Is unlocking the lock really the right thing to do on an asynchronous
> exception? A lock isn't a resource, it is a primitive needed to enforce
> correctness of your program. You use them to protect critical sections
> and chances are aborting a critical section at an arbitrary point would
> leave your program in an incorrect state, just delaying your deadlock or
> hiding the errors silently somewhere where they can bite you later.

Quite right, which is why in GHC's IO library we use "block" around most
(all?) Handle operations so that we don't receive asynchronous exceptions.

The main uses of block are:

  - acquiring a resource atomically (eg. withMVar)

  - maintaining invariants of shared state

STM gives you an alternative way to do both of these, but you still
sometimes need 'block' for sequences of IO operations that you don't
want to be interrupted.
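
As a rough sketch of the two alternatives (not from the original message;
mask_ is the modern name for what this thread calls block, and the STM
version needs the stm package):

import Control.Concurrent.MVar
import Control.Concurrent.STM
import Control.Exception (mask_)

-- MVar version: masking keeps an asynchronous exception from arriving
-- between takeMVar and putMVar, which would leave the MVar empty.
incrementMVar :: MVar Int -> IO ()
incrementMVar m = mask_ $ do
    n <- takeMVar m
    putMVar m (n + 1)

-- STM version: the whole update is one atomic transaction, so no explicit
-- masking is needed to maintain the invariant.
incrementSTM :: TVar Int -> IO ()
incrementSTM v = atomically $ do
    n <- readTVar v
    writeTVar v (n + 1)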

> hmmm... ever think asynchronous exceptions are more trouble than they
> are worth...

Sometimes, but I haven't found a better alternative, and asynchronous
exceptions in GHC are *far* easier to handle than in other languages.

Cheers,
        Simon
Re: Optimizing locking with MVars

Simon Marlow-5
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:

> After Simon's message I thought about this problem. I found several
> situations where "restoring" a locked file would be useful:
>
> - using stdout and other standard handles: we may need to print an error
> message, or just continue working, even though an exception aborted our
> previous write to stdout
>
> - access to a database: even though an exception arrived during the
> previous operation, we need to carry on and just hSeek to the position of
> the next I/O operation

Certainly the stream should be resilient to exceptions: if an exception
occurs during an operation, that shouldn't prevent subsequent operations
from proceeding as normal.  Asynchronous exceptions should behave just
like synchronous ones in this respect.

Cheers,
        Simon
Re[2]: Optimizing locking with MVars

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Wednesday, May 3, 2006, 3:07:19 PM, you wrote:

> I propose to add INLINE pragmas to withMVar and friends.

And to every higher-order function? :)  IMHO, the key to this problem is
that GHC doesn't have a way to optimize just one required function together
with all the calls inside it. Old GHC papers describe an "inline" function
that does this (as far as I understood), but it is no longer supported.

The problem is not that withMVar should be inlined at every use site. The
problem is that I need to inline its use in one concrete place in my
concrete library, and there is no way to force GHC to do this other than
copying its implementation! I think GHC should simply give higher priority
to inlining higher-order and dictionary-dependent functions (AFAIU, they
are currently weighted by the same rules as ordinary functions).

The second part of the problem is that it cannot be decided at the point
where "withMVar" is defined whether it should be inlined or not - that
depends first of all on the call site. If the code runs just once, we
should decide on the basis of code size; if it is the innermost loop, we
should inline even the largest functions (especially higher-order or
polymorphic ones). But the compiler doesn't know how many times a given
code fragment will run (obviously a place for future integration between
the optimizer and the profiler...). So, I think, it should generally
optimize the program for code size, and only for procedures marked with
something like OPTIMIZE should it enable the full inlining machinery,
trying to make the code generated for that procedure (including all its
inner calls) as fast as possible. This would let the programmer just mark
the speed-critical procedures and would free library writers entirely from
dancing with INLINE pragmas.

Currently, to optimize some procedure in my application, I mark it INLINE,
then find and mark all the procedures it calls, and so on recursively. I
would prefer that GHC did this for me :)

> Having an interface for simple locks sounds like a good idea to me.
> Would you like to send a patch?

I've attached this module together with two examples on which I've tested
it. The first example (a.hs) really doesn't work :)  I think that is a
shortcoming of the current Haskell standard (at least Hugs can't compile it
either). Moreover, it seems that we should change the MArray declaration to
the following:

class (Monad m, MutableBounds a m, Ix i) => MArray a e i m | a->e, a->i where
    newArray    :: (i,i) -> e -> m a
    newArray_   :: (i,i) -> m a
    unsafeRead  :: a -> Int -> m e
    unsafeWrite :: a -> Int -> e -> m ()

(i.e. 'a' here may be, for example, "IOArray Int Double", whereas it is just
"IOArray" in the current definition)

The drawback is that this needs functional dependencies (while the old
definition uses only multi-parameter type classes) and that the new
definition will not be compatible with some old code. On the other hand,
as you can see here, using a type constructor in a class definition tends
to create problems with partially applied type functions. I've run into
this problem several times while redesigning the array library, and so far
I have always found some way around it, but the problem still exists...
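
To make the proposed shape concrete, here is a self-contained toy version
(the primed names are invented for this sketch, and the MutableBounds
superclass from ArrayRef is left out):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
import Data.Ix (Ix)
import qualified Data.Array.IO as A

-- Toy counterpart of the proposed class: 'a' is the fully applied array type.
class (Monad m, Ix i) => MArray' a e i m | a -> e, a -> i where
    newArray'   :: (i, i) -> e -> m a
    readArray'  :: a -> i -> m e
    writeArray' :: a -> i -> e -> m ()

-- The instance head now names the whole array type, not just "IOArray".
instance MArray' (A.IOArray Int Double) Double Int IO where
    newArray'   = A.newArray
    readArray'  = A.readArray
    writeArray' = A.writeArray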


> This won't affect Handle I/O unfortunately, because we need block to
> protect against asynchronous exceptions.

I think it is not accidental that hPutChar/hGetChar are exactly 3 and 2
times slower than their v* analogues. My routines use only one locked MVar,
and I think the Handle routines use 3 and 2 MVars, respectively. I thought
they called withMVar directly, but that is not the case. But maybe the
large chunks of code they run inside "block" operations are still not
inlined and therefore suffer from the same problem as withMVar? I guess
that adding an INLINE pragma to the definition of block might speed up
their execution (by 52/38 ~= 1.4 times), but possibly at the cost of code
size. Anyway, Handle I/O is already too complicated a thing to worry about
optimizing it further :(

> I'm still not certain you
> won't need that in the stream library, too: check any stateful code (eg.
> buffering) and imagine what happens if an exception is raised at an
> arbitrary point.

Thanks, this is exactly the sort of criticism I need. The full check seems
to be a rather complex problem, so I will postpone it until I work on the
next release. The good news is that my buffering scheme updates only one
unboxed pointer (the current read/write position inside the buffer) for
all operations that can be satisfied from the buffer contents alone, so it
seems I don't need "block" to protect such operations until a routine
realizes that it has to switch to another buffer. BTW, I've changed the
default buffer size to 4 KB.
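
A rough sketch of why that works (names invented here, not the actual
Streams library API): the buffer contents are immutable on the fast path,
and the only mutable state is the position, written last, so an asynchronous
exception arriving before that single write leaves the stream unchanged.

import qualified Data.ByteString as B
import Data.IORef
import Data.Word (Word8)

data Buffered = Buffered { bufData :: B.ByteString, bufPos :: IORef Int }

getByte :: Buffered -> IO (Maybe Word8)
getByte (Buffered dat posRef) = do
    pos <- readIORef posRef
    if pos >= B.length dat
        then return Nothing                  -- must refill / switch buffers
        else do
            let w = B.index dat pos
            writeIORef posRef (pos + 1)      -- the single state update
            return (Just w)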

In the next version I will work on this problem. I think it should be
described as a contract between library authors (including authors of
third-party streams and transformers) and users, in this form:

If a stream operation is abandoned because of an asynchronous interrupt, an
internal bug, or an error in the input data:

- it should leave the stream in a coherent state (e.g. it should not leave
vIsEOF=True and vTell=0 for a non-empty file)

- it should not have "side effects", e.g. changing the position when the
whole operation (such as vShow) has no such effect

- the operation may be performed only partially (i.e. vSeek can leave the
pointer in an unpredictable place, vPutStr can output only part of the
string, vGetBuf may fill only part of the buffer)

- if possible, the implementation should give asynchronous exceptions a
chance to interrupt execution at fairly small granularity (i.e. don't write
100 MB in one C call, but split it into 4 KB blocks - see the sketch after
this list)
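
A minimal sketch of that last point (writeInChunks is an invented name; the
real library API may differ):

import qualified Data.ByteString as B
import System.IO (Handle)

-- Write a large buffer in 4 KB pieces, so an asynchronous exception can
-- only arrive between chunks rather than inside one huge foreign call.
writeInChunks :: Handle -> B.ByteString -> IO ()
writeInChunks h bs
    | B.null bs = return ()
    | otherwise = do
        let (chunk, rest) = B.splitAt 4096 bs
        B.hPut h chunk
        writeInChunks h rest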


What do you say?

--
Best regards,
 Bulat                            mailto:[hidden email]

Attachments: Locking.hs (8K), b.hs (576 bytes), a.hs (832 bytes)
Re: Optimizing locking with MVars

Simon Marlow-5
Bulat Ziganshin wrote:

> Wednesday, May 3, 2006, 3:07:19 PM, you wrote:
>
>
>>I propose to add INLINE pragmas to withMVar and friends.
>
>
> And to every higher-order function? :)  IMHO, the key to this problem is
> that GHC doesn't have a way to optimize just one required function
> together with all the calls inside it. Old GHC papers describe an "inline"
> function that does this (as far as I understood), but it is no longer
> supported.
>
> The problem is not that withMVar should be inlined at every use site. The
> problem is that I need to inline its use in one concrete place in my
> concrete library, and there is no way to force GHC to do this other than
> copying its implementation! I think GHC should simply give higher priority
> to inlining higher-order and dictionary-dependent functions (AFAIU, they
> are currently weighted by the same rules as ordinary functions).

Giving the programmer more control over which call sites are inlined
would be a good thing.  However, it only improves code size, not
performance, compared with just using INLINE.
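
For what it's worth, later GHC versions expose call-site inlining control as
a library function, GHC.Exts.inline. A hedged sketch (fastPath is invented
for illustration; the callee's unfolding must be available, e.g. via
INLINABLE):

import Control.Concurrent.MVar
import GHC.Exts (inline)

-- 'inline f' asks GHC to inline this particular occurrence of f.
fastPath :: MVar Int -> IO Int
fastPath m = inline withMVar m (\n -> return (n + 1))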

GHC has a complicated heuristic to decide whether to inline a function
or not, taking into account whether the arguments look "interesting" or
not.  An interesting argument value (eg. a function or a constructor)
might lead to more transformations when the function is inlined, so the
decision is weighted in favour of that.

See the inlining paper:
   http://www.haskell.org/~simonmar/papers/inline.pdf

There are definitely cases where GHC doesn't inline and it would be
beneficial to do so.  I'd just like to point out that the current
heuristics weren't arrived at by accident - we did lots of measurements
of the effect of different values for the tuning parameters on the nofib
benchmark suite, and the result is a reasonable compromise between
performance and code size.

> The second part of the problem is that it cannot be decided at the point
> where "withMVar" is defined whether it should be inlined or not - that
> depends first of all on the call site. If the code runs just once, we
> should decide on the basis of code size; if it is the innermost loop, we
> should inline even the largest functions (especially higher-order or
> polymorphic ones). But the compiler doesn't know how many times a given
> code fragment will run (obviously a place for future integration between
> the optimizer and the profiler...). So, I think, it should generally
> optimize the program for code size, and only for procedures marked with
> something like OPTIMIZE should it enable the full inlining machinery,
> trying to make the code generated for that procedure (including all its
> inner calls) as fast as possible. This would let the programmer just mark
> the speed-critical procedures and would free library writers entirely from
> dancing with INLINE pragmas.

Annotating code fragments for "optimise really hard" would be a good
thing, yes.  Of course, you can always put those code fragments in a
separate module, and compile it with -O2 -funfolding-use-threshold100
-fliberate-case-threshold100 etc.
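
A small sketch of that arrangement (HotPath and sumTo are invented names;
newer GHCs spell these flags with an equals sign, e.g.
-funfolding-use-threshold=100):

{-# OPTIONS_GHC -O2 -funfolding-use-threshold=100 -fliberate-case-threshold=100 #-}
-- Speed-critical code isolated in its own module, so the aggressive
-- settings apply only here.
module HotPath (sumTo) where

sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go acc i
        | i > n     = acc
        | otherwise = go (acc + i) (i + 1)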



>>This won't affect Handle I/O unfortunately, because we need block to
>>protect against asynchronous exceptions.
>
>
> I think it is not accidental that hPutChar/hGetChar are exactly 3 and 2
> times slower than their v* analogues. My routines use only one locked
> MVar, and I think the Handle routines use 3 and 2 MVars, respectively. I
> thought they called withMVar directly, but that is not the case. But maybe
> the large chunks of code they run inside "block" operations are still not
> inlined and therefore suffer from the same problem as withMVar? I guess
> that adding an INLINE pragma to the definition of block might speed up
> their execution (by 52/38 ~= 1.4 times), but possibly at the cost of code
> size. Anyway, Handle I/O is already too complicated a thing to worry about
> optimizing it further :(

'block' is trivial, it is already inlined.

block (IO io) = IO $ blockAsyncExceptions# io
unblock (IO io) = IO $ unblockAsyncExceptions# io

> Thanks, this is exactly the sort of criticism I need. The full check seems
> to be a rather complex problem, so I will postpone it until I work on the
> next release. The good news is that my buffering scheme updates only one
> unboxed pointer (the current read/write position inside the buffer) for
> all operations that can be satisfied from the buffer contents alone, so it
> seems I don't need "block" to protect such operations until a routine
> realizes that it has to switch to another buffer. BTW, I've changed the
> default buffer size to 4 KB.
>
> In the next version I will work on this problem. I think it should be
> described as a contract between library authors (including authors of
> third-party streams and transformers) and users, in this form:
>
> If a stream operation is abandoned because of an asynchronous interrupt,
> an internal bug, or an error in the input data:
>
> - it should leave the stream in a coherent state (e.g. it should not leave
> vIsEOF=True and vTell=0 for a non-empty file)
>
> - it should not have "side effects", e.g. changing the position when the
> whole operation (such as vShow) has no such effect
>
> - the operation may be performed only partially (i.e. vSeek can leave the
> pointer in an unpredictable place, vPutStr can output only part of the
> string, vGetBuf may fill only part of the buffer)
>
> - if possible, the implementation should give asynchronous exceptions a
> chance to interrupt execution at fairly small granularity (i.e. don't
> write 100 MB in one C call, but split it into 4 KB blocks)
>
> What do you say?

I think you can give stronger guarantees than these.  eg.  Seek either
succeeds or it doesn't - if it raises an exception (including an async
exception) the seek never happened.  Partial reads/writes before an
exception are unavoidable, though.  Unfortunately the caller will have
no way to tell how much of the operation completed before the exception
was raised, unless you provide a secondary call to obtain this
information, or pass a mutable counter to the operation.
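
A hedged sketch of the mutable-counter option (writeAll and its IORef
parameter are invented names, not an actual Streams API): the operation
bumps the counter after each completed chunk, so even if an exception
aborts it, the caller can read how far it got.

import Control.Exception (SomeException, try)
import Data.IORef

writeAll :: IORef Int -> (a -> IO ()) -> [a] -> IO ()
writeAll done step xs =
    mapM_ (\x -> step x >> modifyIORef' done (+ 1)) xs

-- Example: count how many chunks were written before a possible failure.
main :: IO ()
main = do
    done <- newIORef 0
    r <- try (writeAll done print [1 .. 5 :: Int]) :: IO (Either SomeException ())
    n <- readIORef done
    putStrLn ("completed chunks: " ++ show n ++ ", result: " ++ show r)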

Cheers,
        Simon