Optimising words

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimising words

Neil Mitchell
Hi

While benchmarking a word count program I found that it wasn't running
as fast as it could. I traced this back to the original definition of
words, which isn't as good as it could be:

This is the version in Base:

words :: String -> [String]
words s =  case dropWhile isSpace s of
                                "" -> []
                                s' -> w : words s''
                                      where (w, s'') =
                                             break isSpace s'

We can instantiate s' with s:ss, since we already pattern match:

words :: String -> [String]
words s =  case dropWhile isSpace s of
                                "" -> []
                                s:ss -> w : words s''
                                      where (w, s'') =
                                             break isSpace (s:ss)

Now we know that s is not a space, since if it was the dropWhile would
have removed it. This means that we don't need to test it, and can put
it straight on to w:

words :: String -> [String]
words s =  case dropWhile isSpace s of
                                "" -> []
                                s:ss -> (s:w) : words s''
                                      where (w, s'') = break isSpace ss

Now we have eliminated an extra isSpace test per character, and as it
happens, isSpace is very slow.

Is my reasoning correct? If so, can we make this optimisation?

Thanks

Neil
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Optimising words

Neil Mitchell
Hi

> An optimised version of words:
>
> words                   :: String -> [String]
> words s                 =  case dropWhile isSpace s of
>                                 "" -> []
>                                 s:ss -> (s:w) : words s''
>                                       where (w, s'') = break isSpace ss

Thinking harder, this is an improvement, but not as good as we can
get. If s'' is non-empty, then the first character must be a space,
which we can drop without testing:

words                   :: String -> [String]
words s                 =  case dropWhile isSpace s of
                                "" -> []
                                s:ss -> (s:w) : words (drop1 s'')
                                      where (w, s'') = break isSpace ss

drop1 [] = []
drop1 (x:xs) = xs

(of course, drop1 = drop 1, but if we're going for maximum efficience :-) )

I think this now ensures that we only test each character once for
being a space.

Thanks

Neil
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Optimising words

Duncan Coutts
In reply to this post by Neil Mitchell
On Mon, 2007-07-09 at 16:10 +0100, Neil Mitchell wrote:
> Hi
>
> While benchmarking a word count program I found that it wasn't running
> as fast as it could. I traced this back to the original definition of
> words, which isn't as good as it could be:

[...]

> Is my reasoning correct? If so, can we make this optimisation?

To really convince yourself and everyone else you could compare it
against the spec, for both total and partial values. For our list
library re-implementation we used SmallCheck and a modified version of
SmallCheck for checking partial values. It turned up lots of bugs in our
code and found several instances where the current base implementations
are not the same as the spec (for various good reasons).

http://www.cse.unsw.edu.au/~dons/code/streams/list/tests/Strictness/BaseVsSpec.hs

Duncan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Optimising words

Neil Mitchell
Hi

Actually, you can go slightly better:

words s     = case dropWhile isSpace s of
                  [] -> []
                  (s:ss) -> (s:w) : drop1 sss
                      where
                          (w, sss) = break isSpace ss

                          drop1 [] = []
                          drop1 (x:xs) = words xs

Although a good optimising compiler may make this last leap all on its own.

> To really convince yourself and everyone else you could compare it
> against the spec, for both total and partial values.

I'm pretty convinced that I applied reasoning rules at each stage. If
you have a proof, testing is irrelevant :-)

I've already modified the Yhc base library with this optimisation, and
done some basic testing, and nothing broke.

Thanks

Neil
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Optimising words

Duncan Coutts
On Tue, 2007-07-10 at 00:06 +0100, Neil Mitchell wrote:

> > To really convince yourself and everyone else you could compare it
> > against the spec, for both total and partial values.
>
> I'm pretty convinced that I applied reasoning rules at each stage. If
> you have a proof, testing is irrelevant :-)

Ok, if you're sure. :-)

> I've already modified the Yhc base library with this optimisation, and
> done some basic testing, and nothing broke.

I'm more hesitant because I've already rewritten words once and got it
wrong. What's more it passed all the casual testing and quickcheck tests
I used. That's because it was correct for total values, it only went
wrong for partial values. It wasn't until I developed the SmallCheck on
partial value tests that I discovered the bug.

Duncan

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Optimising words

Donald Bruce Stewart
In reply to this post by Neil Mitchell
ndmitchell:

> Hi
>
> Actually, you can go slightly better:
>
> words s     = case dropWhile isSpace s of
>                  [] -> []
>                  (s:ss) -> (s:w) : drop1 sss
>                      where
>                          (w, sss) = break isSpace ss
>
>                          drop1 [] = []
>                          drop1 (x:xs) = words xs
>
> Although a good optimising compiler may make this last leap all on its own.
>
> >To really convince yourself and everyone else you could compare it
> >against the spec, for both total and partial values.
>
> I'm pretty convinced that I applied reasoning rules at each stage. If
> you have a proof, testing is irrelevant :-)
>
> I've already modified the Yhc base library with this optimisation, and
> done some basic testing, and nothing broke.

I'd second Duncan here -- strictness properties are *really* hard, so
changes to existing (H98) functions must come with both QuickCheck and
strictness checking properties, before I'd be comfortable accepting them.

The missing support for partial values in QuickCheck is quite a hole
that needs fixing.

-- Don
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Optimising words

Neil Mitchell
In reply to this post by Neil Mitchell
Hi

> > Now we have eliminated an extra isSpace test per character, and as it
> > happens, isSpace is very slow.
>
> Isn't it possible to improve isSpace, too?

Yes, most definitely. But unless we improve it to the stage where it
takes 0 time, its still worth eliminating useless additional calls.

Thanks

Neil
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries