Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf


Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
Currently, isSuffixOf is defined like this:

xs `isSuffixOf` ys = reverse xs `isPrefixOf` reverse ys

If ys is much longer than xs, this is not so wonderful, because we have to reverse the whole spine of ys. It also fails to terminate whenever either xs *or* ys is infinite. The following implementation seems a lot saner (although longer):

isSuffixOf              :: forall a . (Eq a) => [a] -> [a] -> Bool
[]     `isSuffixOf` _   =  True
needle `isSuffixOf` hay =
      maybe False
            (\fp -> needle `eq` getEndChunk hay fp)
            (getFrontPointer needle hay)
  where
    eq :: [a] -> [a] -> [a]
    (x:xs) `eq` (y:ys) = x==y && (xs `eq` ys)

    getEndChunk :: [a] -> [a] -> [a]
    getEndChunk (_:hy) (_:fp) = getEndChunk hy fp
    getEndChunk hy     _      = hy

    getFrontPointer :: [a] -> [a] -> Maybe [a]
    getFrontPointer [] hy         = Just hy
    getFrontPointer _  []         = Nothing
    getFrontPointer (_:nd) (_:hy) = getFrontPointer nd hy

This doesn't do any of that crazy stuff, and it will work just fine if the needle is infinite. The only slightly sticky point is that it's not *strictly* lazier. In particular,

[1,2] `Data.List.isSuffixOf` [undefined, 7] = False
[1,2] `isSuffixOf` [undefined, 7] = undefined

But note that

[1,2] `Data.List.isSuffixOf` [7,undefined] = undefined
[1,2] `isSuffixOf` [7, undefined] = False

It would be possible to get the same kind of laziness at the end using something structured as above, but it requires some unpleasantness: instead of using

needle `eq` getEndChunk hay fp

we'd have to use

needle `backwardsEq` getEndChunk hay fp

(x:xs) `backwardsEq` (y:ys) = (xs `backwardsEq` ys) && x==y

This is yucky. My guess is that no one is relying on the current behavior, but I'd like to make sure everyone's okay with this.

David

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries

Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
Sorry, I made a bit of a mistake with eq; it's corrected below.

On Wed, Oct 8, 2014 at 4:15 AM, David Feuer <[hidden email]> wrote:
Currently, isSuffixOf is defined like this:

xs `isSuffixOf` ys = reverse xs `isPrefixOf` reverse ys

If ys is much longer than xs, this is not so wonderful, because we have to reverse the whole spine of ys. It also fails to terminate whenever either xs *or* ys is infinite. The following implementation seems a lot saner (although longer):

isSuffixOf              :: forall a . (Eq a) => [a] -> [a] -> Bool
[]     `isSuffixOf` _   =  True
needle `isSuffixOf` hay =
      maybe False
            (\fp -> needle `eq` getEndChunk hay fp)
            (getFrontPointer needle hay)
  where
    eq :: [a] -> [a] -> Bool
    (x:xs) `eq` (y:ys) = x==y && (xs `eq` ys)
    _      `eq` _      = True

    getEndChunk :: [a] -> [a] -> [a]
    getEndChunk (_:hy) (_:fp) = getEndChunk hy fp
    getEndChunk hy     _      = hy

    getFrontPointer :: [a] -> [a] -> Maybe [a]
    getFrontPointer [] hy         = Just hy
    getFrontPointer _  []         = Nothing
    getFrontPointer (_:nd) (_:hy) = getFrontPointer nd hy


David



Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer

A slight simplification (because I realized the weird thing I was doing won't actually save any time):

isSuffixOf              :: forall a . (Eq a) => [a] -> [a] -> Bool
[]     `isSuffixOf` _   =  True
needle `isSuffixOf` hay =
      maybe False
            (\fp -> needle == getEndChunk hay fp)
            (getFrontPointer needle hay)
  where
    getEndChunk :: [a] -> [a] -> [a]
    getEndChunk (_:hy) (_:fp) = getEndChunk hy fp
    getEndChunk hy     _      = hy

    getFrontPointer :: [a] -> [a] -> Maybe [a]
    getFrontPointer [] hy         = Just hy
    getFrontPointer _  []         = Nothing
    getFrontPointer (_:nd) (_:hy) = getFrontPointer nd hy

I still haven't heard from anyone about the slight semantic change. To clarify it slightly, the only aspect that could theoretically be a problem for somebody is that this reverses the order in which the elements of the "needle" are compared to the (length needle) elements at the end of the "haystack". The current implementation compares them from back to front; this one compares them from front to back. That means that if some elements in the needle or near the end of the haystack are undefined, there are cases where this implementation bottoms out when the current one returns False, and vice versa.
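The difference in definedness is easy to observe directly by catching the bottoms. A small base-only sketch (the names isSuffixOfRev, isSuffixOfFwd and observe are mine; isSuffixOfRev stands in for the current reverse-based definition, isSuffixOfFwd for the proposed one):

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.List (isPrefixOf)

-- The current reverse-based definition.
isSuffixOfRev :: Eq a => [a] -> [a] -> Bool
isSuffixOfRev xs ys = reverse xs `isPrefixOf` reverse ys

-- The proposed forward-traversing definition.
isSuffixOfFwd :: Eq a => [a] -> [a] -> Bool
isSuffixOfFwd [] _ = True
isSuffixOfFwd needle hay =
  maybe False
        (\fp -> needle == getEndChunk hay fp)
        (getFrontPointer needle hay)
  where
    getEndChunk (_:hy) (_:fp) = getEndChunk hy fp
    getEndChunk hy     _      = hy

    getFrontPointer [] hy         = Just hy
    getFrontPointer _  []         = Nothing
    getFrontPointer (_:nd) (_:hy) = getFrontPointer nd hy

-- Force a Bool, reporting a bottom as Nothing.
observe :: Bool -> IO (Maybe Bool)
observe b = do
  r <- try (evaluate b) :: IO (Either SomeException Bool)
  return (either (const Nothing) Just r)
```

Running observe over the four undefined-containing examples from the earlier messages shows the two implementations bottoming out in opposite cases, matching the behaviour described here.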


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Jon Fairbairn
David Feuer <[hidden email]> writes:

> A slight simplification (because I realized the weird thing I was doing
> won't actually save any time):
>
> isSuffixOf              :: forall a . (Eq a) => [a] -> [a] -> Bool
> []     `isSuffixOf` _   =  True
> needle `isSuffixOf` hay =
>       maybe False
>             (\fp -> needle == getEndChunk hay fp)
>             (getFrontPointer needle hay)
>   where
>     getEndChunk :: [a] -> [a] -> [a]
>     getEndChunk (_:hy) (_:fp) = getEndChunk hy fp
>     getEndChunk hy     _      = hy
>
>     getFrontPointer :: [a] -> [a] -> Maybe [a]
>     getFrontPointer [] hy         = Just hy
>     getFrontPointer _  []         = Nothing
>     getFrontPointer (_:nd) (_:hy) = getFrontPointer nd hy

That seems rather wordy to me. Can we get anywhere starting with

   a `isSuffixOf` b = a `elem` tails b

and transforming? I’m not awake enough to the efficiency
implications, but


   a `isSuffixOf` b = a `elem` (take 1 $ dropWhile (longerThan a) $ tails b)

   longerThan _ [] = False
   longerThan [] _ = True
   longerThan (_:a) (_:b) = longerThan a b

looks plausible.
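For what it's worth, the sketch above does agree with Data.List.isSuffixOf on finite, fully defined inputs; here is a base-only brute-force check over all small lists (the name isSuffixOfTails and the smallLists helper are mine; the definitions are Jon's):

```haskell
import Data.List (isSuffixOf, tails)

longerThan :: [a] -> [b] -> Bool
longerThan _ []        = False
longerThan [] _        = True
longerThan (_:a) (_:b) = longerThan a b

-- Jon's sketch: only the first tail no longer than the needle can match.
isSuffixOfTails :: Eq a => [a] -> [a] -> Bool
a `isSuffixOfTails` b = a `elem` (take 1 $ dropWhile (longerThan a) $ tails b)

-- All lists over [0,1] up to the given length.
smallLists :: Int -> [[Int]]
smallLists n = concatMap (\k -> sequence (replicate k [0, 1])) [0 .. n]
```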

> I still haven't heard from anyone about the slight semantic change. To
> clarify it slightly, the only aspect that could theoretically be a problem
> for somebody is that this reverses the order in which the elements of the
> "needle" are compared to the (length needle) elements at the end of the
> "haystack". The current implementation compares them from back to front;
> this one compares them from front to back. That means that if some elements
> in the needle or near the end of the haystack are undefined, there are
> cases where this implementation bottoms out when the current one returns
> False, and vice versa.

This doesn’t bother me. Comparing forwards seems more a
“natural” expectation.

--
Jón Fairbairn                                 [hidden email]
http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html  (updated 2014-04-05)


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
On Fri, Oct 10, 2014 at 5:18 AM, Jon Fairbairn <[hidden email]> wrote:
> That seems rather wordy to me. Can we get anywhere starting with

It is rather wordy; I just don't know how to do better.

>    a `isSuffixOf` b = a `elem` tails b
>
> and transforming? I’m not awake enough to the efficiency
> implications, but
>
>    a `isSuffixOf` b = a `elem` (take 1 $ dropWhile (longerThan a) $ tails b)
>
>    longerThan _ [] = False
>    longerThan [] _ = True
>    longerThan (_:a) (_:b) = longerThan a b
>
> looks plausible.

That looks bad. The implementation I gave is something like O(m+n), whereas yours looks something like O(m*n).


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Milan Straka
Hi all,

> -----Original message-----
> From: David Feuer <[hidden email]>
> Sent: 10 Oct 2014, 05:23
>
> On Fri, Oct 10, 2014 at 5:18 AM, Jon Fairbairn <[hidden email]>
> wrote:
>
> > That seems rather wordy to me. Can we get anywhere starting with
> >
> >
> It is rather wordy; I just don't know how to do better.

What about something like the following? It uses exactly the same
algorithm, it just avoids the maybe combinator and Maybe internal
result, and renames stuff:

isSuffixOf :: (Eq a) => [a] -> [a] -> Bool
[] `isSuffixOf` _ = True
needle `isSuffixOf` hay = subtract_and_compare needle hay
  where subtract_and_compare [] diff = skip_and_compare diff hay
        subtract_and_compare _ [] = False -- needle longer than hay
        subtract_and_compare (n:ns) (h:hs) = subtract_and_compare ns hs

        skip_and_compare [] suffix = needle == suffix
        skip_and_compare _ [] = True -- first argument must also be []
        skip_and_compare (d:ds) (s:ss) = skip_and_compare ds ss

The redundant skip_and_compare case is there just for completeness and
readability.
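Milan's variant can be checked the same way against the Data.List version on finite, fully defined inputs, and it also terminates on an infinite needle (renamed isSuffixOfMS here to avoid clashing with the import; the rename, the smallLists helper, and the unused-variable underscores are mine):

```haskell
import Data.List (isSuffixOf)

isSuffixOfMS :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOfMS` _ = True
needle `isSuffixOfMS` hay = subtract_and_compare needle hay
  where subtract_and_compare [] diff = skip_and_compare diff hay
        subtract_and_compare _ [] = False -- needle longer than hay
        subtract_and_compare (_:ns) (_:hs) = subtract_and_compare ns hs

        skip_and_compare [] suffix = needle == suffix
        skip_and_compare _ [] = True -- first argument must also be []
        skip_and_compare (_:ds) (_:ss) = skip_and_compare ds ss

-- All lists over [0,1] up to the given length.
smallLists :: Int -> [[Int]]
smallLists n = concatMap (\k -> sequence (replicate k [0, 1])) [0 .. n]
```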

Cheers,
Milan

Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

John Lato-2

I think David's proposed definition is the easiest to read form with good performance so far.

As to definedness, the change in semantics doesn't matter to me.  But I think calling isSuffixOf on possibly infinite lists is a bad idea anyway.

John L.


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer

I wouldn't say calling it on a possibly infinite list would necessarily be the wisest thing, but supporting an infinite "needle" with a finite "haystack" comes for free with the forward traversal. Unlike isInfixOf, the extra implementation flexibility gained by specifying that the needle is finite has no serious potential to allow fancy algorithms to improve efficiency.

The real odd semantic duck, I think, is the special case for an empty needle, because it is lazier than one might expect. In particular, I would expect that isSuffixOf [] undefined would be False, because undefined is not a []-terminated list. Leaving *that* unspecified would make a lot of sense to me.
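The empty-needle behaviour can be isolated with two tiny variants, one with the special clause and one defined purely via tails (all names are mine; neither is the proposed implementation, they only demonstrate the clause in question):

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.List (tails)

-- With the special clause, an empty needle succeeds without
-- inspecting the haystack at all.
lazyEmpty :: [Int] -> [Int] -> Bool
lazyEmpty [] _  = True
lazyEmpty xs ys = xs `elem` tails ys

-- Without the clause, even an empty needle demands the haystack.
strictEmpty :: [Int] -> [Int] -> Bool
strictEmpty xs ys = xs `elem` tails ys

-- Force a Bool, reporting a bottom as Nothing.
observe :: Bool -> IO (Maybe Bool)
observe b = do
  r <- try (evaluate b) :: IO (Either SomeException Bool)
  return (either (const Nothing) Just r)
```

Here lazyEmpty [] undefined returns True while strictEmpty [] undefined bottoms out, which is exactly the extra laziness the special case introduces.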

As for simplicity of form, orb__ came up with a version that uses Either instead of Maybe to take advantage of the similarity of two phases to get away with just one helper function, but I think that approach is much harder to understand.


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer

I accidentally hit "send" before finishing this message. I *think* my version is probably a little easier to understand than Milan's, but it also seems possible that Milan's could take the lead with some more inspired variable names. I also wish I knew a way to avoid the bogus pattern in skip_and_compare (and my equivalent), but I haven't been able to do so.


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Andreas Abel-2
David, the essence of your idea seems to be to mutually drop the correct number of elements from needle and hay and then compare for equality.  Here is a concise implementation of your idea in terms of drop:

isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
[] `isSuffixOf` _  =  True
xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys

This can be further simplified to

isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
[] `isSuffixOf` _  =  True
xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys

which is a very easy to understand program, I think, without need to
reverse lists.
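On finite, fully defined lists the length-based version does agree with the library's isSuffixOf; a base-only brute-force check over small lists (the name isSuffixOfLen and the smallLists helper are mine, the definition is Andreas's second one):

```haskell
import Data.List (isSuffixOf)

isSuffixOfLen :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOfLen` _  = True
xs `isSuffixOfLen` ys = xs == drop (length ys - length xs) ys

-- All lists over [0,1] up to the given length.
smallLists :: Int -> [[Int]]
smallLists n = concatMap (\k -> sequence (replicate k [0, 1])) [0 .. n]
```

Note that when the needle is longer than the haystack the drop count is negative, drop returns all of ys, and the equality then fails on length, so no extra guard is needed.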


--
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

[hidden email]
http://www2.tcs.ifi.lmu.de/~abel/

Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
It can be, but your way traverses the entire haystack *twice*. The memory needs are equivalent to the reversing version, which I consider unacceptable.


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Andreas Abel-2
Well, you also traverse the haystack twice: in your getEndChunk function you simultaneously traverse the haystack and a (shared) copy of it.  Why is this so much better?

I guess without data (timings, heap-profiles) we do not get further here.

On 11.10.2014 14:47, David Feuer wrote:
> It can be, but your way traverses the entire haystack *twice*. The
> memory needs are equivalent to the reversing version, which I consider
> unacceptable.

I do not understand this comment.  reverse needs O(n) memory, length O(1).

Cheers,
Andreas


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
The haystack and the shared copy are only a needle's-length apart. The first stage traverses H and N together until one of them runs out, holding on to the beginning of each. This requires at most O(min{|H|, |N|}) additional memory (the original needle and the needle-length prefix of the haystack).

Assuming we didn't run out of haystack before we ran out of needle (which seems the likely case), we proceed to the next step, traversing H and (drop |N| H) together. During this phase we hold O(|N|) additional memory: the needle itself and the needle-length portion of the longer of the two haystack remnants. Note that in the second phase we *don't* hold on to the beginning of the haystack, because we will never need it again! In the final phase, when the shorter haystack remnant has run out, we still hold the same O(|N|) memory, now the needle itself and the needle-length suffix of the haystack, and (==) traverses them both together. Thus the total additional memory residency for this algorithm is O(min{|H|, |N|}).

Using the length approach, by contrast, requires holding the beginning of the haystack for traversal while length runs. To put it another way, you force the entire spine of the haystack to run length, and then start again from the beginning. If the haystack is produced lazily, this potentially requires O(|H|) extra memory. Since you also check the length of the needle, your total additional memory comes to O(max{|H|, |N|}). I apologize if my horrid variable names obscured the algorithm I described.

On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel <[hidden email]> wrote:
Well, you also traverse the haystack twice, in your getEndChunk function you simultaneously traverse the haystack and a (shared) copy of it.  Why is this so much better?

I guess without data (timings, heap-profiles) we do not get further here.

On 11.10.2014 14:47, David Feuer wrote:
It can be, but your way traverses the entire haystack *twice*. The
memory needs are equivalent to the reversing version, which I consider
unacceptable.

I do not understand this comment.  reverse needs O(n) memory, length O(1).

Cheers,
Andreas

On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel <[hidden email]
<mailto:[hidden email]>> wrote:

    David, the essence of your idea seems to be to mutually drop the
    correct number of elements from needle and hay and then compare for
    equality.  Here is a concise implementation of your idea in terms of
    drop:

    isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
    [] `isSuffixOf` _  =  True
    xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys

    This can be further simplified to

    isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
    [] `isSuffixOf` _  =  True
    xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys

    which is a very easy to understand program, I think, without need to
    reverse lists.



_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries



--
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

[hidden email]
http://www2.tcs.ifi.lmu.de/~abel/



Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Andreas Abel-2
If you talk about "additional memory", you assume that after xs `isSuffixOf` ys, the haystack ys is no longer needed.  Is this the typical use case?
At least not in the Agda codebase, according to my quick grep...

./Packaging/Database.hs:
   dbEntries = filter (".conf" `isSuffixOf`) filePaths

./TypeChecking/Monad/Open.hs:
   unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context (" ++
show ctx ++ " is not a sub context of " ++ show ctx' ++ ")"

./Interaction/Options.hs:
   isLiterate file = ".lagda" `isSuffixOf` file

./Syntax/Parser.hs:
   if "lagda" `isSuffixOf` filePath file then

As I said, optimizations should be backed by benchmarks, especially when
trading a perspicuous implementation for a more complicated one...

On 12.10.2014 17:40, David Feuer wrote:




Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
I've done the benchmarks, and the results are clear: the implementation I gave is faster than both the one you gave and the one in Data.List in all cases. Yours is usually faster than the one in Data.List, but slower for very short lists.

Test code:

import Criterion.Main (bench, bgroup, defaultMain, whnf)
import Data.Bits ((.&.))
import qualified Data.List as L

-- isSuffixOf is my implementation from earlier in the thread;
-- isSuffixOfA is Andreas's length-based version.

needles =
  [ ("shortNeedle", ".exe")
   ,("longNeedle", "And who by fire? Who by water? Who in the sunshine? Who in the nighttime?")]

haystacks =
  [ ("shortHaystack", "C:\\Program Files\\Windows\\Dunno.exe")
   ,("longHaystack", take 10000 $ map toEnum $ iterate (\x -> (x*131) .&. 0xFFFF) 7)]

impls = [("lib", L.isSuffixOf), ("Feuer", isSuffixOf), ("Abel", isSuffixOfA)]

tests = [("simple", (\impl (needle,haystack) -> fromEnum $ needle `impl` haystack)),
         ("use", (\impl (needle,haystack) ->
                     if needle `impl` haystack
                     then sum (map fromEnum haystack)
                     else product (map fromEnum haystack)))]

main = defaultMain
  [
  bgroup (unwords [fst test, fst needle, "in", fst haystack])
    [bench (fst impl) $ whnf (snd test $ snd impl) (snd needle, snd haystack)
       | impl <- impls] | test <- tests, needle <- needles, haystack <- haystacks]


Results:

benchmarking simple shortNeedle in shortHaystack/lib
time                 244.6 ns   (243.9 ns .. 245.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 245.2 ns   (244.8 ns .. 246.5 ns)
std dev              2.030 ns   (950.4 ps .. 4.378 ns)

benchmarking simple shortNeedle in shortHaystack/Feuer
time                 234.5 ns   (234.1 ns .. 235.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 234.6 ns   (234.2 ns .. 235.3 ns)
std dev              1.628 ns   (678.1 ps .. 2.758 ns)

benchmarking simple shortNeedle in shortHaystack/Abel
time                 268.1 ns   (267.8 ns .. 268.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 268.0 ns   (267.8 ns .. 268.4 ns)
std dev              947.8 ps   (538.1 ps .. 1.668 ns)


benchmarking simple shortNeedle in longHaystack/lib
time                 100.5 μs   (100.2 μs .. 100.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 100.8 μs   (100.5 μs .. 101.4 μs)
std dev              1.511 μs   (1.035 μs .. 2.095 μs)

benchmarking simple shortNeedle in longHaystack/Feuer
time                 55.66 μs   (55.61 μs .. 55.73 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 55.68 μs   (55.63 μs .. 55.79 μs)
std dev              222.0 ns   (128.7 ns .. 415.0 ns)

benchmarking simple shortNeedle in longHaystack/Abel
time                 94.60 μs   (94.39 μs .. 94.94 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 94.51 μs   (94.44 μs .. 94.68 μs)
std dev              332.8 ns   (183.6 ns .. 617.7 ns)


benchmarking simple longNeedle in shortHaystack/lib
time                 480.6 ns   (480.2 ns .. 481.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 480.6 ns   (480.1 ns .. 481.6 ns)
std dev              2.217 ns   (1.025 ns .. 4.032 ns)

benchmarking simple longNeedle in shortHaystack/Feuer
time                 187.0 ns   (186.8 ns .. 187.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 187.0 ns   (186.8 ns .. 187.4 ns)
std dev              922.0 ps   (427.3 ps .. 1.598 ns)

benchmarking simple longNeedle in shortHaystack/Abel
time                 340.9 ns   (339.4 ns .. 343.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 345.0 ns   (341.6 ns .. 351.0 ns)
std dev              14.40 ns   (9.622 ns .. 20.77 ns)
variance introduced by outliers: 60% (severely inflated)


benchmarking simple longNeedle in longHaystack/lib
time                 100.3 μs   (100.2 μs .. 100.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 100.4 μs   (100.3 μs .. 100.9 μs)
std dev              811.3 ns   (333.5 ns .. 1.575 μs)

benchmarking simple longNeedle in longHaystack/Feuer
time                 56.29 μs   (56.19 μs .. 56.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 56.23 μs   (56.19 μs .. 56.32 μs)
std dev              197.1 ns   (116.3 ns .. 342.1 ns)

benchmarking simple longNeedle in longHaystack/Abel
time                 94.46 μs   (94.32 μs .. 94.66 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 94.49 μs   (94.39 μs .. 94.70 μs)
std dev              459.8 ns   (277.2 ns .. 763.7 ns)


benchmarking use shortNeedle in shortHaystack/lib
time                 831.8 ns   (828.4 ns .. 836.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 831.2 ns   (829.3 ns .. 834.6 ns)
std dev              8.468 ns   (5.220 ns .. 13.26 ns)

benchmarking use shortNeedle in shortHaystack/Feuer
time                 819.4 ns   (816.7 ns .. 822.7 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 821.2 ns   (818.0 ns .. 828.2 ns)
std dev              15.89 ns   (7.988 ns .. 25.90 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking use shortNeedle in shortHaystack/Abel
time                 853.5 ns   (851.8 ns .. 855.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 852.6 ns   (851.5 ns .. 855.6 ns)
std dev              5.462 ns   (2.422 ns .. 10.51 ns)


benchmarking use shortNeedle in longHaystack/lib
time                 261.8 μs   (259.2 μs .. 264.7 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 260.2 μs   (259.5 μs .. 261.4 μs)
std dev              2.854 μs   (1.438 μs .. 4.475 μs)

benchmarking use shortNeedle in longHaystack/Feuer
time                 225.0 μs   (221.9 μs .. 227.1 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 221.0 μs   (220.2 μs .. 222.4 μs)
std dev              3.487 μs   (2.598 μs .. 4.385 μs)

benchmarking use shortNeedle in longHaystack/Abel
time                 244.5 μs   (232.5 μs .. 254.3 μs)
                     0.992 R²   (0.990 R² .. 0.997 R²)
mean                 232.1 μs   (229.4 μs .. 237.7 μs)
std dev              11.45 μs   (6.248 μs .. 16.89 μs)
variance introduced by outliers: 47% (moderately inflated)


benchmarking use longNeedle in shortHaystack/lib
time                 1.088 μs   (1.085 μs .. 1.091 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.087 μs   (1.086 μs .. 1.090 μs)
std dev              6.936 ns   (4.270 ns .. 9.691 ns)

benchmarking use longNeedle in shortHaystack/Feuer
time                 787.4 ns   (785.8 ns .. 789.9 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 786.7 ns   (785.9 ns .. 788.3 ns)
std dev              3.761 ns   (1.408 ns .. 6.358 ns)

benchmarking use longNeedle in shortHaystack/Abel
time                 930.7 ns   (927.7 ns .. 934.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 928.3 ns   (926.7 ns .. 931.3 ns)
std dev              7.241 ns   (3.785 ns .. 13.45 ns)


benchmarking use longNeedle in longHaystack/lib
time                 262.3 μs   (257.8 μs .. 266.1 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 258.6 μs   (257.7 μs .. 260.0 μs)
std dev              3.979 μs   (2.350 μs .. 5.888 μs)

benchmarking use longNeedle in longHaystack/Feuer
time                 224.9 μs   (222.1 μs .. 226.9 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 221.2 μs   (220.4 μs .. 222.3 μs)
std dev              3.168 μs   (2.143 μs .. 4.010 μs)

benchmarking use longNeedle in longHaystack/Abel
time                 233.1 μs   (231.4 μs .. 234.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 231.2 μs   (230.7 μs .. 232.0 μs)
std dev              2.099 μs   (1.419 μs .. 2.627 μs)

On Sun, Oct 12, 2014 at 12:02 PM, Andreas Abel <[hidden email]> wrote:

Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
In reply to this post by Milan Straka
I did another set of benchmarks, and Milan's implementation has a small but clear advantage over mine. Therefore I amend my proposal to use Milan's implementation instead. I would suggest perhaps using the following names:

isSuffixOf :: (Eq a) => [a] -> [a] -> Bool
[] `isSuffixOf` _ = True
needle `isSuffixOf` hay = dropNeedleFromHay needle hay
  where dropNeedleFromHay [] remainingHay = dropToHayEnd remainingHay hay
        dropNeedleFromHay _  []           = False -- needle longer than hay
        dropNeedleFromHay (_:ns) (_:hs)   = dropNeedleFromHay ns hs

        dropToHayEnd [] suffix = needle == suffix
        dropToHayEnd _  []     = True -- This can't happen, but returning True is
                                      -- measurably (slightly) faster than throwing
                                      -- an error here.
        dropToHayEnd (_:ds) (_:ss) = dropToHayEnd ds ss
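[Editorial sketch: a quick sanity check of the amended definition. It is repeated here under the hypothetical name isSuffixOf' so the snippet compiles on its own, and compared exhaustively against Data.List.isSuffixOf on all short Bool lists.]

```haskell
import qualified Data.List as L

-- The amended definition from above, repeated (as isSuffixOf') so this
-- check is self-contained.
isSuffixOf' :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOf'` _ = True
needle `isSuffixOf'` hay = dropNeedleFromHay needle hay
  where dropNeedleFromHay [] remainingHay = dropToHayEnd remainingHay hay
        dropNeedleFromHay _  []           = False -- needle longer than hay
        dropNeedleFromHay (_:ns) (_:hs)   = dropNeedleFromHay ns hs

        dropToHayEnd [] suffix     = needle == suffix
        dropToHayEnd _  []         = True -- unreachable
        dropToHayEnd (_:ds) (_:ss) = dropToHayEnd ds ss

-- Exhaustive agreement with the library version on all Bool lists of
-- length 0..4 (31 lists, 961 pairs).
main :: IO ()
main = do
  let lists = [ l | n <- [0 .. 4], l <- sequence (replicate n [False, True]) ]
  print (and [ (xs `isSuffixOf'` ys) == (xs `L.isSuffixOf` ys)
             | xs <- lists, ys <- lists ])
```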

On Fri, Oct 10, 2014 at 8:34 AM, Milan Straka <[hidden email]> wrote:
Hi all,

> -----Original message-----
> From: David Feuer <[hidden email]>
> Sent: 10 Oct 2014, 05:23
>
> On Fri, Oct 10, 2014 at 5:18 AM, Jon Fairbairn <[hidden email]>
> wrote:
>
> > That seems rather wordy to me. Can we get anywhere starting with
> >
> >
> It is rather wordy; I just don't know how to do better.

What about something like the following? It uses exactly the same
algorithm, it just avoids the maybe combinator and Maybe internal
result, and renames stuff:

isSuffixOf :: (Eq a) => [a] -> [a] -> Bool
[] `isSuffixOf` _ = True
needle `isSuffixOf` hay = subtract_and_compare needle hay
  where subtract_and_compare [] diff = skip_and_compare diff hay
        subtract_and_compare _ [] = False -- needle longer than hay
        subtract_and_compare (n:ns) (h:hs) = subtract_and_compare ns hs

        skip_and_compare [] suffix = needle == suffix
        skip_and_compare _ [] = True -- first argument must also be []
        skip_and_compare (d:ds) (s:ss) = skip_and_compare ds ss

The redundant skip_and_compare case is there just for completeness and
readability.

Cheers,
Milan
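[Editorial sketch: Milan's version also exhibits the laziness discussed at the start of the thread — the mismatch in [1,2] `isSuffixOf` [7, undefined] is found before the undefined element is forced. A standalone check, with the definition repeated under the name isSuffixOf' so the snippet compiles on its own:]

```haskell
-- Milan's definition, repeated (renamed isSuffixOf') so this check is
-- self-contained.
isSuffixOf' :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOf'` _ = True
needle `isSuffixOf'` hay = subtract_and_compare needle hay
  where subtract_and_compare [] diff = skip_and_compare diff hay
        subtract_and_compare _  []   = False -- needle longer than hay
        subtract_and_compare (_:ns) (_:hs) = subtract_and_compare ns hs

        skip_and_compare [] suffix = needle == suffix
        skip_and_compare _  []     = True -- first argument must also be []
        skip_and_compare (_:ds) (_:ss) = skip_and_compare ds ss

main :: IO ()
main = do
  -- The mismatch 1 /= 7 is discovered before the undefined element
  -- is ever forced:
  print (([1,2] :: [Int]) `isSuffixOf'` [7, undefined])   -- False
  -- An infinite needle against a finite haystack terminates:
  print (([1..] :: [Int]) `isSuffixOf'` [1,2,3])          -- False
```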



Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Andreas Abel-2
In reply to this post by David Feuer
Nice, good work! --Andreas

On 13.10.2014 20:17, David Feuer wrote:

> I've done the benchmarks and the results are clear: the implementation I
> gave is faster than the one you gave and the one in Data.List in all
> cases. Yours is usually faster than the one in Data.List, but slower for
> very short lists.
>
> Test code:
>
> needles =
>    [ ("shortNeedle", ".exe")
>     ,("longNeedle", "And who by fire? Who by water? Who in the sunshine?
> Who in the nighttime?")]
>
> haystacks =
>    [ ("shortHaystack", "C:\\Program Files\\Windows\\Dunno.exe")
>     ,("longHaystack", take 10000 $ map toEnum $ iterate (\x -> (x*131)
> .&. 0xFFFF) 7)]
>
> impls = [("lib", L.isSuffixOf), ("Feuer", isSuffixOf), ("Abel",
> isSuffixOfA)]
>
> tests = [("simple", (\impl (needle,haystack) -> fromEnum $ needle `impl`
> haystack)),
>           ("use", (\impl (needle,haystack) ->
>                       if needle `impl` haystack
>                       then sum (map fromEnum haystack)
>                       else product (map fromEnum haystack)))]
>
> main = defaultMain
>    [
>    bgroup (unwords [fst test, fst needle, "in", fst haystack])
>      [bench (fst impl) $ whnf (snd test $ snd impl) (snd needle, snd
> haystack)
>         | impl <- impls] | test <- tests, needle <- needles, haystack <-
> haystacks]
>
>
> Results:
>
> benchmarking simple shortNeedle in shortHaystack/lib
> time                 244.6 ns   (243.9 ns .. 245.6 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 245.2 ns   (244.8 ns .. 246.5 ns)
> std dev              2.030 ns   (950.4 ps .. 4.378 ns)
>
> benchmarking simple shortNeedle in shortHaystack/Feuer
> time                 234.5 ns   (234.1 ns .. 235.1 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 234.6 ns   (234.2 ns .. 235.3 ns)
> std dev              1.628 ns   (678.1 ps .. 2.758 ns)
>
> benchmarking simple shortNeedle in shortHaystack/Abel
> time                 268.1 ns   (267.8 ns .. 268.4 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 268.0 ns   (267.8 ns .. 268.4 ns)
> std dev              947.8 ps   (538.1 ps .. 1.668 ns)
>
>
> benchmarking simple shortNeedle in longHaystack/lib
> time                 100.5 μs   (100.2 μs .. 100.9 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 100.8 μs   (100.5 μs .. 101.4 μs)
> std dev              1.511 μs   (1.035 μs .. 2.095 μs)
>
> benchmarking simple shortNeedle in longHaystack/Feuer
> time                 55.66 μs   (55.61 μs .. 55.73 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 55.68 μs   (55.63 μs .. 55.79 μs)
> std dev              222.0 ns   (128.7 ns .. 415.0 ns)
>
> benchmarking simple shortNeedle in longHaystack/Abel
> time                 94.60 μs   (94.39 μs .. 94.94 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 94.51 μs   (94.44 μs .. 94.68 μs)
> std dev              332.8 ns   (183.6 ns .. 617.7 ns)
>
>
> benchmarking simple longNeedle in shortHaystack/lib
> time                 480.6 ns   (480.2 ns .. 481.1 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 480.6 ns   (480.1 ns .. 481.6 ns)
> std dev              2.217 ns   (1.025 ns .. 4.032 ns)
>
> benchmarking simple longNeedle in shortHaystack/Feuer
> time                 187.0 ns   (186.8 ns .. 187.2 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 187.0 ns   (186.8 ns .. 187.4 ns)
> std dev              922.0 ps   (427.3 ps .. 1.598 ns)
>
> benchmarking simple longNeedle in shortHaystack/Abel
> time                 340.9 ns   (339.4 ns .. 343.3 ns)
>                       1.000 R²   (0.999 R² .. 1.000 R²)
> mean                 345.0 ns   (341.6 ns .. 351.0 ns)
> std dev              14.40 ns   (9.622 ns .. 20.77 ns)
> variance introduced by outliers: 60% (severely inflated)
>
>
> benchmarking simple longNeedle in longHaystack/lib
> time                 100.3 μs   (100.2 μs .. 100.5 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 100.4 μs   (100.3 μs .. 100.9 μs)
> std dev              811.3 ns   (333.5 ns .. 1.575 μs)
>
> benchmarking simple longNeedle in longHaystack/Feuer
> time                 56.29 μs   (56.19 μs .. 56.41 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 56.23 μs   (56.19 μs .. 56.32 μs)
> std dev              197.1 ns   (116.3 ns .. 342.1 ns)
>
> benchmarking simple longNeedle in longHaystack/Abel
> time                 94.46 μs   (94.32 μs .. 94.66 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 94.49 μs   (94.39 μs .. 94.70 μs)
> std dev              459.8 ns   (277.2 ns .. 763.7 ns)
>
>
> benchmarking use shortNeedle in shortHaystack/lib
> time                 831.8 ns   (828.4 ns .. 836.1 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 831.2 ns   (829.3 ns .. 834.6 ns)
> std dev              8.468 ns   (5.220 ns .. 13.26 ns)
>
> benchmarking use shortNeedle in shortHaystack/Feuer
> time                 819.4 ns   (816.7 ns .. 822.7 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 821.2 ns   (818.0 ns .. 828.2 ns)
> std dev              15.89 ns   (7.988 ns .. 25.90 ns)
> variance introduced by outliers: 23% (moderately inflated)
>
> benchmarking use shortNeedle in shortHaystack/Abel
> time                 853.5 ns   (851.8 ns .. 855.4 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 852.6 ns   (851.5 ns .. 855.6 ns)
> std dev              5.462 ns   (2.422 ns .. 10.51 ns)
>
>
> benchmarking use shortNeedle in longHaystack/lib
> time                 261.8 μs   (259.2 μs .. 264.7 μs)
>                       1.000 R²   (0.999 R² .. 1.000 R²)
> mean                 260.2 μs   (259.5 μs .. 261.4 μs)
> std dev              2.854 μs   (1.438 μs .. 4.475 μs)
>
> benchmarking use shortNeedle in longHaystack/Feuer
> time                 225.0 μs   (221.9 μs .. 227.1 μs)
>                       0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 221.0 μs   (220.2 μs .. 222.4 μs)
> std dev              3.487 μs   (2.598 μs .. 4.385 μs)
>
> benchmarking use shortNeedle in longHaystack/Abel
> time                 244.5 μs   (232.5 μs .. 254.3 μs)
>                       0.992 R²   (0.990 R² .. 0.997 R²)
> mean                 232.1 μs   (229.4 μs .. 237.7 μs)
> std dev              11.45 μs   (6.248 μs .. 16.89 μs)
> variance introduced by outliers: 47% (moderately inflated)
>
>
> benchmarking use longNeedle in shortHaystack/lib
> time                 1.088 μs   (1.085 μs .. 1.091 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 1.087 μs   (1.086 μs .. 1.090 μs)
> std dev              6.936 ns   (4.270 ns .. 9.691 ns)
>
> benchmarking use longNeedle in shortHaystack/Feuer
> time                 787.4 ns   (785.8 ns .. 789.9 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 786.7 ns   (785.9 ns .. 788.3 ns)
> std dev              3.761 ns   (1.408 ns .. 6.358 ns)
>
> benchmarking use longNeedle in shortHaystack/Abel
> time                 930.7 ns   (927.7 ns .. 934.0 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 928.3 ns   (926.7 ns .. 931.3 ns)
> std dev              7.241 ns   (3.785 ns .. 13.45 ns)
>
>
> benchmarking use longNeedle in longHaystack/lib
> time                 262.3 μs   (257.8 μs .. 266.1 μs)
>                       0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 258.6 μs   (257.7 μs .. 260.0 μs)
> std dev              3.979 μs   (2.350 μs .. 5.888 μs)
>
> benchmarking use longNeedle in longHaystack/Feuer
> time                 224.9 μs   (222.1 μs .. 226.9 μs)
>                       0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 221.2 μs   (220.4 μs .. 222.3 μs)
> std dev              3.168 μs   (2.143 μs .. 4.010 μs)
>
> benchmarking use longNeedle in longHaystack/Abel
> time                 233.1 μs   (231.4 μs .. 234.5 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 231.2 μs   (230.7 μs .. 232.0 μs)
> std dev              2.099 μs   (1.419 μs .. 2.627 μs)
>
> On Sun, Oct 12, 2014 at 12:02 PM, Andreas Abel <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     If you talk about "additional memory" you assume that after `xs
>     isSuffix ys`, ys is no longer needed.  Is this the typical use case?
>     At least not in the Agda codebase, according to my quick grep...
>
>     ./Packaging/Database.hs:
>        dbEntries = filter (".conf" `isSuffixOf`) filePaths
>
>     ./TypeChecking/Monad/Open.hs:
>        unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context ("
>     ++ show ctx ++ " is not a sub context of " ++ show ctx' ++ ")"
>
>     ./Interaction/Options.hs:
>        isLiterate file = ".lagda" `isSuffixOf` file
>
>     ./Syntax/Parser.hs:
>        if "lagda" `isSuffixOf` filePath file then
>
>     As I said, optimizations should be backed by benchmarks, especially
>     when trading a perspicuous implementation for a more complicated one...
>
>
>     On 12.10.2014 17:40, David Feuer wrote:
>
>         The haystack and the shared copy are only a needle's-length
>         apart.  The first stage traverses H and N until one of them
>         runs out, holding a copy of the beginning of each.  This
>         requires at most O(min {|H|, |N|}) additional memory (the
>         original needle and the needle-length prefix of the haystack).
>         Assuming we didn't run out of haystack before we ran out of
>         needle (which assumption seems generally likely), we proceed
>         to the next step, traversing H and (drop |N| H) together.
>         During this phase we hold O(|N|) additional memory: the needle
>         itself and the needle-length portion of the longer of the two
>         haystack remnants we hold.  Note that in the second phase, we
>         *don't* hold on to the beginning of the haystack, because we
>         will never need it again!  Then in the final phase, when the
>         shorter haystack remnant has run out, we still have the same
>         O(|N|) memory, which is now the needle itself and the
>         needle-length suffix of the haystack, and (==) traverses them
>         both together.  Thus the total additional memory residency for
>         this algorithm is O(min {|H|, |N|}).  Using the length
>         approach requires that you hold the beginning of the haystack
>         for traversal while running length.  To put it another way,
>         you force the entire spine of the haystack to run length, and
>         then start from the beginning.  If the haystack is produced
>         lazily, this potentially requires O(|H|) extra memory.  Since
>         you also check the length of the needle, your total additional
>         memory comes to O(max {|H|, |N|}).  I apologize if my horrid
>         variable names obscured what goes on in the algorithm I
>         described.
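[For the archive, the phased traversal described above can be sketched as follows. This is an illustrative reconstruction with hypothetical helper names (suffixSketch, dropMatching, lagBy), not the exact code under discussion:

```haskell
-- Sketch of the algorithm described above (hypothetical names).
-- Phase one walks needle and haystack together; phase two walks the
-- haystack alongside (drop |N| H), keeping the two pointers a needle's
-- length apart; the final phase compares the needle against the
-- needle-length suffix of the haystack.
suffixSketch :: Eq a => [a] -> [a] -> Bool
suffixSketch needle hay =
  case dropMatching needle hay of
    Nothing   -> False                    -- ran out of haystack first
    Just rest -> needle == lagBy rest hay -- compare against end chunk
  where
    -- Phase one: drop |needle| elements from the haystack, or fail.
    dropMatching (_:ns) (_:hs) = dropMatching ns hs
    dropMatching []     hs     = Just hs
    dropMatching _      []     = Nothing
    -- Phase two: advance both until the shorter list (rest) runs out,
    -- leaving the needle-length suffix of the haystack.
    lagBy (_:xs) (_:ys) = lagBy xs ys
    lagBy _      ys     = ys
```

Note that an infinite needle over a finite haystack answers False in phase one, after traversing only the haystack.]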
>
>              On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel
>              <[hidden email]> wrote:
>
>              Well, you also traverse the haystack twice: in your
>              getEndChunk function you simultaneously traverse the
>              haystack and a (shared) copy of it.  Why is this so much
>              better?
>
>              I guess without data (timings, heap-profiles) we do not
>              get further here.
>
>              On 11.10.2014 14:47, David Feuer wrote:
>
>                  It can be, but your way traverses the entire haystack
>                  *twice*.  The memory needs are equivalent to the
>                  reversing version, which I consider unacceptable.
>
>
>              I do not understand this comment.  reverse needs O(n)
>              memory, length O(1).
>
>              Cheers,
>              Andreas
>
>                  On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel wrote:
>
>                       David, the essence of your idea seems to be to
>                       mutually drop the correct number of elements
>                       from needle and hay and then compare for
>                       equality.  Here is a concise implementation of
>                       your idea in terms of drop:
>
>                       isSuffixOf :: forall a . (Eq a) => [a] -> [a] -> Bool
>                       [] `isSuffixOf` _  =  True
>                       xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys
>
>                       This can be further simplified to
>
>                       isSuffixOf :: forall a . (Eq a) => [a] -> [a] -> Bool
>                       [] `isSuffixOf` _  =  True
>                       xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys
>
>                       which is a very easy-to-understand program, I
>                       think, with no need to reverse lists.
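[For the archive, Andreas's simplified definition drops straight into a module and can be checked against the current reverse-based definition. The names isSuffixOfA and isSuffixOfRev are chosen here for the comparison, not taken from the thread:

```haskell
import Data.List (isPrefixOf)

-- Andreas's length-based definition, as quoted above.
isSuffixOfA :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOfA` _  = True
xs `isSuffixOfA` ys = xs == drop (length ys - length xs) ys

-- The current Data.List definition, copied locally for comparison.
isSuffixOfRev :: Eq a => [a] -> [a] -> Bool
xs `isSuffixOfRev` ys = reverse xs `isPrefixOf` reverse ys

-- The two agree on all finite, fully defined inputs:
agree :: Bool
agree = and [ (xs `isSuffixOfA` ys) == (xs `isSuffixOfRev` ys)
            | xs <- samples, ys <- samples ]
  where samples = ["", "a", "da", "agda", ".lagda", "file.lagda"]
```

When the needle is longer than the haystack, drop receives a negative count, returns ys unchanged, and the equality test fails on the length mismatch, so no special case is needed.]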
>


--
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

[hidden email]
http://www2.tcs.ifi.lmu.de/~abel/
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries

Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Kim-Ee Yeoh
Administrator
In reply to this post by David Feuer

On Tue, Oct 14, 2014 at 1:17 AM, David Feuer <[hidden email]> wrote:
I've done the benchmarks and the results are clear: the implementation I gave is faster than the one you gave and the one in Data.List in all cases. Yours is usually faster than the one in Data.List, but slower for very short lists.

The 2x factor you're seeing over Andreas's diminishes once we put slightly more effort into an apples-to-apples comparison.

1. Instead of drop (length xs) ys, let's define the equivalent

dropl :: [b] -> [a] -> [a]
dropl (_:xs) (_:ys) = dropl xs ys
dropl [] ys = ys
dropl xs [] = []

i.e. dropl xs ys ~ drop (length xs) ys.

Now with Andreas's original version:

xs `isSuffixOfA` ys =  xs == dropl (dropl xs ys) ys

that 2x gap narrows down to 10% for most cases.

2. There's that fast path which you optimize for, where the needle is patently too long to be in the haystack. To narrow the semantic gap, we can write

dropm :: [b] -> [a] -> Maybe [a]
dropm (_:xs) (_:ys) = dropm xs ys
dropm [] ys = Just ys
dropm _ []  = Nothing

xs `isSuffixOfA` ys    =  maybe False id $ do
   zs <- dropm xs ys
   ws <- dropm zs ys    -- ws = needle-sized slice of the end of haystack
   return $ xs == ws

Now, the long-needle-short-haystack case also becomes merely 10% off.
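[The no.2 definition above can be exercised directly; the Nothing fast path also means an infinite needle over a finite haystack answers False after traversing only the haystack, where a length-based version would loop. It is renamed isSuffixOfK here (a name chosen for this example) to avoid clashing with no.1:

```haskell
-- Kim-Ee's no.2 definition, reproduced so the example is
-- self-contained (renamed to avoid clashing with the no.1 version).
dropm :: [b] -> [a] -> Maybe [a]
dropm (_:xs) (_:ys) = dropm xs ys
dropm [] ys = Just ys
dropm _ []  = Nothing

isSuffixOfK :: Eq a => [a] -> [a] -> Bool
xs `isSuffixOfK` ys = maybe False id $ do
  zs <- dropm xs ys    -- fails fast if the needle outruns the haystack
  ws <- dropm zs ys    -- ws = needle-sized slice of the end of haystack
  return (xs == ws)
```

For instance, [1..] `isSuffixOfK` [1,2,3] returns False after three steps, whereas computing length [1..] would diverge.]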

I'm -1 on both your proposal and the no.2 code because it's too much verbosity for uncertain gain. The benchmarks look good, but is the function even the bottleneck? For folks that care deeply about speed, lists are almost always the wrong datatype anyway.

I'm weakly +1 on no.1 because it beats the current double reverse definition!

-- Kim-Ee


Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer
I've switched my allegiance from my own version to Milan's, because it's a tad faster, and also less verbose. One thing to watch out for: although I don't particularly like this, the current version in Data.List makes  [] `isSuffixOf` undefined = True.  Unless there's a general consensus that this doesn't matter, we need to preserve it.

I tested something very similar to your #1 that was proposed on IRC by Reid Barton, and the numbers just didn't look too wonderful. I don't think Milan's version is too terribly verbose, and I think it's clearer than your #2. As for the depth of caring about speed, I generally disagree with you: lists are actually very good for some purposes and, moreover, even when they're *not*, people use them anyway, and then other people wait for that code to run.
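[Milan's version is never actually quoted in this thread. Purely as a hedged reconstruction for readers of the archive, a definition in the style being discussed might look something like the following; the names isSuffixOfM, dropLength, and dropLengthMaybe are guesses for illustration, not taken from any message here:

```haskell
-- Hypothetical reconstruction of the style of definition under
-- discussion; not quoted from Milan's message.
isSuffixOfM :: Eq a => [a] -> [a] -> Bool
ns `isSuffixOfM` hs = maybe False id $ do
    delta <- dropLengthMaybe ns hs   -- Nothing if the needle is longer
    return (ns == dropLength delta hs)
  where
    -- dropLength xs ys ~ drop (length xs) ys, without computing a length
    dropLength :: [a] -> [b] -> [b]
    dropLength [] y = y
    dropLength _ [] = []
    dropLength (_:x') (_:y') = dropLength x' y'

    dropLengthMaybe :: [a] -> [b] -> Maybe [b]
    dropLengthMaybe [] y = Just y
    dropLengthMaybe _ [] = Nothing
    dropLengthMaybe (_:x') (_:y') = dropLengthMaybe x' y'
```
]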




Re: Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Kim-Ee Yeoh
Administrator
David,

To paraphrase Kernighan, ascertaining correctness is twice as hard as writing a program in the first place. So if you're as clever as possible with your code, how will you know it's even correct?

Case in point: the infinite lists wrinkle that you started the thread with. /At a glance/ what do the general recursion versions (yours and Milan's) evaluate to on infinite lists? What are the benchmarks on the time it takes for a haskell programmer to figure them out? Would they not jump at the greater affordance of reasoning with Andreas's compositions?

Meaning and reasonableness have always been the hallmarks of core libs.

Last of all, since it's such a loser's game to benchmark the wrong thing, see below how Andreas's dropl no.1 beats out David-Milan by the tiniest sliver in short-needle-long-haystack, like checking file extensions in pathnames, as we saw in Andreas's Agda code. Here a concrete app brings real meaning to the timings, as opposed to optimizing for vaporware. Watch them Agdaists fall off their chairs when they hit the speed boost.

Don't mean to preach to the choir. Only layin' down the facts for the sake of the archive: every hard-boiled haskeller knows to switch datatypes if isSuffixOf becomes the bottleneck.

benchmarking simple shortNeedle in shortHaystack/lib
time                 176.6 ns   (176.4 ns .. 177.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 176.6 ns   (176.5 ns .. 176.9 ns)
std dev              596.3 ps   (318.4 ps .. 1.096 ns)

benchmarking simple shortNeedle in shortHaystack/Milan
time                 135.6 ns   (135.6 ns .. 135.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 135.6 ns   (135.6 ns .. 135.6 ns)
std dev              45.49 ps   (23.88 ps .. 79.14 ps)

benchmarking simple shortNeedle in shortHaystack/Abel
time                 145.6 ns   (142.5 ns .. 149.5 ns)
                     0.996 R²   (0.996 R² .. 0.997 R²)
mean                 147.3 ns   (144.8 ns .. 150.0 ns)
std dev              8.616 ns   (8.070 ns .. 9.037 ns)
variance introduced by outliers: 76% (severely inflated)

benchmarking simple shortNeedle in longHaystack/lib
time                 71.12 μs   (71.08 μs .. 71.15 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 71.12 μs   (71.08 μs .. 71.16 μs)
std dev              142.7 ns   (115.6 ns .. 186.3 ns)

benchmarking simple shortNeedle in longHaystack/Milan
time                 45.00 μs   (44.99 μs .. 45.00 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 45.00 μs   (45.00 μs .. 45.01 μs)
std dev              21.04 ns   (15.62 ns .. 28.75 ns)

benchmarking simple shortNeedle in longHaystack/Abel
time                 43.68 μs   (43.68 μs .. 43.70 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 43.69 μs   (43.68 μs .. 43.69 μs)
std dev              18.29 ns   (12.79 ns .. 27.76 ns)

benchmarking simple longNeedle in shortHaystack/lib
time                 396.3 ns   (396.2 ns .. 396.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 396.6 ns   (396.3 ns .. 397.0 ns)
std dev              1.106 ns   (707.9 ps .. 1.662 ns)

benchmarking simple longNeedle in shortHaystack/Milan
time                 117.9 ns   (117.9 ns .. 118.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 117.9 ns   (117.9 ns .. 117.9 ns)
std dev              49.08 ps   (30.66 ps .. 75.20 ps)

benchmarking simple longNeedle in shortHaystack/Abel
time                 125.6 ns   (125.6 ns .. 125.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 125.6 ns   (125.6 ns .. 125.6 ns)
std dev              35.17 ps   (19.08 ps .. 58.78 ps)

benchmarking simple longNeedle in longHaystack/lib
time                 71.98 μs   (71.92 μs .. 72.03 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 71.93 μs   (71.85 μs .. 72.00 μs)
std dev              247.4 ns   (199.9 ns .. 305.3 ns)

benchmarking simple longNeedle in longHaystack/Milan
time                 47.26 μs   (47.24 μs .. 47.29 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 47.28 μs   (47.27 μs .. 47.29 μs)
std dev              35.43 ns   (28.70 ns .. 45.80 ns)

benchmarking simple longNeedle in longHaystack/Abel
time                 47.44 μs   (47.43 μs .. 47.45 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 47.44 μs   (47.44 μs .. 47.45 μs)
std dev              16.81 ns   (10.63 ns .. 28.75 ns)

benchmarking use shortNeedle in shortHaystack/lib
time                 617.9 ns   (616.8 ns .. 618.7 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 617.9 ns   (616.9 ns .. 618.4 ns)
std dev              2.295 ns   (977.3 ps .. 3.747 ns)

benchmarking use shortNeedle in shortHaystack/Milan
time                 570.7 ns   (570.6 ns .. 570.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 570.7 ns   (570.6 ns .. 570.7 ns)
std dev              194.8 ps   (154.1 ps .. 262.9 ps)

benchmarking use shortNeedle in shortHaystack/Abel
time                 576.8 ns   (575.5 ns .. 579.5 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 576.5 ns   (575.7 ns .. 578.9 ns)
std dev              5.133 ns   (149.4 ps .. 9.882 ns)

benchmarking use shortNeedle in longHaystack/lib
time                 194.4 μs   (192.0 μs .. 195.9 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 190.9 μs   (190.2 μs .. 191.9 μs)
std dev              2.789 μs   (1.938 μs .. 3.452 μs)

benchmarking use shortNeedle in longHaystack/Milan
time                 169.3 μs   (164.6 μs .. 171.8 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 160.1 μs   (158.1 μs .. 162.6 μs)
std dev              7.419 μs   (5.720 μs .. 8.512 μs)
variance introduced by outliers: 46% (moderately inflated)

benchmarking use shortNeedle in longHaystack/Abel
time                 166.4 μs   (162.9 μs .. 168.4 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 159.1 μs   (157.5 μs .. 161.0 μs)
std dev              5.903 μs   (4.471 μs .. 6.721 μs)
variance introduced by outliers: 35% (moderately inflated)

benchmarking use longNeedle in shortHaystack/lib
time                 828.0 ns   (827.8 ns .. 828.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 828.0 ns   (827.9 ns .. 828.2 ns)
std dev              582.1 ps   (339.9 ps .. 970.5 ps)

benchmarking use longNeedle in shortHaystack/Milan
time                 550.0 ns   (550.0 ns .. 550.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 550.0 ns   (550.0 ns .. 550.1 ns)
std dev              223.1 ps   (97.52 ps .. 438.8 ps)

benchmarking use longNeedle in shortHaystack/Abel
time                 562.1 ns   (562.1 ns .. 562.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 562.1 ns   (562.1 ns .. 562.2 ns)
std dev              109.5 ps   (74.48 ps .. 182.3 ps)

benchmarking use longNeedle in longHaystack/lib
time                 195.7 μs   (193.3 μs .. 197.3 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 191.9 μs   (191.1 μs .. 193.0 μs)
std dev              3.065 μs   (2.169 μs .. 3.825 μs)

benchmarking use longNeedle in longHaystack/Milan
time                 170.6 μs   (165.7 μs .. 173.4 μs)
                     0.996 R²   (0.995 R² .. 0.998 R²)
mean                 160.8 μs   (158.6 μs .. 163.5 μs)
std dev              7.928 μs   (6.085 μs .. 9.063 μs)
variance introduced by outliers: 49% (moderately inflated)

benchmarking use longNeedle in longHaystack/Abel
time                 170.4 μs   (165.4 μs .. 173.2 μs)
                     0.996 R²   (0.995 R² .. 0.997 R²)
mean                 160.5 μs   (158.3 μs .. 163.3 μs)
std dev              8.066 μs   (6.200 μs .. 9.280 μs)
variance introduced by outliers: 50% (moderately inflated)

This is 64-bit on a recent i5.

I appreciate learning something new from this discussion, so to pay it forward please find attached a self-contained lhs for your nano-tweaking pleasure.


-- Kim-Ee





T_isSuffixOf.lhs (2K) Download Attachment