Substring replacements

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Substring replacements

Daniel Fischer-4
Okay, I have looked up KMP and implemented it.
Seems to work -- my first use of QuickCheck, too.
It's slower than Bulat's and Tomasz' for Branimir's test :-(,
but really fast for my test.
Undoubtedly, one can still tune it.
Here's the code:

module KMP where

import Data.Array

searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
    where
      len = length src
      pat = listArray (0,len-1) src
      bord = array (0,len) $
        (0,-1):(1,0):[(i+1,boun i (bord!i)) | i <- [1 .. len-1]]
      boun i j
         | j < 0          = 0
         | pat!i == pat!j = j+1
         | otherwise      = boun i (bord!j)
      getBord s n
         | m < 1      = m
         | s == pat!m = m
         | otherwise  = getBord s m
           where
             m = bord!n
      process n str _ | n >= len = dst ++ process 0 str ""
      process _ "" mat = reverse mat
      process 0 (s:st) _
         | s == c    = process 1 st [s]
         | otherwise = s:process 0 st ""
      process n str@(s:st) mat
         | s == pat!n = process (n+1) st (s:mat)
         | otherwise  = let j = getBord s n
                            (ret,skip) = splitAt j mat
                        in if j < 0 then reverse mat ++ process 0 str ""
                           else reverse skip ++ process j str ret

Cheers,
Daniel
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Sebastian Sylvan
On 12/12/05, Daniel Fischer <[hidden email]> wrote:
> Okay, I have looked up KMP and implemented it.
> Seems to work -- my first use of QuickCheck, too.
> It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> but really fast for my test.
> Undoubtedly, one can still tune it.

Perhaps by using unboxed arrays...

/S


--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

RE: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: [hidden email]
>Subject: [Haskell-cafe] Substring replacements
>Date: Mon, 12 Dec 2005 01:14:37 +0100
>
>Okay, I have looked up KMP and implemented it.
>Seems to work -- my first use of QuickCheck, too.
>It's slower than Bulat's and Tomasz' for Branimir's test :-(,
>but really fast for my test.

Strange I got completelly different results:

maxa@MAXA ~/tutorial
$ time srchrep.exe
Working very long
True
Done

real    0m16.407s
user    0m0.015s
sys     0m0.015s

bmaxa@MAXA ~/tutorial
$ ghc -fglasgow-exts  -O2 srchrep.hs --make -o srchrep.exe
Chasing modules from: srchrep.hs
Compiling Main             ( srchrep.hs, srchrep.o )
Linking ...

bmaxa@MAXA ~/tutorial
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m10.156s
user    0m0.015s
sys     0m0.015s

bmaxa@MAXA ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m11.672s
user    0m0.015s
sys     0m0.015s

Now your version is fastest according to my machine, but it is not faster
with your test it's slower in compariton to replace1.

I've corrected my code so it is fastest with your test,still less then a
second,
but slowest with mine.
Checked with your fail tests and compared results of these 2 tests.
Now should be ok.
I maintan now two lists one for successes and other for failures.
I also prettified code a bit .

searchReplace :: String->String->String -> String
searchReplace sr rp xs = searchr sr rp xs "" ""
   where
    searchr :: String->String->String->String->String -> String
    searchr [] _ xs _ _ =  xs
    searchr _ _ [] _ _  = []
    searchr sr rp xs retBack rollBack
                 | isFound $ fnd rollBack = rp
                                      ++ searchr sr rp (remaining $ fnd
rollBack )
                                                       ( getRetBack $ fnd
rollBack)
                                                       ( getRollBack $ fnd
rollBack)
                 | otherwise = reverse ((processed $ fnd rollBack) ++
rollBack)
                               ++ searchr sr rp (remaining $ fnd rollBack)
                                                ( getRetBack $ fnd rollBack)
                                                ( getRollBack $ fnd
rollBack)
                where fnd  = searchr' sr sr (reverse retBack ++ xs) ""

    isFound = fst . fst
    remaining = snd . snd . fst
    getRollBack = snd . snd
    getRetBack = fst . snd
    processed = fst . snd . fst

    searchr' :: String->String->String->String->String
                -> ((Bool,(String,String)),(String,String))
    searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
                           searchr'' (drop (length rollBack) srch) srch' xs
fndSoFar
                                     (False,"","") sr

    searchr'' :: String->String->String->String->(Bool,String,String)->Char
             -> ((Bool,(String,String)),(String,String))
    searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),("",""))
    searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++
rollBack ++ fnd,[])),("",""))
    searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar
(cnt,retBack,rollBack) s
      | sr == x = if cnt && sr' == x && isEmpty retBack
                     then searchr'' srs srs' xs fndSoFar
(True,"",x:rollBack) s
                     else if not (isEmpty retBack)  || not (isEmpty
rollBack)
                             then searchr'' srs srch' xs fndSoFar
                                             (True,(x:rollBack) ++
retBack,"") s
                             else searchr'' srs srch' xs (x:fndSoFar)
(True,"","") s
      | otherwise = if isEmpty rollBack && isEmpty retBack
                       then if s == x
                               then ((False,(fndSoFar,xxs)),("",""))
                               else ((False,searchr''' s xs
(x:fndSoFar)),("",""))
                       else if sr' == x && isEmpty retBack
                               then ((False,(fndSoFar, xs)),
(retBack,x:rollBack))
                               else ((False,(fndSoFar, xxs)),
(retBack,rollBack))


    searchr''' :: Char->String->String -> (String,String)
    searchr''' sr [] fndSoFar = (fndSoFar,[])
    searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs
(x:fndSoFar)
                             | otherwise = (fndSoFar,xxs)
    isEmpty [] = True
    isEmpty (a:as) = False
-------------------------------------------------------------------------------



Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Daniel Fischer-4
In reply to this post by Sebastian Sylvan
Am Montag, 12. Dezember 2005 01:34 schrieben Sie:

> On 12/12/05, Daniel Fischer <[hidden email]> wrote:
> > Okay, I have looked up KMP and implemented it.
> > Seems to work -- my first use of QuickCheck, too.
> > It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> > but really fast for my test.
> > Undoubtedly, one can still tune it.
>
> Perhaps by using unboxed arrays...
>
> /S
>
>
> --
> Sebastian Sylvan
> +46(0)736-818655
> UIN: 44640862

I'm afraid, unboxed arrays are out of the question, because bord is
incrementally produced :-(


Working very long
test2: <<loop>>


Cheers,
Daniel
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Daniel Fischer-4
In reply to this post by Branimir Maksimovic-2
Sorry, but
Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
"abababaaba"

I haven't analyzed the algorithm, so I don't know why exactly this fails.
I'll take a look sometime soon.

As for times: a complete stat is attached, all compiled with -O2, as well as
the modified KMP-version and my transcript of Branimir's new version.
On my computer, KMP smashes everything else on my test (except Branimir's,
only that doesn't yet work correctly), while Bulat's definitely is faster
than anything for Branimir's test and faster than anything but KMP for mine.

Branimir, isEmpty is the Prelude function 'null', so you needn't define it
yourself.

Cheers,
Daniel

Am Montag, 12. Dezember 2005 05:20 schrieben Sie:

> From: Daniel Fischer <[hidden email]>
>
> >To: [hidden email]
> >Subject: [Haskell-cafe] Substring replacements
> >Date: Mon, 12 Dec 2005 01:14:37 +0100
> >
> >Okay, I have looked up KMP and implemented it.
> >Seems to work -- my first use of QuickCheck, too.
> >It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> >but really fast for my test.
>
> Strange I got completelly different results:
>
> maxa@MAXA ~/tutorial
> $ time srchrep.exe
> Working very long
> True
> Done
>
> real    0m16.407s
> user    0m0.015s
> sys     0m0.015s
>
> bmaxa@MAXA ~/tutorial
> $ ghc -fglasgow-exts  -O2 srchrep.hs --make -o srchrep.exe
> Chasing modules from: srchrep.hs
> Compiling Main             ( srchrep.hs, srchrep.o )
> Linking ...
>
> bmaxa@MAXA ~/tutorial
> $ time srchrep.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real    0m10.156s
> user    0m0.015s
> sys     0m0.015s
>
> bmaxa@MAXA ~/tutorial
> $ time replace1.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real    0m11.672s
> user    0m0.015s
> sys     0m0.015s
>
> Now your version is fastest according to my machine, but it is not faster
> with your test it's slower in compariton to replace1.
>
> I've corrected my code so it is fastest with your test,still less then a
> second,
> but slowest with mine.
> Checked with your fail tests and compared results of these 2 tests.
> Now should be ok.
> I maintan now two lists one for successes and other for failures.
> I also prettified code a bit .
>
> searchReplace :: String->String->String -> String
> searchReplace sr rp xs = searchr sr rp xs "" ""
>    where
>     searchr :: String->String->String->String->String -> String
>     searchr [] _ xs _ _ =  xs
>     searchr _ _ [] _ _  = []
>     searchr sr rp xs retBack rollBack
>
>                  | isFound $ fnd rollBack = rp
>
>                                       ++ searchr sr rp (remaining $ fnd
> rollBack )
>                                                        ( getRetBack $ fnd
> rollBack)
>                                                        ( getRollBack $ fnd
> rollBack)
>
>                  | otherwise = reverse ((processed $ fnd rollBack) ++
>
> rollBack)
>                                ++ searchr sr rp (remaining $ fnd rollBack)
>                                                 ( getRetBack $ fnd
> rollBack) ( getRollBack $ fnd rollBack)
>                 where fnd  = searchr' sr sr (reverse retBack ++ xs) ""
>
>     isFound = fst . fst
>     remaining = snd . snd . fst
>     getRollBack = snd . snd
>     getRetBack = fst . snd
>     processed = fst . snd . fst
>
>     searchr' :: String->String->String->String->String
>                 -> ((Bool,(String,String)),(String,String))
>     searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
>                            searchr'' (drop (length rollBack) srch) srch' xs
> fndSoFar
>                                      (False,"","") sr
>
>     searchr'' :: String->String->String->String->(Bool,String,String)->Char
>              -> ((Bool,(String,String)),(String,String))
>     searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),("",""))
>     searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++
> rollBack ++ fnd,[])),("",""))
>     searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar
> (cnt,retBack,rollBack) s
>
>       | sr == x = if cnt && sr' == x && isEmpty retBack
>
>                      then searchr'' srs srs' xs fndSoFar
> (True,"",x:rollBack) s
>                      else if not (isEmpty retBack)  || not (isEmpty
> rollBack)
>                              then searchr'' srs srch' xs fndSoFar
>                                              (True,(x:rollBack) ++
> retBack,"") s
>                              else searchr'' srs srch' xs (x:fndSoFar)
> (True,"","") s
>
>       | otherwise = if isEmpty rollBack && isEmpty retBack
>
>                        then if s == x
>                                then ((False,(fndSoFar,xxs)),("",""))
>                                else ((False,searchr''' s xs
> (x:fndSoFar)),("",""))
>                        else if sr' == x && isEmpty retBack
>                                then ((False,(fndSoFar, xs)),
> (retBack,x:rollBack))
>                                else ((False,(fndSoFar, xxs)),
> (retBack,rollBack))
>
>
>     searchr''' :: Char->String->String -> (String,String)
>     searchr''' sr [] fndSoFar = (fndSoFar,[])
>     searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs
> (x:fndSoFar)
>
>                              | otherwise = (fndSoFar,xxs)
>
>     isEmpty [] = True
>     isEmpty (a:as) = False
> ---------------------------------------------------------------------------
>----
>
>
>
> Greetings, Bane.
>
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's FREE!
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Files.tar.gz (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: [hidden email]
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Mon, 12 Dec 2005 10:31:49 +0100
>
>Am Montag, 12. Dezember 2005 01:34 schrieben Sie:
> > On 12/12/05, Daniel Fischer <[hidden email]> wrote:
> > > Okay, I have looked up KMP and implemented it.
> > > Seems to work -- my first use of QuickCheck, too.
> > > It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> > > but really fast for my test.
> > > Undoubtedly, one can still tune it.
> >
> > Perhaps by using unboxed arrays...
> >
> > /S
> >
> >
> > --
> > Sebastian Sylvan
> > +46(0)736-818655
> > UIN: 44640862
>
>I'm afraid, unboxed arrays are out of the question, because bord is
>incrementally produced :-(
>
>
>Working very long
>test2: <<loop>>
>

No worrie your test is now fastest with both your and mine test.
I;ve forgot to change working function in your test:0)

mine test: your program is  is srchrep.exe
bmaxa@MAXA ~/tutorial
$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m14.344s
user    0m0.015s
sys     0m0.015s

bmaxa@MAXA ~/tutorial
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m10.672s > your program is almost 1.5 secs faster then Bulat's
user    0m0.015s
sys     0m0.000s

bmaxa@MAXA ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m12.016s
user    0m0.015s
sys     0m0.015s


now your test:

bmaxa@MAXA ~/tutorial
$ time searchr.exe
Working very long
True
Done

real    0m0.312s
user    0m0.015s
sys     0m0.015s

bmaxa@MAXA ~/tutorial
$ time replace1.exe
Working very long
False
Done

real    0m12.516s
user    0m0.015s
sys     0m0.015s

bmaxa@MAXA ~/tutorial
$ time srchrep.exe
Working very long
True
Done

real    0m0.375s > yours is less then second as mine but is fastest in both
tests
user    0m0.015s
sys     0m0.015s

I don;t know how you get lesser numbers with mine test, but on
this machine your KMP algorithm performs best.


Greetings ,Bane.

_________________________________________________________________
Don't just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Daniel Fischer-4
In reply to this post by Daniel Fischer-4
Earlier today:
> Sorry, but
> Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> "abababaaba"
>
> I haven't analyzed the algorithm, so I don't know why exactly this fails.
> I'll take a look sometime soon.
>

I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently long
match with the pattern starting at the first 'a' in the String. Upon
encountering the second 'a', while the first pattern still matches, you start
pushing onto the rollback-stack. But that isn't inspected anymore, so if the
actual occurence of the pattern starts at the third (or fourth, n-th)
occurence of 'a' and that is already pushed onto the rollback, you miss it.

let src = concat (replicate n "abc") ++ "d"
let str = concat (replicate (n+k) "abc") ++ "d"
then
searchReplace src "Success!" str
will work correctly iff k is congruent to 0 or 1 modulo (n+1).

Now to fix it, I see two possibilities
1. re-inspect the rollback, which brings you basically to my srchrep
2. introduce a hierarchy of rollback-stacks - but that would be rather
horrible, I think, because you must keep count on which stack you have to
push, how many matching patterns you currently have (and which ones they are)
...

So the question is, can we find a cheap test to decide whether to use KMP or
Bulat's version?

Cheers,
Daniel
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Mon, 12 Dec 2005 13:07:29 +0100
>
>Sorry, but
>Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
>"abababaaba"
>
>I haven't analyzed the algorithm, so I don't know why exactly this fails.
>I'll take a look sometime soon.

It failed because I didn;t adjusted search string for rollBack when previous
rollBack is not null.
this is corrected version: (with your changes it looks much better)
-------------------------------------------------------------------------------
searchReplace :: String->String->String -> String
searchReplace "" _ xs  = xs
searchReplace sr rp xs = searchr sr rp xs "" ""
   where
     searchr :: String->String->String->String->String -> String
     searchr _ _ "" _ _ = ""
     searchr sr rp xs retB rollB
        | found     = rp ++ searchr sr rp rema ret roll
        | otherwise = reverse (proc ++ rollB) ++
                      searchr sr rp rema ret roll
           where
             (found, proc, rema, ret, roll)
                = searchr' sr sr (reverse retB ++ xs) "" rollB

     searchr' src@(s:sr) src'@(s':sr') xs soFar rollB
        = searchr'' (drop (length rollB) src) src' xs soFar (not (null
rollB),"","") s

     searchr'' "" _ xs fnd _ _ = (True,fnd,xs,"","")
     searchr'' _ _ "" fnd (_,ret,roll) _ = (False,ret++roll++fnd,"","","")
     searchr'' src@(s:sr) src'@(s':sr') xxs@(x:xs) soFar (cnt,ret,roll) c
        | s == x = if s' == x && null ret && cnt
                      then searchr'' sr sr' xs soFar (True, "", x:roll) c
                      else
                        if null ret && null roll
                           then searchr'' sr src' xs (x:soFar) (True, "", "") c
                           else searchr'' sr src' xs soFar (True, x:roll++ret, "") c
        | otherwise = if null roll && null ret
                         then
                            if c == x
                              then (False, soFar, xxs, "", "")
                              else let (from, pre) = break (==c) xs
                                   in (False, reverse from ++ x:soFar, pre, "", "")
                          else
                            if s'/=x
                  then if null ret
                       then (False, (x:roll) ++ soFar, xs,"","")
                       else (False, soFar, xxs,ret,"")
                  else if null ret
                                   then (False, soFar, xs, "", x:roll)
                                   else (False, soFar, xxs, ret, "")
----------------------------------------------------------------------------

However it is significantly slower then previous ugly version:

searchReplace :: String->String->String -> String
searchReplace sr rp xs = searchr sr rp xs "" ""
   where
    searchr :: String->String->String->String->String -> String
    searchr [] _ xs _ _ =  xs
    searchr _ _ [] _ _  = []
    searchr sr rp xs retBack rollBack
                 | isFound $ fnd rollBack = rp
                                      ++ searchr sr rp (remaining $ fnd
rollBack )
                                                       ( getRetBack $ fnd
rollBack)
                                                       ( getRollBack $ fnd
rollBack)
                 | otherwise = reverse ((processed $ fnd rollBack) ++
rollBack)
                               ++ searchr sr rp (remaining $ fnd rollBack)
                                                ( getRetBack $ fnd rollBack)
                                                ( getRollBack $ fnd
rollBack)
                where fnd  = searchr' sr sr (reverse retBack ++ xs) ""

    isFound = fst . fst
    remaining = snd . snd . fst
    getRollBack = snd . snd
    getRetBack = fst . snd
    processed = fst . snd . fst

    searchr' :: String->String->String->String->String
                -> ((Bool,(String,String)),(String,String))
    searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
                           searchr'' (drop (length rollBack) srch) srch' xs
fndSoFar
                                     (not (isEmpty rollBack),"","") sr

    searchr'' :: String->String->String->String->(Bool,String,String)->Char
             -> ((Bool,(String,String)),(String,String))
    searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),("",""))
    searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++
rollBack ++ fnd,[])),("",""))
    searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar
(cnt,retBack,rollBack) s
      | sr == x = if cnt && sr' == x && isEmpty retBack
                     then searchr'' srs srs' xs fndSoFar
(True,"",x:rollBack) s
                     else if not (isEmpty retBack)  || not (isEmpty
rollBack)
                             then searchr'' srs srch' xs fndSoFar
                                             (True,(x:rollBack) ++
retBack,"") s
                             else searchr'' srs srch' xs (x:fndSoFar)
(True,"","") s
      | otherwise = if isEmpty rollBack && isEmpty retBack
                       then if s == x
                               then ((False,(fndSoFar,xxs)),("",""))
                               else ((False,searchr''' s xs
(x:fndSoFar)),("",""))
                       else if sr/=x
                            then if isEmpty retBack
                                 then ((False,((x:rollBack)++fndSoFar, xs)),
("",""))
                                 else ((False,(fndSoFar, xxs)),
(retBack,""))
                            else if isEmpty retBack
                                 then ((False,(fndSoFar, xs)),
("",x:rollBack))
                                 else ((False,(fndSoFar, xxs)),
(retBack,""))


    searchr''' :: Char->String->String -> (String,String)
    searchr''' sr [] fndSoFar = (fndSoFar,[])
    searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs
(x:fndSoFar)
                             | otherwise = (fndSoFar,xxs)
    isEmpty [] = True
    isEmpty (a:as) = False
-------------------------------------------------------------------------------
these are timings:

$ time MyBane.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m19.984s
user    0m0.015s
sys     0m0.016s

bmaxa@MAXA ~/tutorial
$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m13.719s
user    0m0.015s
sys     0m0.000s

It's 6 seconds difference.
Your KMP is always fastest of all version in any combination on  2gb p4 3ghz
hyperthreaded windows. Same results on linux will be I guess with same
machine
as I;ve previously tested on linux but some other p4 and results were
proportionally the same.

Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Mon, 12 Dec 2005 16:15:46 +0100
>
>Earlier today:
> > Sorry, but
> > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > "abababaaba"
> >
> > I haven't analyzed the algorithm, so I don't know why exactly this
>fails.
> > I'll take a look sometime soon.
> >
>
>I found the problem (one at least).
>Say the pattern to be replaced begins with 'a' and we have a sufficiently
>long
>match with the pattern starting at the first 'a' in the String. Upon
>encountering the second 'a', while the first pattern still matches, you
>start
>pushing onto the rollback-stack. But that isn't inspected anymore, so if
>the
>actual occurence of the pattern starts at the third (or fourth, n-th)
>occurence of 'a' and that is already pushed onto the rollback, you miss it.

I've corrected this with adjusting rollback position. if rollBack is null
then
search for rollback starts at second character if not starts at same as
searhed
character because I skip what was searched. That's all.
Though I'm not so sure now when I read this.

>
>So the question is, can we find a cheap test to decide whether to use KMP
>or
>Bulat's version?

In real world situation your KMP will always be fastest on average.
I like that we are not using C arrays as then we have advantage
of lazyness and save on memory usage. C++ program will be faster
on shorter strings but on this large strings will loose due memory
latency. and with your test, both programs are very fast.

Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Mon, 12 Dec 2005 16:15:46 +0100
>
>Earlier today:
> > Sorry, but
> > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > "abababaaba"
> >
> > I haven't analyzed the algorithm, so I don't know why exactly this
>fails.
> > I'll take a look sometime soon.
> >
>
>I found the problem (one at least).
>Say the pattern to be replaced begins with 'a' and we have a sufficiently
>long
>match with the pattern starting at the first 'a' in the String. Upon
>encountering the second 'a', while the first pattern still matches, you
>start
>pushing onto the rollback-stack. But that isn't inspected anymore, so if
>the
>actual occurence of the pattern starts at the third (or fourth, n-th)
>occurence of 'a' and that is already pushed onto the rollback, you miss it.
>
>let src = concat (replicate n "abc") ++ "d"
>let str = concat (replicate (n+k) "abc") ++ "d"
>then
>searchReplace src "Success!" str
>will work correctly iff k is congruent to 0 or 1 modulo (n+1).
>

Oh, yes this seems the problem for searchr :(
I have to look for efficient way in order to circumvent repeated searches.
But since your KMP is fastest of all now, I am considering if there
is any point now to correct this.

And searchr ugly version that I've posted has a bug (not present in MyBane
pretty
version) .
should be :
else if sr'/=x

Greetings, Bane.

_________________________________________________________________
FREE pop-up blocking with the new MSN Toolbar - get it now!
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Daniel Fischer-4
In reply to this post by Branimir Maksimovic-2
Am Montag, 12. Dezember 2005 16:28 schrieben Sie:

> From: Daniel Fischer <[hidden email]>
>
> >To: "Branimir Maksimovic" <[hidden email]>
> >CC: [hidden email]
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Mon, 12 Dec 2005 16:15:46 +0100
> >
> >Earlier today:
> > > Sorry, but
> > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > > "abababaaba"
> > >
> > > I haven't analyzed the algorithm, so I don't know why exactly this
> >
> >fails.
> >
> > > I'll take a look sometime soon.
> >
> >I found the problem (one at least).
> >Say the pattern to be replaced begins with 'a' and we have a sufficiently
> >long
> >match with the pattern starting at the first 'a' in the String. Upon
> >encountering the second 'a', while the first pattern still matches, you
> >start
> >pushing onto the rollback-stack. But that isn't inspected anymore, so if
> >the
> >actual occurence of the pattern starts at the third (or fourth, n-th)
> >occurence of 'a' and that is already pushed onto the rollback, you miss
> > it.
>
> I've corrected this with adjusting rollback position. if rollBack is null
> then
> search for rollback starts at second character if not starts at same as
> searhed
> character because I skip what was searched. That's all.
> Though I'm not so sure now when I read this.
>
Still not working:

*New> searchReplace "abababc" "#" "ababababababc"
"ababababababc"
*New> searchReplace1 "abababc" "#" "ababababababc"
"ababababababc"


> >So the question is, can we find a cheap test to decide whether to use KMP
> >or
> >Bulat's version?
>
> In real world situation your KMP will always be fastest on average.
> I like that we are not using C arrays as then we have advantage
> of lazyness and save on memory usage. C++ program will be faster
> on shorter strings but on this large strings will loose due memory
> latency. and with your test, both programs are very fast.
>
> Greetings, Bane.
>

On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20%
faster than my KMP on your test -- btw, I unboxed the pat array, which gave a
bit of extra speed, but not much.
And apologies to Sebastian Sylvan, I also included an unboxed version of bord,
built from the boxed version, and that sped things further up -- not much,
again, but there it is.
I wonder about this difference, -10% on one system and +20% on another system,
ist that normal?

Cheers, Daniel
------------------------------------------------------------------------------------------
Up-To-Date version of KMP:

import Data.Array.Unboxed (UArray, listArray, (!))
import qualified Data.Array as A (array, (!), elems)

searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
    where
      len = {-# scc "len" #-} length src
      pat :: UArray Int Char
      pat = {-# scc "pat" #-} listArray (0,len-1) src
      bord ={-# scc "bord" #-} A.array (0,len) $ (0,-1):(1,0):
                 [(i+1,getBord (pat!i) i + 1) | i <- [1 .. len-1]]
      getBord s n
         | m < 0      = m
         | s == pat!m = m
         | otherwise  = getBord s m
           where
             m = bord A.! n
      bor :: UArray Int Int
      bor = listArray (0,len) $ A.elems bord
      getBor s n
         | m < 0 || s == pat!m = m
         | otherwise = getBor s m
           where
             m = bor!n
      process n str _ | n >= len = {-# scc "process" #-} dst ++ process 0 str
""
      process _ "" mat = {-# scc "process" #-} reverse mat
      process 0 (s:st) _
         | s == c    = {-# scc "process" #-} process 1 st [s]
         | otherwise = {-# scc "process" #-} s:process 0 st ""
      process n str@(s:st) mat
         | s == pat!n = {-# scc "process" #-} process (n+1) st (s:mat)
         | otherwise  = {-# scc "process" #-}
                        let j = getBor s n
                            (ret,skip) = splitAt j mat
                        in if j < 0 then reverse mat ++ process 0 str ""
                           else reverse skip ++ process (j+1) st (s:ret)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Tue, 13 Dec 2005 11:23:29 +0100
>
>Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
> > From: Daniel Fischer <[hidden email]>
> >
> > >To: "Branimir Maksimovic" <[hidden email]>
> > >CC: [hidden email]
> > >Subject: Re: [Haskell-cafe] Substring replacements
> > >Date: Mon, 12 Dec 2005 16:15:46 +0100
> > >
> > >Earlier today:
> > > > Sorry, but
> > > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > > > "abababaaba"
> > > >
> > > > I haven't analyzed the algorithm, so I don't know why exactly this
> > >
> > >fails.
> > >
> > > > I'll take a look sometime soon.
> > >
> > >I found the problem (one at least).
> > >Say the pattern to be replaced begins with 'a' and we have a
>sufficiently
> > >long
> > >match with the pattern starting at the first 'a' in the String. Upon
> > >encountering the second 'a', while the first pattern still matches, you
> > >start
> > >pushing onto the rollback-stack. But that isn't inspected anymore, so
>if
> > >the
> > >actual occurence of the pattern starts at the third (or fourth, n-th)
> > >occurence of 'a' and that is already pushed onto the rollback, you miss
> > > it.
> >
> > I've corrected this with adjusting rollback position. if rollBack is
>null
> > then
> > search for rollback starts at second character if not starts at same as
> > searhed
> > character because I skip what was searched. That's all.
> > Though I'm not so sure now when I read this.
> >
>Still not working:
>
>*New> searchReplace "abababc" "#" "ababababababc"
>"ababababababc"
>*New> searchReplace1 "abababc" "#" "ababababababc"
>"ababababababc"
>

Yes, perhaps you've missed another post of mine. I've noticed
that problem when pattern repeats more then 2 times and gave up
because now whatever I do, your version is always fastest.

>
> > >So the question is, can we find a cheap test to decide whether to use
>KMP
> > >or
> > >Bulat's version?

Just interleave string with  search hits with one with no seacrh (that means
partial too)
hits, and your version will gain in speed.
More partial matches and full search matches Bulat's version will gain in
speed.
Longer search strings, your version will have gains.

> >
> > In real world situation your KMP will always be fastest on average.
> > I like that we are not using C arrays as then we have advantage
> > of lazyness and save on memory usage. C++ program will be faster
> > on shorter strings but on this large strings will loose due memory
> > latency. and with your test, both programs are very fast.
> >
> > Greetings, Bane.
> >
>
>On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about
>20%
>faster than my KMP on your test -- btw, I unboxed the pat array, which gave
>a
>bit of extra speed, but not much.

I think that's because on your machine Bulat's version have better
perfromance
with CPU cache.
I don;t know but now your version is 25% faster with my test on P4
hyperthreaded.

your new version:
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m8.734s
user    0m0.015s
sys     0m0.000s

Bulat's version:

bmaxa@MAXA ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m11.734s
user    0m0.015s
sys     0m0.015s

3 secs difference now.

>And apologies to Sebastian Sylvan, I also included an unboxed version of
>bord,
>built from the boxed version, and that sped things further up -- not much,
>again, but there it is.

On my machine you got another 10-15% of boost with unboxed arrays.

>I wonder about this difference, -10% on one system and +20% on another
>system,
>ist that normal?

Different caching schemes on CPU's perhaps? different memory latencies?
hyperthreading helps your version? more code and data, perhaps because
of that it pays the price on your machine?

Greetings, Bane.

_________________________________________________________________
Don't just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Tue, 13 Dec 2005 11:23:29 +0100
>

After seeing that your program is fastest (I've also tried one from
http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
that good in converting to search replace?) I've decided to
try with Rabin-Karp algorithm.
This algorithm performs same operation as straightforward search,
but compares hashes instead of chars.
With ability to rotate hash (remove first, add next) characters
there is also optimisation, that hash is calculated only for single
next character rather again for whole substring.
Unfortunatelly on my machine it is very cheap to compare
characters so with my test hashing overweights character compare,
except in your test when hash searching is faster then straightforward
search.

This is best I can write in terms of performance and readability.
I've tried with getFst that returns Maybe but it was slower so I decided
to return '\0' in case that argument is empty list, which renders '\0'
unusable, but then I really doubt that 0 will be used in strings.

-- Rabin-Karp string search algorithm, it is very effective in searching of
set
-- of patterns of length n on same string
-- this program is for single pattern search, but can be crafted
-- for multiple patterns of length m

hSearchReplace :: String -> String -> String -> String
hSearchReplace sr rp xs
    | not (null remaining) = found ++ rp
                             ++ hSearchReplace sr rp (drop (length sr)
remaining)
    | otherwise = found
    where
    (found,remaining) = hSearch sr xs

hSearch :: String -> String -> (String,String)
hSearch sr xs = hSearch' sr xs hcmp ""
    where
        hsrch = hash sr
        hcmp = hash $ take ls xs
        cmp = take ls xs
        ls = length sr

        hSearch' [] xs _ _= (xs,[])
        hSearch' sr [] _ fndFail = (reverse fndFail,[])
        hSearch' srch xxs@(x:xs) hcmps fndFail
            = if hsrch == hcmps
                 then if isPrefixOf srch xxs
                         then (reverse fndFail,xxs)
                         else searchAgain
                 else searchAgain
            where
            searchAgain
             = hSearch' srch xs
                        (hashRotate (getFst xxs) (getFst nextxxs) (ls-1)
hcmps)
                        (x:fndFail)
            nextxxs = drop ls xxs

getFst :: String -> Char
getFst [] = '\0'
getFst (a:as) = a

hash :: String -> Int
hash str
    =  hash' str (length str - 1)
    where
    hash' :: String -> Int -> Int
    hash' [] _ = 0
    hash' (s:str) pow  = (101 ^ pow) *(fromEnum s)
                         + hash' str (pow-1)

hashRotate :: Char -> Char -> Int -> Int -> Int
hashRotate cout cin pow hsh
    = (hsh - ((101 ^ pow) * (fromEnum cout)))*101
      + (fromEnum cin)



Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Daniel Fischer-4
In reply to this post by Branimir Maksimovic-2
Hi, Bane and all,

Am Dienstag, 13. Dezember 2005 14:22 schrieben Sie:
> > > In real world situation your KMP will always be fastest on average.
> > > I like that we are not using C arrays as then we have advantage
> > > of lazyness and save on memory usage. C++ program will be faster
> > > on shorter strings but on this large strings will loose due memory
> > > latency. and with your test, both programs are very fast.

Yesterday, I did the unspeakable -- I wrote a C-version. Smashes
Haskell-performance for short enough Strings (factor 10 for my test, factor
2.2 for Bane's), but once it starts swapping, we catch up, and for really
large Strings I dare say we'd win far out.

I also managed to get my KMP still faster, using take and drop instead of
splitAt helps a lot (Bane, the use of 'break' in my transcript of yours was
what slowed it down, I reintroduced searchr''' and both are equal).
I'm not quite sure, whether that indeed helps, but I also chose to use
listArray for the boxed array of borders.

Now it's
searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
    where
      len = length src
      pat :: UArray Int Char
      pat = listArray (0,len-1) src
      bord = A.listArray (0,len) $ (-1):0:
                       [getBord (pat!i) i + 1 | i <- [1 .. len-1]]
      getBord s n
         | m < 0      = m
         | s == pat!m = m
         | otherwise  = getBord s m
           where
             m = bord A.! n
      bor :: UArray Int Int
      bor = listArray (0,len) $ A.elems bord
      getBor s n
         | m < 0 || s == pat!m = m
         | otherwise = getBor s m
           where
             m = bor!n
      process n str _ | n >= len = dst ++ process 0 str ""
      process _ "" mat = reverse mat
      process 0 (s:st) _
         | s == c    = process 1 st [s]
         | otherwise = s:process 0 st ""
      process n str@(s:st) mat
         | s == pat!n = process (n+1) st (s:mat)
         | otherwise  =
            case getBor s n of
              -1 -> reverse mat ++ process 0 str ""
              0  -> reverse mat ++ process 1 st [s]
              j  -> reverse (drop j mat) ++ process (j+1) st (s:take j mat)


gives a speedup of roughly 10% on my box versus yesterday's version.

> > >
> > > Greetings, Bane.
> >
> >On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about
> >20%
> >faster than my KMP on your test -- btw, I unboxed the pat array, which
> > gave a
> >bit of extra speed, but not much.
>
> I think that's because on your machine Bulat's version have better
> perfromance
> with CPU cache.
> I don;t know but now your version is 25% faster with my test on P4
> hyperthreaded.

Errrr, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with
computers.

>
> your new version:
> $ time srchrep.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real    0m8.734s
> user    0m0.015s
> sys     0m0.000s
>
> Bulat's version:
>
> bmaxa@MAXA ~/tutorial
> $ time replace1.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real    0m11.734s
> user    0m0.015s
> sys     0m0.015s
>
> 3 secs difference now.
>
> >And apologies to Sebastian Sylvan, I also included an unboxed version of
> >bord,
> >built from the boxed version, and that sped things further up -- not much,
> >again, but there it is.
>
> On my machine you got another 10-15% of boost with unboxed arrays.
>
> >I wonder about this difference, -10% on one system and +20% on another
> >system,
> >ist that normal?
>
> Different caching schemes on CPU's perhaps? different memory latencies?
> hyperthreading helps your version? more code and data, perhaps because
> of that it pays the price on your machine?
>
> Greetings, Bane.
>
Well, whatever. Upto now, on my box, Bulat's is still the fastest for your
test -- though I've narrowed the gap quite a bit.

Cheers,
Daniel
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Daniel Fischer-4
In reply to this post by Branimir Maksimovic-2
Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected that
all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near KMP
(or would it?). However, I don't see how to (efficiently) do a multiple
pattern search with KMP, so there -- if all patterns have the same length,
otherwise I don't see -- Rabin-Karp would probably be the method of choice.

Am Mittwoch, 14. Dezember 2005 10:16 schrieben Sie:

> From: Daniel Fischer <[hidden email]>
>
> >To: "Branimir Maksimovic" <[hidden email]>
> >CC: [hidden email]
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Tue, 13 Dec 2005 11:23:29 +0100
>
> After seeing that your program is fastest (I've also tried one from
> http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> that good in converting to search replace?) I've decided to
> try with Rabin-Karp algorithm.
> This algorithm performs same operation as straightforward search,
> but compares hashes instead of chars.
> With ability to rotate hash (remove first, add next) characters
> there is also optimisation, that hash is calculated only for single
> next character rather again for whole substring.
> Unfortunatelly on my machine it is very cheap to compare
> characters so with my test hashing overweights character compare,
> except in your test when hash searching is faster then straightforward
> search.
>
> This is best I can write in terms of performance and readability.
> I've tried with getFst that returns Maybe but it was slower so I decided
> to return '\0' in case that argument is empty list, which renders '\0'
> unusable, but then I really doubt that 0 will be used in strings.
>
> -- Rabin-Karp string search algorithm, it is very effective in searching of
> set
> -- of patterns of length n on same string
> -- this program is for single pattern search, but can be crafted
> -- for multiple patterns of length m
>

I tuned it up somewhat:
import Data.List (isPrefixOf)
import Data.Char (ord)  -- using ord instead of fromEnum oddly makes it
-- faster for my test, but slower for yours, but only a whiff.

searchrep :: String -> String -> String -> String
searchrep "" _ str = str    -- or cycle rp, or error?
searchrep sr rp xs = hSearchRep xs  -- don't carry more around than necessary
    where
       len = length sr       -- we don't want that to be recomputed
       hsrch = hash sr     -- neither that
       hSearchRep  "" = ""
       hSearchRep xs
           | null remaining = passed
           | otherwise      = passed ++ rp ++ hSearchRep (drop len remaining)
             where
               (passed,remaining) = hSearch xs  -- ' xs (hash $ take len xs) ""
       hSearch xs = hSearch' xs hcmp "" -- since hSearch will be optimised
           where                                     -- away anyway, we might
              hcmp = hash $ take len xs   -- as well eliminate it ourselves
       hSearch' "" _ got = (reverse got, "")
       hSearch' xxs@(x:xs) hcd got
           | hcd == hsrch && (sr `isPrefixOf` xxs) = (reverse got, xxs)
           | otherwise                             = searchAgain -- one test less
             where
               searchAgain = case drop len xxs of
                []     -> (reverse got ++ xxs, "")   -- then we know we're done
                (y:_)  -> hSearch' xs (hashRotate x y hcd) (x:got)
-- no need for fancy getFst anymore
-- making hashRotate local eliminates one argument, makes it faster
       hashRotate :: Char -> Char -> Int -> Int
       hashRotate cout cin hsh = 101*(hsh - 101^(len-1)*ord cout) + ord cin
-- using foldl for hash is an enormous boost
       hash :: String -> Int
       hash = foldl ((. ord) . (+) . (*101)) 0
-- hash str = foldl (\n c -> 101*n+ord c) 0 str
-- this is equally fast as the point-free version, easier to read, probably,
-- but I like an occasional pointless pointfreeness.

Now this beats everything but KMP on my test very clearly.
dafis@linux:~/Documents/haskell/Allotria/Search> time myhash; time myhash2
Working: seasearch replace  able seaseaseasearch baker ssseasearch charlie
True
Done


real    0m50.401s
user    0m49.990s
sys     0m0.060s
Working very long
True
Done

real    0m15.747s
user    0m15.630s
sys     0m0.020s

Still poor on your test, though.

Cheers,
Daniel

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Wed, 14 Dec 2005 17:10:20 +0100
>

>
> > I think that's because on your machine Bulat's version have better
> > perfromance
> > with CPU cache.
> > I don;t know but now your version is 25% faster with my test on P4
> > hyperthreaded.
>
>Errrr, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with
>computers.

I think that i've figure it now.
Hyperthreading is hardware CPU feature that single CPU core
can speed up execution of two running threads.
For example if one thread uses integer unit and other FP unit
CPU executes that in parallel. But that's not important
or significant. What is interestenting is memory latency.
If one thread peeks and pokes around memory for , say 1
unit of time, with usual CPU two thread will execute
2 units of time. Hyperthreaded (I'm talking about intel implementation)
CPU will execute that in 1.4 points of time giving 60% boost
in terms of speed. I've tested some assembler and C program
that launches two threads each roaming over memory
to anulate impact of cache.
What is noticable is that two threads have 60% less memory
latency constantly then single thread. That means if
single thread for each out of cache memory access waits
300-400 CPU cycles, two threads wait 60% less.
Now what has that to do with our programs as they are single
threaded? I think it's garbage collection.
Our programs run with garbage collector in background and
you feel that burden by 20% as your program probably pushes
garbage collector to work more than Bulat's version.
On hyperthreaded CPU impact of garbage collection is reduced
by a factor of 30-60 % resulting in your program being 30% faster
on my machine.

Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2
In reply to this post by Daniel Fischer-4



>From: Daniel Fischer <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Wed, 14 Dec 2005 20:40:06 +0100
>
>Hi Bane,
>
>nice algorithm. Since comparing chars _is_ cheap, it is to be expected that
>all the hash-rotating is far more costly for short search patterns. The
>longer the pattern, the better this gets, I think -- though nowhere near
>KMP
>(or would it?).

Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.

However, I don't see how to (efficiently) do a multiple
>pattern search with KMP, so there -- if all patterns have the same length,
>otherwise I don't see -- Rabin-Karp would probably be the method of choice.

Yes, this algorithm can search in parallel patterns of same length.
Different search patterns have to be searched same way as with KMP.

>
>I tuned it up somewhat:
>import Data.List (isPrefixOf)
>import Data.Char (ord)  -- using ord instead of fromEnum oddly makes it
>-- faster for my test, but slower for yours, but only a whiff.

Wow, on my machine your version of Rabin-Karp gives 30% boost to my test.
This helps me learn Haskell , too .

Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2



>From: "Branimir Maksimovic" <[hidden email]>
>To: [hidden email]
>CC: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Thu, 15 Dec 2005 00:55:02 +0000
>
>
>
>
>>From: Daniel Fischer <[hidden email]>
>>To: "Branimir Maksimovic" <[hidden email]>
>>CC: [hidden email]
>>Subject: Re: [Haskell-cafe] Substring replacements
>>Date: Wed, 14 Dec 2005 20:40:06 +0100
>>
>>Hi Bane,
>>
>>nice algorithm. Since comparing chars _is_ cheap, it is to be expected
>>that
>>all the hash-rotating is far more costly for short search patterns. The
>>longer the pattern, the better this gets, I think -- though nowhere near
>>KMP
>>(or would it?).
>
>Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
>algorithm yet, though. But I think it would be
>difficult to implement it in Haskell efficiently as it searches backwards
>and jumps around, and we want memory savings.
>Though, I even didn't tried yet, but it is certainly very interesting.
>

Forget what I've said.
Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
forward, but when it finds last character in pattern, than starts to search
backwards.
This can be implemented easilly as Haskell lists naturaly reverse order
when putting from one list to other.
Heh, never say never :)
As I see from documents Boyer-Moore has best performance on average
and should be better than KMP.

Greetings,Bane.

_________________________________________________________________
FREE pop-up blocking with the new MSN Toolbar - get it now!
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

ajb@spamcop.net
In reply to this post by Branimir Maksimovic-2
G'day all.

Quoting Branimir Maksimovic <[hidden email]>:

> After seeing that your program is fastest (I've also tried one from
> http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> that good in converting to search replace?)

You probably did it right, but you could post your version to the
list if you want me to take a look.

When I wrote the RunTimeCompilation code, it wasn't intended to be a
shining example of efficiency, merely an illustration.  Remember
that it's doing TWO things: compiling the pattern to code, and then
performing the search.  The compilation phase is likely to be much
slower than the search, so the speedup (if any!) would only be realised
the SECOND time that you searched a string using the same pattern.
(Assuming you re-used the compiled match code, of course!)

Cheers,
Andrew Bromage
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Substring replacements

Branimir Maksimovic-2



>From: [hidden email]
>To: [hidden email]
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Thu, 15 Dec 2005 00:25:19 -0500
>
>G'day all.
>
>Quoting Branimir Maksimovic <[hidden email]>:
>
> > After seeing that your program is fastest (I've also tried one from
> > http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> > that good in converting to search replace?)
>
>You probably did it right, but you could post your version to the
>list if you want me to take a look.
>
Oh, here it is but just don;t laugh :)
I've hacked with unsafePerformIO as I din't know
how to remove IO from match any other way.

searchReplaceKMP :: String->String->String -> String
searchReplaceKMP sr rp s
    | not (null remaining) = found++rp
                             ++ searchReplaceKMP sr rp remaining
    | otherwise = found
    where
        (found,remaining) = unsafePerformIO $ matchKMP sr s

matchKMP :: (Monad m, Eq a) => [a] -> ([a] -> m ([a],[a]))
matchKMP []
    = error "Can't match empty list"
matchKMP xs
    = matchfunc []
      where
        matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs))
        dofail = \ps xs -> case xs of
                                [] -> fail "can't match"
                                (y:ys) -> matchfunc (y:ps) ys

type PartialMatchFunc m a = [a] -> [a] -> m ([a], [a])

makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] -> [(a, Int)]
                -> PartialMatchFunc m a
makeMatchFunc prev []
    = \ps xs -> return (reverse (drop ((length prev)-1) ps), xs)
makeMatchFunc prev ((x,failstate):ms)
    = thisf
      where
        mf = makeMatchFunc (thisf:prev) ms
        failcont = prev !! (length prev - failstate - 1)
        thisf = \ps xs -> case xs of
                                [] -> fail "can't match"
                                (y:ys) -> if (x == y) then mf (y:ps) ys
                                                else failcont ps xs

overlap :: (Eq a) => [a] -> [Int]
overlap str
    = overlap' [0] str
      where
        overlap' prev []
          = reverse prev
        overlap' prev (x:xs)
          = let get_o o
                 | o <= 1 || str !! (o-2) == x = o
                 | otherwise = get_o (1 + prev !! (length prev - o + 1))
                in overlap' (get_o (head prev + 1):prev) xs

--------------------------------------------------------------------------------------
These are timings (it's performance is about the same as Rabin-Karp):

$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
searchr.exe: user error (can't match)


real    0m22.187s
user    0m0.015s
sys     0m0.015s

bmaxa@MAXA ~/tutorial
$ ghc -fglasgow-exts  -O2 searchr.hs --make  -o searchr.exe
Chasing modules from: searchr.hs
Compiling Main             ( searchr.hs, searchr.o )
Linking ...

bmaxa@MAXA ~/tutorial
$ time searchr.exe
Working very long
True
Done

real    0m8.110s
user    0m0.031s
sys     0m0.016s


>When I wrote the RunTimeCompilation code, it wasn't intended to be a
>shining example of efficiency, merely an illustration.  Remember
>that it's doing TWO things: compiling the pattern to code, and then
>performing the search.  The compilation phase is likely to be much
>slower than the search, so the speedup (if any!) would only be realised
>the SECOND time that you searched a string using the same pattern.
>(Assuming you re-used the compiled match code, of course!)

Oh, that explaines it. Actually this has to be converted to searchReplace
in order to be fast, but I don;t know how (yet) as your program
is pretty complicated to my humble Haskell skills.
I think that your technique can be usefull with Aho-Corasick algorithm
as it first constructs finite automaton from tree, then performs search.
So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run
time compilation, but this is too advanced for me for now.

Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
12