One-Off detection. Question about Space/Time complexity

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

One-Off detection. Question about Space/Time complexity

Michael Litchard-2

Problem spec from CareerCup.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude
import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool
oneLetter s1 s2
  | s1 == s2 = False
  | max_l > (min_l + 1) = False
  | max_l == min_l = diff_size == 1
  | otherwise = diff_size == 0
  where
    length_s1 = T.length s1
    length_s2 = T.length s2
    max_l = max length_s1 length_s2
    min_l = min length_s1 length_s2
    diff_size = length $ filter (\(a,b) -> a /= b) zipped
    zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

David Turner-2
I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.

The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" <[hidden email]> wrote:

Problem spec from CareerCup.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude
import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool
oneLetter s1 s2
  | s1 == s2 = False
  | max_l > (min_l + 1) = False
  | max_l == min_l = diff_size == 1
  | otherwise = diff_size == 0
  where
    length_s1 = T.length s1
    length_s2 = T.length s2
    max_l = max length_s1 length_s2
    min_l = min length_s1 length_s2
    diff_size = length $ filter (\(a,b) -> a /= b) zipped
    zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

Andrew Butterfield-2
It can be done in one sweep down both strings together - look for the first difference,
and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False
oneOff [] (_:rest) = null rest
oneOff (_:rest) [] = null rest
oneOff s1@(c1:s1') s2@(c2:s2')
 | c1 == c2  =  oneOff s1' s2'  -- keep looking
 | otherwise  
     =    s1' == s2   -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted 
       || s2' == s1   -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted
       || s1' == s2'  -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests
O(n) in the length of the shortest String

Cheers,
  Andrew

On 22 Mar 2017, at 18:48, David Turner <[hidden email]> wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.


The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" <[hidden email]> wrote:

Problem spec from CareerCup.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude
import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool
oneLetter s1 s2
  | s1 == s2 = False
  | max_l > (min_l + 1) = False
  | max_l == min_l = diff_size == 1
  | otherwise = diff_size == 0
  where
    length_s1 = T.length s1
    length_s2 = T.length s2
    max_l = max length_s1 length_s2
    min_l = min length_s1 length_s2
    diff_size = length $ filter (\(a,b) -> a /= b) zipped
    zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

David Turner-2
I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared.

On 23 March 2017 at 08:52, Andrew Butterfield <[hidden email]> wrote:
It can be done in one sweep down both strings together - look for the first difference,
and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False
oneOff [] (_:rest) = null rest
oneOff (_:rest) [] = null rest
oneOff s1@(c1:s1') s2@(c2:s2')
 | c1 == c2  =  oneOff s1' s2'  -- keep looking
 | otherwise  
     =    s1' == s2   -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted 
       || s2' == s1   -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted
       || s1' == s2'  -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests
O(n) in the length of the shortest String

Cheers,
  Andrew

On 22 Mar 2017, at 18:48, David Turner <[hidden email]> wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.


The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" <[hidden email]> wrote:

Problem spec from CareerCup.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude
import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool
oneLetter s1 s2
  | s1 == s2 = False
  | max_l > (min_l + 1) = False
  | max_l == min_l = diff_size == 1
  | otherwise = diff_size == 0
  where
    length_s1 = T.length s1
    length_s2 = T.length s2
    max_l = max length_s1 length_s2
    min_l = min length_s1 length_s2
    diff_size = length $ filter (\(a,b) -> a /= b) zipped
    zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

Andrew Butterfield-2
Yes - they both might be scanned twice, but O(2n) = O(n) = O(3n), even.

The scanniing by  the three uses of == only walk over *all* of the longer list if its length is at most that of the shorter

Simply put, if  s1 is much longer than s2, then those == scans will end when they reach the end of the shorter s2

Or am I missing something?

On 23 Mar 2017, at 15:19, David Turner <[hidden email]> wrote:

I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared.

if m == n your timing reduces to O(0) !?

On 23 March 2017 at 08:52, Andrew Butterfield <[hidden email]> wrote:
It can be done in one sweep down both strings together - look for the first difference,
and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False
oneOff [] (_:rest) = null rest
oneOff (_:rest) [] = null rest
oneOff s1@(c1:s1') s2@(c2:s2')
 | c1 == c2  =  oneOff s1' s2'  -- keep looking
 | otherwise  
     =    s1' == s2   -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted 
       || s2' == s1   -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted
       || s1' == s2'  -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests
O(n) in the length of the shortest String

Cheers,
  Andrew

On 22 Mar 2017, at 18:48, David Turner <[hidden email]> wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.


The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" <[hidden email]> wrote:

Problem spec from CareerCup.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude
import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool
oneLetter s1 s2
  | s1 == s2 = False
  | max_l > (min_l + 1) = False
  | max_l == min_l = diff_size == 1
  | otherwise = diff_size == 0
  where
    length_s1 = T.length s1
    length_s2 = T.length s2
    max_l = max length_s1 length_s2
    min_l = min length_s1 length_s2
    diff_size = length $ filter (\(a,b) -> a /= b) zipped
    zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

David Turner-2
Sorry, I missed the vital word *space*. It looks time-efficient, but doesn't look O(1) in space.

On 23 Mar 2017 16:12, "Andrew Butterfield" <[hidden email]> wrote:
Yes - they both might be scanned twice, but O(2n) = O(n) = O(3n), even.

The scanniing by  the three uses of == only walk over *all* of the longer list if its length is at most that of the shorter

Simply put, if  s1 is much longer than s2, then those == scans will end when they reach the end of the shorter s2

Or am I missing something?

On 23 Mar 2017, at 15:19, David Turner <[hidden email]> wrote:

I also think it can be done in one sweep, but I think this code is O(max(m,n)-min(m,n)): s1' and s2' each appear twice so will be shared.

if m == n your timing reduces to O(0) !?


On 23 March 2017 at 08:52, Andrew Butterfield <[hidden email]> wrote:
It can be done in one sweep down both strings together - look for the first difference,
and then test hypotheses regarding the 5 possible edits

oneOff :: String -> String -> Bool

oneOff [] [] = False
oneOff [] (_:rest) = null rest
oneOff (_:rest) [] = null rest
oneOff s1@(c1:s1') s2@(c2:s2')
 | c1 == c2  =  oneOff s1' s2'  -- keep looking
 | otherwise  
     =    s1' == s2   -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted 
       || s2' == s1   -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted
       || s1' == s2'  -- c1 is s1 was replaced by c2 in s2, or v.v.

I guess the above should be O(1) in space - it's just a list crawl with tests
O(n) in the length of the shortest String

Cheers,
  Andrew

On 22 Mar 2017, at 18:48, David Turner <[hidden email]> wrote:

I think the space must be O(m+n) since you are using strict Text values. But even if you weren't, you're using s1 and s2 in more than one place so they will be shared, leading to them both being fully in memory at once.

The minimum possible space complexity could well be O(1) if using String or lazy Text.


The time complexity looks like O(min(m,n)), because that's the cost of `zip`/`filter`, not counting the cost of loading the two Texts into memory in the first place (which will be O(m+n)). If you used String, it'd be O(m+n) since that's the cost of the two calls to `length`. I think the minimum possible time complexity will be O(min(m,n)), and that could include the cost of loading the strings too.

On 22 Mar 2017 18:32, "Michael Litchard" <[hidden email]> wrote:

Problem spec from CareerCup.

Given two strings, return boolean True/False if they are only one edit apart. Edit can be insert/delete/update of only one character in the string. Eg.

-True

xyz,xz

xyz,xyk

xy,xyz

-False

xyz,xyz

xyz,xzy

x,xyz

module Strings.OneLetter where`

import Prelude
import qualified Data.Text as T`

oneLetter :: T.Text -> T.Text -> Bool
oneLetter s1 s2
  | s1 == s2 = False
  | max_l > (min_l + 1) = False
  | max_l == min_l = diff_size == 1
  | otherwise = diff_size == 0
  where
    length_s1 = T.length s1
    length_s2 = T.length s2
    max_l = max length_s1 length_s2
    min_l = min length_s1 length_s2
    diff_size = length $ filter (\(a,b) -> a /= b) zipped
    zipped = T.zip s1 s2`

So, I used Text instead of String, hoping I could take advantage of fusion. I have the following questions and my initial attempt to answer them.

What is the time complexity of oneLetter 0(m+n) where m is the length of s1 and n is the length of s2

what is the space complexity of oneLetter? I'm thinking due to laziness it's O(1), only two Chars are in memory at any one time, or two Ints. But I'm hazy on why. If this is wrong, please articulate why. If I'm right, and my reasoning is wrong or incomplete, please say why.

I don't think I can improve the time complexity. Am I right? Can the space complexity be improved?

What if I changed from Text to String? I don't think the time complexity changes, but how does this change the space complexity?




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

Olaf Klinke
In reply to this post by Michael Litchard-2
I believe this is the classic publication on the subject:

http://users.monash.edu/~lloyd/tildeStrings/Alignment/92.IPL.html

The Haskell implementation is here:

http://users.monash.edu/~lloyd/tildeFP/Haskell/1998/Edit01/

In case of a pair of one-off strings, it is O(m) time. Not sure whether the entire strings are kept in memory. My gut feeling is that they must be kept in memory in the worst case (edit distance max(m,n)) but stretches of matched prefixes can be garbage collected. So in your particular case it should indeed be O(1) space. Profiling suggests the algorithm is not O(1) space, but I might have done the input strings wrong.

input :: Int -> String
input = flip replicate 'a'

length . input should be O(1) space, shouldn't it? With ghc 7.8.3 and -O2 it is not. Running the edit distance on the pair (input m, 'b' : input m) and profiling the program suggests the algorithm is O(m) space. Andrew Butterfield's oneOff seems to be linear space, too.
-- Olaf

$cat test.hs
import System.Environment
import Data.List (foldl')
import EditDistance (d)
input :: Int -> String
input n = replicate n 'a'

main = do
  args <- getArgs
  let n = read (head args)
  let x = input $! (n+1)
  let y = 'b' : input n
  print (d x y)

$ghc -O2 --make -prof -fprof-auto test.hs
[2 of 2] Compiling Main             ( test.hs, test.o )
Linking test ...

$for n in 1000 10000 100000; do ./test $n +RTS -p && grep 'total alloc' test.prof; done
1
        total alloc =     358,128 bytes  (excludes profiling overheads)
1
        total alloc =   3,022,696 bytes  (excludes profiling overheads)
1
        total alloc =  29,663,256 bytes  (excludes profiling overheads)

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: One-Off detection. Question about Space/Time complexity

Richard A. O'Keefe
In reply to this post by Andrew Butterfield-2

> On 23/03/2017, at 9:52 PM, Andrew Butterfield <[hidden email]> wrote:
>
> It can be done in one sweep down both strings together - look for the first difference,
> and then test hypotheses regarding the 5 possible edits
>
> oneOff :: String -> String -> Bool
>
> oneOff [] [] = False
> oneOff [] (_:rest) = null rest
> oneOff (_:rest) [] = null rest
> oneOff s1@(c1:s1') s2@(c2:s2')
>  | c1 == c2  =  oneOff s1' s2'  -- keep looking
>  | otherwise  
>      =    s1' == s2   -- s2 is s1 with c1 deleted or s1 is s2 with c1 inserted
>        || s2' == s1   -- s1 is s2 with c2 deleted or s2 is s1 with c2 inserted
>        || s1' == s2'  -- c1 is s1 was replaced by c2 in s2, or v.v.
>
> I guess the above should be O(1) in space

The question is what "O(1) in space" means here.

If you discount the storage required for the strings, fine.
If however you think in terms of lazy lists of characters,
the fact that this code will look at the tails of the lists
up to 3 times means that those tails will be stored, so we
are talking about O(n) space.

I believe it is possible to walk down s1 s1' s2 s2' in a
single pass, maintaining "streaming".

Whether it is _worth_ doing that is another matter.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.