Open Kattis Problem Srednji: Hints to improve my algorithm

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Open Kattis Problem Srednji: Hints to improve my algorithm

Dominik Bollmann

Hi Haskell-Cafe,

I've been trying to solve the Open Kattis Problem called Srednji
recently, unfortunately without success.

Given a sequence A and a number B within that sequence this problem asks
to find all odd sub-sequences of A that, when sorted, have B as their
median in the middle. That is, from A we may remove some prefix and/or
suffix and if the resulting sub-sequence -- when sorted -- contains B in
the middle, then this sub-sequence is a solution. The problem asks to
find the number of all solutions. Check out
https://open.kattis.com/problems/srednji for the details.

My Haskell solution below tries to find the number of odd sub-sequences
by first locating the median and then repeatedly moving left and right
from that median to find larger and larger sub-sequence candidates. Each
found candidate is checked to have B in the middle when sorted in order
to become a solution. Moreover, I also extend each such candidate
further to the left (and to the right, respectively) to determine
whether these leftward or rightward extensions are solutions, too.

I think with this approach I systematically enumerate all solutions.
Unfortunately, though, this approach is too slow and times out on the
11th hidden test cases.

I'd therefore be thankful for hints about different approaches to
solving this problem more efficiently.


Thanks!

Dominik.

====================================================================

My current, slow Haskell code is this:

import Data.Maybe
import Data.Sequence (Seq, (<|), (|>))
import qualified Data.Sequence as Seq
import qualified Data.Vector.Unboxed as Vec

data SubSeq = SubSeq
  { getBalance :: {-# UNPACK #-} !Int
  , getSubSeq  :: Seq Int
  , from       :: {-# UNPACK #-} !Int
  , to         :: {-# UNPACK #-} !Int
  }

balancedSubSeqs :: [Int] -> Int -> [SubSeq]
balancedSubSeqs seq med = do
  candidate <- leftRight [val0]
  let lefts  = leftLeft candidate []
      rights = rightRight candidate []
  candidate ?: lefts ++ rights
  where
    medidx = fromJust (Vec.findIndex (== med) arr)
    val0   = SubSeq 0 (Seq.singleton med) medidx medidx
    arr    = Vec.fromList seq

    leftRight cands@(SubSeq balance seq i j : _)
      | i-1 < 0 || j+1 >= Vec.length arr = cands
      | otherwise =
        let v1       = arr Vec.! (i-1)
            v2       = arr Vec.! (j+1)
            balance' = newBalance balance v1 v2
            seq'     = (v1 <| seq) |> v2
        in leftRight (SubSeq balance' seq' (i-1) (j+1) : cands)

    leftLeft cand@(SubSeq balance seq i j) sols
      | i-2 < 0   = sols
      | otherwise =
        let v1 = arr Vec.! (i-2)
            v2 = arr Vec.! (i-1)
            balance' = newBalance balance v1 v2
            seq'     = v1 <| v2 <| seq
            newCand  = SubSeq balance' seq' (i-2) j
        in leftLeft newCand (newCand ?: sols)

    rightRight cand@(SubSeq balance seq i j) sols
      | j+2 >= Vec.length arr = sols
      | otherwise =
        let v1 = arr Vec.! (j+1)
            v2 = arr Vec.! (j+2)
            balance' = newBalance balance v1 v2
            seq'     = seq |> v1 |> v2
            newCand  = SubSeq balance' seq' i (j+2)
        in rightRight newCand (newCand ?: sols)

    newBalance old n1 n2
      | n1 < med, n2 < med = old - 2
      | n1 > med, n2 > med = old + 2
      | otherwise          = old

infixr 5 ?:
--(?:) :: SubSeq -> [SubSeq] -> [SubSeq]
x@(SubSeq b _ _ _) ?: xs
  | b == 0    = x : xs
  | otherwise = xs

main :: IO ()
main = do
  [len, median] <- fmap read . words <$> getLine
  seq <- fmap read . words <$> getLine
  let solutions = balancedSubSeqs seq median
  print (length solutions)
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Open Kattis Problem Srednji: Hints to improve my algorithm

Viktor Dukhovni
On Wed, Sep 23, 2020 at 09:15:40AM +0200, Dominik Bollmann wrote:

> Check out https://open.kattis.com/problems/srednji for the details.

Which gives a more precise statement of the problem.

> My Haskell solution below tries to find the number of odd sub-sequences
> by first locating the median and then repeatedly moving left and right
> from that median to find larger and larger sub-sequence candidates. Each
> found candidate is checked to have B in the middle when sorted in order
> to become a solution. Moreover, I also extend each such candidate
> further to the left (and to the right, respectively) to determine
> whether these leftward or rightward extensions are solutions, too.
>
> I think with this approach I systematically enumerate all solutions.
> Unfortunately, though, this approach is too slow and times out on the
> 11th hidden test cases.
>
> I'd therefore be thankful for hints about different approaches to
> solving this problem more efficiently.

This is not really a programming problem, writing the code is the easy
part.  Rather, this is an *algorithm* problem.  An efficient solution
uses a better algorithm, not a different implementation of the same
algorithm.  Your algorithm is not efficient.  Forget Haskell for the
moment, can you think of a better algorithm.  The above algorithm is
subtantially slower than optimal.

The best algorithm that comes to mind runs in linear time in the length
of the list, and requires linear (2N) additional space.  No sorting
(that would not be linear) or complex testing of candidates is required,
just some counting and O(N) book-keeping.

--
    Viktor.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Open Kattis Problem Srednji: Hints to improve my algorithm

Brent Yorgey
In reply to this post by Dominik Bollmann
Viktor is right.  Here's a small hint towards one way of solving it: start by replacing every number smaller than B by -1, and every number larger than B by 1 (and B itself by 0).

-Brent

On Wed, Sep 23, 2020 at 2:17 AM Dominik Bollmann <[hidden email]> wrote:

Hi Haskell-Cafe,

I've been trying to solve the Open Kattis Problem called Srednji
recently, unfortunately without success.

Given a sequence A and a number B within that sequence this problem asks
to find all odd sub-sequences of A that, when sorted, have B as their
median in the middle. That is, from A we may remove some prefix and/or
suffix and if the resulting sub-sequence -- when sorted -- contains B in
the middle, then this sub-sequence is a solution. The problem asks to
find the number of all solutions. Check out
https://open.kattis.com/problems/srednji for the details.

My Haskell solution below tries to find the number of odd sub-sequences
by first locating the median and then repeatedly moving left and right
from that median to find larger and larger sub-sequence candidates. Each
found candidate is checked to have B in the middle when sorted in order
to become a solution. Moreover, I also extend each such candidate
further to the left (and to the right, respectively) to determine
whether these leftward or rightward extensions are solutions, too.

I think with this approach I systematically enumerate all solutions.
Unfortunately, though, this approach is too slow and times out on the
11th hidden test cases.

I'd therefore be thankful for hints about different approaches to
solving this problem more efficiently.


Thanks!

Dominik.

====================================================================

My current, slow Haskell code is this:

import Data.Maybe
import Data.Sequence (Seq, (<|), (|>))
import qualified Data.Sequence as Seq
import qualified Data.Vector.Unboxed as Vec

data SubSeq = SubSeq
  { getBalance :: {-# UNPACK #-} !Int
  , getSubSeq  :: Seq Int
  , from       :: {-# UNPACK #-} !Int
  , to         :: {-# UNPACK #-} !Int
  }

balancedSubSeqs :: [Int] -> Int -> [SubSeq]
balancedSubSeqs seq med = do
  candidate <- leftRight [val0]
  let lefts  = leftLeft candidate []
      rights = rightRight candidate []
  candidate ?: lefts ++ rights
  where
    medidx = fromJust (Vec.findIndex (== med) arr)
    val0   = SubSeq 0 (Seq.singleton med) medidx medidx
    arr    = Vec.fromList seq

    leftRight cands@(SubSeq balance seq i j : _)
      | i-1 < 0 || j+1 >= Vec.length arr = cands
      | otherwise =
        let v1       = arr Vec.! (i-1)
            v2       = arr Vec.! (j+1)
            balance' = newBalance balance v1 v2
            seq'     = (v1 <| seq) |> v2
        in leftRight (SubSeq balance' seq' (i-1) (j+1) : cands)

    leftLeft cand@(SubSeq balance seq i j) sols
      | i-2 < 0   = sols
      | otherwise =
        let v1 = arr Vec.! (i-2)
            v2 = arr Vec.! (i-1)
            balance' = newBalance balance v1 v2
            seq'     = v1 <| v2 <| seq
            newCand  = SubSeq balance' seq' (i-2) j
        in leftLeft newCand (newCand ?: sols)

    rightRight cand@(SubSeq balance seq i j) sols
      | j+2 >= Vec.length arr = sols
      | otherwise =
        let v1 = arr Vec.! (j+1)
            v2 = arr Vec.! (j+2)
            balance' = newBalance balance v1 v2
            seq'     = seq |> v1 |> v2
            newCand  = SubSeq balance' seq' i (j+2)
        in rightRight newCand (newCand ?: sols)

    newBalance old n1 n2
      | n1 < med, n2 < med = old - 2
      | n1 > med, n2 > med = old + 2
      | otherwise          = old

infixr 5 ?:
--(?:) :: SubSeq -> [SubSeq] -> [SubSeq]
x@(SubSeq b _ _ _) ?: xs
  | b == 0    = x : xs
  | otherwise = xs

main :: IO ()
main = do
  [len, median] <- fmap read . words <$> getLine
  seq <- fmap read . words <$> getLine
  let solutions = balancedSubSeqs seq median
  print (length solutions)
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Open Kattis Problem Srednji: Hints to improve my algorithm

Viktor Dukhovni
In reply to this post by Viktor Dukhovni
On Wed, Sep 23, 2020 at 03:57:33AM -0400, Viktor Dukhovni wrote:

> The best algorithm that comes to mind runs in linear time in the length
> of the list, and requires linear (2N) additional space.  No sorting
> (that would not be linear) or complex testing of candidates is required,
> just some counting and O(N) book-keeping.

On my machine the constant factor seems to be about 0.7 seconds for a
randomly "desorted" list of length 10 million numbers, in which choosing
the desired median to be 5 million yields 9,979,641,307 possible
combinations of sequences:

    9979641307
       2,160,167,416 bytes allocated in the heap
             106,240 bytes copied during GC
          80,053,184 bytes maximum residency (2 sample(s))
             744,512 bytes maximum slop
                  80 MiB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       960 colls,     0 par    0.005s   0.005s     0.0000s    0.0001s
      Gen  1         2 colls,     0 par    0.005s   0.005s     0.0026s    0.0050s

      INIT    time    0.000s  (  0.000s elapsed)
      MUT     time    0.652s  (  0.698s elapsed)
      GC      time    0.010s  (  0.010s elapsed)
      EXIT    time    0.000s  (  0.000s elapsed)
      Total   time    0.663s  (  0.708s elapsed)

      %GC     time       0.0%  (0.0% elapsed)

      Alloc rate    3,311,613,717 bytes per MUT second

      Productivity  98.4% of total user, 98.5% of total elapsed

The tuned up algorithm uses 1*N+constant space, which for 10 million
64-bit Ints in an Unboxed Vector works out to the reported 80 MB.

Most of the CPU time (and heap allocaton) is likely spent reading and
converting the input stream of decimal integers.  The actual CPU time
spent solving the problem is likely a fraction of that cost.

The RTS stats for 100M numbers confirm the linear scaling in time and
space (this time 103,749,385,441 ways to place the median):

    103749385441
      21,701,903,208 bytes allocated in the heap
             935,496 bytes copied during GC
         800,053,240 bytes maximum residency (2 sample(s))
              67,592 bytes maximum slop
                 766 MiB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0      9610 colls,     0 par    0.052s   0.052s     0.0000s    0.0001s
      Gen  1         2 colls,     0 par    0.049s   0.049s     0.0247s    0.0491s

      INIT    time    0.000s  (  0.000s elapsed)
      MUT     time    6.811s  (  7.297s elapsed)
      GC      time    0.101s  (  0.102s elapsed)
      EXIT    time    0.000s  (  0.000s elapsed)
      Total   time    6.912s  (  7.399s elapsed)

      %GC     time       0.0%  (0.0% elapsed)

      Alloc rate    3,186,237,035 bytes per MUT second

      Productivity  98.5% of total user, 98.6% of total elapsed

--
    Viktor.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Open Kattis Problem Srednji: Hints to improve my algorithm

Dominik Bollmann
In reply to this post by Brent Yorgey

Thank you for the input and hints, Viktor and Brent. I appreciate it!
I'll try to come up with a better algorithm.

Thanks!

Dominik


Brent Yorgey <[hidden email]> writes:

> Viktor is right.  Here's a small hint towards one way of solving it: start
> by replacing every number smaller than B by -1, and every number larger
> than B by 1 (and B itself by 0).
>
> -Brent
>
> On Wed, Sep 23, 2020 at 2:17 AM Dominik Bollmann <[hidden email]>
> wrote:
>
>>
>> Hi Haskell-Cafe,
>>
>> I've been trying to solve the Open Kattis Problem called Srednji
>> recently, unfortunately without success.
>>
>> Given a sequence A and a number B within that sequence this problem asks
>> to find all odd sub-sequences of A that, when sorted, have B as their
>> median in the middle. That is, from A we may remove some prefix and/or
>> suffix and if the resulting sub-sequence -- when sorted -- contains B in
>> the middle, then this sub-sequence is a solution. The problem asks to
>> find the number of all solutions. Check out
>> https://open.kattis.com/problems/srednji for the details.
>>
>> My Haskell solution below tries to find the number of odd sub-sequences
>> by first locating the median and then repeatedly moving left and right
>> from that median to find larger and larger sub-sequence candidates. Each
>> found candidate is checked to have B in the middle when sorted in order
>> to become a solution. Moreover, I also extend each such candidate
>> further to the left (and to the right, respectively) to determine
>> whether these leftward or rightward extensions are solutions, too.
>>
>> I think with this approach I systematically enumerate all solutions.
>> Unfortunately, though, this approach is too slow and times out on the
>> 11th hidden test cases.
>>
>> I'd therefore be thankful for hints about different approaches to
>> solving this problem more efficiently.
>>
>>
>> Thanks!
>>
>> Dominik.
>>
>> ====================================================================
>>
>> My current, slow Haskell code is this:
>>
>> import Data.Maybe
>> import Data.Sequence (Seq, (<|), (|>))
>> import qualified Data.Sequence as Seq
>> import qualified Data.Vector.Unboxed as Vec
>>
>> data SubSeq = SubSeq
>>   { getBalance :: {-# UNPACK #-} !Int
>>   , getSubSeq  :: Seq Int
>>   , from       :: {-# UNPACK #-} !Int
>>   , to         :: {-# UNPACK #-} !Int
>>   }
>>
>> balancedSubSeqs :: [Int] -> Int -> [SubSeq]
>> balancedSubSeqs seq med = do
>>   candidate <- leftRight [val0]
>>   let lefts  = leftLeft candidate []
>>       rights = rightRight candidate []
>>   candidate ?: lefts ++ rights
>>   where
>>     medidx = fromJust (Vec.findIndex (== med) arr)
>>     val0   = SubSeq 0 (Seq.singleton med) medidx medidx
>>     arr    = Vec.fromList seq
>>
>>     leftRight cands@(SubSeq balance seq i j : _)
>>       | i-1 < 0 || j+1 >= Vec.length arr = cands
>>       | otherwise =
>>         let v1       = arr Vec.! (i-1)
>>             v2       = arr Vec.! (j+1)
>>             balance' = newBalance balance v1 v2
>>             seq'     = (v1 <| seq) |> v2
>>         in leftRight (SubSeq balance' seq' (i-1) (j+1) : cands)
>>
>>     leftLeft cand@(SubSeq balance seq i j) sols
>>       | i-2 < 0   = sols
>>       | otherwise =
>>         let v1 = arr Vec.! (i-2)
>>             v2 = arr Vec.! (i-1)
>>             balance' = newBalance balance v1 v2
>>             seq'     = v1 <| v2 <| seq
>>             newCand  = SubSeq balance' seq' (i-2) j
>>         in leftLeft newCand (newCand ?: sols)
>>
>>     rightRight cand@(SubSeq balance seq i j) sols
>>       | j+2 >= Vec.length arr = sols
>>       | otherwise =
>>         let v1 = arr Vec.! (j+1)
>>             v2 = arr Vec.! (j+2)
>>             balance' = newBalance balance v1 v2
>>             seq'     = seq |> v1 |> v2
>>             newCand  = SubSeq balance' seq' i (j+2)
>>         in rightRight newCand (newCand ?: sols)
>>
>>     newBalance old n1 n2
>>       | n1 < med, n2 < med = old - 2
>>       | n1 > med, n2 > med = old + 2
>>       | otherwise          = old
>>
>> infixr 5 ?:
>> --(?:) :: SubSeq -> [SubSeq] -> [SubSeq]
>> x@(SubSeq b _ _ _) ?: xs
>>   | b == 0    = x : xs
>>   | otherwise = xs
>>
>> main :: IO ()
>> main = do
>>   [len, median] <- fmap read . words <$> getLine
>>   seq <- fmap read . words <$> getLine
>>   let solutions = balancedSubSeqs seq median
>>   print (length solutions)
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Open Kattis Problem Srednji: Hints to improve my algorithm

Viktor Dukhovni
On Thu, Sep 24, 2020 at 09:15:19PM +0200, Dominik Bollmann wrote:

> Thank you for the input and hints, Viktor and Brent. I appreciate it!
> I'll try to come up with a better algorithm.

Good luck.  Indeed once an efficient algorithm is implemented, the bulk
of the runtime is doing the I/O and deserialisation of the input values.
With `getContents` and `readInt` from Data.ByteString.Lazy.Char8 the
runtime for 100 million ints was ~6.8 seconds, while with `stdin` and
`readInt` from Data.ByteString.Streaming.stdin + it was ~25s, but a
more efficient `readInt` replacement for streaming ByteStrings brings
that down to 5.5s.

A loop in C using `scanf("%" PRIu64, &n)`, decodes 100M Ints in ~10s on
the same machine, which slower than the Haskell code do the same and
also solving this exercise, likely due to stdio(3) not being particularly
efficient, and scanf(3) having to reparse the format string on every
call.  In any case, this clearly points in the direction of reading and
converting the ASCII decimals as being the dominant cost in this
problem.

--
    Viktor.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.