poor performance when generating random text

Dmitry V'yal
Hello everyone,

I've written a snippet which generates a file full of random strings.
When compiled with -O2 on ghc-7.6, the generation speed is about 2 MB per
second, which is on par with interpreted PHP. I find that rather
disappointing. Maybe I've missed something trivial? Any suggestions and
explanations are welcome. :)

% cat ext_sort.hs
import qualified Data.Text as T
import System.Random
import Control.Exception
import Control.Monad

import System.IO
import qualified Data.Text.IO as TI

gen_string g = let (len, g') = randomR (50, 450) g
                in T.unfoldrN len rand_text (len, g')
  where rand_text (0,_) = Nothing
        rand_text (k,g) = let (c, g') = randomR ('a','z') g
                          in Just (c, ((k-1), g'))

write_corpus file = bracket (openFile file WriteMode) hClose $ \h -> do
   let size = 100000
   sequence $ replicate size $ do
     g <- newStdGen
     let text = gen_string g
     TI.hPutStrLn h text

main = do
   putStrLn "generating text corpus"
   write_corpus "test.txt"



% cat ext_sort.prof
         Wed Oct 17 10:59 2012 Time and Allocation Profiling Report (Final)

            ext_sort +RTS -p -RTS

         total time  =       32.56 secs   (32558 ticks @ 1000 us, 1 processor)
         total alloc = 12,742,917,332 bytes  (excludes profiling overheads)

COST CENTRE                MODULE  %time %alloc

gen_string.rand_text.(...) Main     70.7   69.8
gen_string                 Main     17.6   15.8
gen_string.rand_text       Main      5.4   13.3
write_corpus.\             Main      4.3    0.8


                                                          individual     inherited
COST CENTRE                       MODULE no.     entries  %time %alloc   %time %alloc

MAIN                              MAIN    67           0    0.0    0.0   100.0  100.0
 main                             Main   135           0    0.0    0.0   100.0  100.0
  write_corpus                    Main   137           0    0.0    0.0   100.0  100.0
   write_corpus.\                 Main   138           1    4.3    0.8   100.0  100.0
    write_corpus.\.text           Main   140      100000    0.0    0.0    95.7   99.2
     gen_string                   Main   141      100000   17.6   15.8    95.7   99.2
      gen_string.g'               Main   147      100000    0.0    0.0     0.0    0.0
      gen_string.rand_text        Main   144    25109743    5.4   13.3    77.5   83.2
       gen_string.rand_text.g'    Main   148    24909743    0.6    0.0     0.6    0.0
       gen_string.rand_text.(...) Main   146    25009743   70.7   69.8    70.7   69.8
       gen_string.rand_text.c     Main   145    25009743    0.8    0.0     0.8    0.0
      gen_string.len              Main   143      100000    0.0    0.0     0.0    0.0
      gen_string.(...)            Main   142      100000    0.6    0.3     0.6    0.3

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: poor performance when generating random text

Gregory Collins-3
System.Random is very slow. Try the mwc-random package from Hackage.
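
A minimal sketch of what that switch could look like while keeping Data.Text, assuming mwc-random's createSystemRandom and uniformR. The helper name randomLine is hypothetical and not taken from the original program. The generator is seeded once and reused for every line, instead of calling newStdGen per line:

```haskell
import Control.Monad (replicateM, replicateM_)
import Data.Char (chr, ord)
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import System.IO (IOMode (WriteMode), withFile)
import System.Random.MWC (GenIO, createSystemRandom, uniformR)

-- | One random line of 50 to 450 lowercase letters (hypothetical helper).
-- mwc-random has no Variate instance for Char, so code points are drawn
-- as Int and converted with chr.
randomLine :: GenIO -> IO T.Text
randomLine gen = do
  len <- uniformR (50, 450 :: Int) gen
  codes <- replicateM len (uniformR (ord 'a', ord 'z') gen)
  return (T.pack (map chr codes))

main :: IO ()
main = do
  gen <- createSystemRandom  -- seeded once, reused for every line
  withFile "test.txt" WriteMode $ \h ->
    replicateM_ 100000 (randomLine gen >>= TI.hPutStrLn h)
```

Building each line through an intermediate list is not the fastest option, but it keeps the sketch close to the original structure.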

--
Gregory Collins <[hidden email]>


Re: poor performance when generating random text

Alfredo Di Napoli
What about this? I've tested it on my PC and it seems pretty fast. The trick is to create the generator only once. Not sure whether the INLINE pragmas help, though:

import Control.Monad (replicateM, replicateM_)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as CB
import Data.Word (Word8)
import System.IO
import System.Random.MWC


{- | Converts a Char to a Word8. Taken from MissingH. -}
c2w8 :: Char -> Word8
c2w8 = fromIntegral . fromEnum


charRangeStart :: Word8
charRangeStart = c2w8 'a'
{-# INLINE charRangeStart #-}

charRangeEnd :: Word8
charRangeEnd = c2w8 'z'
{-# INLINE charRangeEnd #-}

-- | Generates a random string of 50 to 450 lowercase letters.
genString :: GenIO -> IO B.ByteString
genString g = do
    randomLen <- uniformR (50 :: Int, 450 :: Int) g
    str <- replicateM randomLen $ uniformR (charRangeStart, charRangeEnd) g
    return $ B.pack str


writeCorpus :: FilePath -> IO ()
writeCorpus file = withFile file WriteMode $ \h -> do
  let size = 100000
  -- The generator is created once and reused for every line.
  withSystemRandom $ \gen ->
      replicateM_ size $ do
        text <- genString gen
        CB.hPutStrLn h text

main :: IO ()
main = writeCorpus "test.txt"



A.


Re: poor performance when generating random text

Dmitry V'yal
On 10/17/2012 12:45 PM, Alfredo Di Napoli wrote:
> What about this? I've tested on my pc and seems pretty fast. The trick
> is to generate the gen only once. Not sure if the inlines helps, though:
>

Wow, haskell-cafe is a wonderful place! In just two hours, the program's
run time automagically improved 20x ;) Thanks Alfredo, the code works
wonderfully. Compared to my implementation, it's 2.5 sec vs 50 sec on my
laptop. It would be interesting to see how it compares to C now.

Inlining makes about a 50x difference when the code is compiled without
optimization. A nice example.

Best wishes,
Dmitry



Re: poor performance when generating random text

Alfredo Di Napoli
Glad to have been helpful :)

Bests,
Alfredo
