Question about using the system encoding in Pipes.Prelude.Text

Daniel Díaz Carrete
Hi,

In the documentation for Pipes.Prelude.Text, we find the following:
  • The line-based operations, like those in Data.Text.IO, use the system encoding (and T.hGetLine, T.hPutLine etc.) and thus are slower than the 'official' route, which would use the very fast bytestring IO operations from Pipes.ByteString and the encoding and decoding functions in Pipes.Text.Encoding, which are also quite fast thanks to the streaming-commons package.

I'm curious: why is using the system encoding slower?

Re: Question about using the system encoding in Pipes.Prelude.Text

Michael Thompson
I never looked into why, but you can observe that e.g. `fmap decodeUtf8 . B.readFile` is several times as fast as `T.readFile`. It's the same with the other material in `Data.Text.(Lazy.)IO`. I think this is why he doesn't include the IO functions in `Data.Text`: the official IO route is via ByteString, using the encoding and decoding functions, the same as with pipes-text. The program below reads the same file by either route, depending on whether any command-line argument is given:


    import qualified Data.ByteString.Char8 as B
    import qualified Data.Text as T
    import qualified Data.Text.Encoding as T
    import qualified Data.Text.IO as T
    import System.Environment (getArgs)

    main :: IO ()
    main = do
      args <- getArgs
      case args of
        -- No arguments: read via Data.Text.IO (system encoding).
        [] -> do
          txt <- T.readFile "txt/words3d.txt"
          print $ T.length txt
        -- Any argument: read the raw bytes, then decode them as UTF-8.
        _ -> do
          bs <- B.readFile "txt/words3d.txt"
          print $ T.length (T.decodeUtf8 bs)
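
To check that the two routes really produce the same `Text` before timing them, something like the following may help. It is only a sketch: the helper names `readViaHandle` and `readViaByteString` and the scratch file `sample-utf8.txt` are made up for illustration, and the handle is pinned to UTF-8 with `hSetEncoding` so that the comparison does not depend on the locale. It says nothing about where the time goes in the handle-based route.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (IOMode (ReadMode), hSetEncoding, utf8, withFile)

-- Route 1: Data.Text.IO, where decoding is done by the Handle layer.
-- By default the handle would use the locale (system) encoding; it is
-- set to UTF-8 here so that both routes decode the same way.
readViaHandle :: FilePath -> IO T.Text
readViaHandle path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  TIO.hGetContents h  -- strict: the whole file is read before returning

-- Route 2: read the raw bytes, then decode them explicitly as UTF-8.
readViaByteString :: FilePath -> IO T.Text
readViaByteString path = TE.decodeUtf8 <$> B.readFile path

main :: IO ()
main = do
  let path = "sample-utf8.txt"  -- hypothetical scratch file
  -- Write a small UTF-8 file containing non-ASCII characters.
  B.writeFile path (TE.encodeUtf8 (T.pack "na\239ve caf\233\nsecond line\n"))
  viaHandle <- readViaHandle path
  viaBytes <- readViaByteString path
  -- Report whether the two routes agree, and the decoded lengths.
  print (viaHandle == viaBytes)
  print (T.length viaHandle, T.length viaBytes)
```

Note that `T.decodeUtf8` throws on invalid input, so the ByteString route assumes the file really is UTF-8, whereas plain `T.readFile` follows whatever the system encoding happens to be.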
