Re: [Haskell] reading binary files

Ben Franksen-2
[questions such as this one should go to cafe]

On Wednesday 05 April 2006 20:41, minh thu wrote:

> 1/ i want to read some binary file (e.g. targa file format : *.tga).
>
> -- first way : via IOUArray
> showInfoHeader1 handle = do
>     a <- newArray_ (1,8) :: IO (IOUArray Int Word8)
>     hGetArray handle a 8
>     idLength <- readArray a 1 -- or getElems...
>     putStrLn ("id length : " ++ show idLength)
>     return ()
>
> -- second way : via c-like array
> showInfoHeader2 handle = do
>     b <- mallocArray 8 :: IO (Ptr Word8)
>     hGetBuf handle b 8
>     [idLength] <- peekArray 1 b -- or peakArray 8 b

Note that the first argument of peekArray is the number of elements to
read, not an index, so peekArray 1 b does give you the first byte. Also,
if you are only interested in the first byte, you could simply

    idLength <- peek b

or if it is not the first byte, then

    idLength <- peekByteOff b i

However, it is better to use arrays than pointers.

>     putStrLn ("id length : " ++ show idLength)
>     free b
>     return ()
>
> so, briefly, i have to read some content into some kind of buffer
> (IOUArray Int Word8 or Ptr Word8), then get one (or more) elements
> from the buffor into a standard haskell variable (is it the correct
> word ?) (or list).
>
> in the second case, i also have to free the buffer.

Or use alloca or allocaBytes, which are both a lot faster than malloc
and free.
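
A minimal sketch of the allocaBytes variant, combining it with the
peekByteOff suggestion above. The names headerBytes and readHeaderByte are
made up for illustration; the 8-byte size is taken from the quoted code.
The buffer is released automatically when the inner action returns, so no
explicit free is needed.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)
import System.IO

-- Number of bytes read from the start of the file (8, as in the quoted code).
headerBytes :: Int
headerBytes = 8

-- | Read 'headerBytes' bytes from the handle's current position into a
-- temporary buffer and return the byte at the given zero-based offset.
-- (Hypothetical helper, not part of any library.)
readHeaderByte :: Handle -> Int -> IO Word8
readHeaderByte h off
  | off < 0 || off >= headerBytes = ioError (userError "offset outside header")
  | otherwise =
      allocaBytes headerBytes $ \buf -> do
        -- hGetBuf returns the number of bytes actually read.
        n <- hGetBuf h (buf :: Ptr Word8) headerBytes
        if n < headerBytes
          then ioError (userError "short read")
          else peekByteOff buf off
```

With a handle opened via openBinaryFile, readHeaderByte h 0 would yield the
id length byte that the quoted code prints.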

> in some case, when the data is more than one Word8 long, i have to
> 'reconstruct' it, i.e.:
>
> [x1,x2] <- getElems a

This will give you a run-time error (a pattern-match failure), because
your array is 8 elements long, not 2. You can do

    x1:x2:_ <- ...

or better still

    x1 <- readArray a 1
    x2 <- readArray a 2

Still better: Use one of the available binary serialisation libraries.
They are already tuned for efficiency and give you a much nicer
high-level API.

> let x = fromIntegral x1 + fromIntegral x2 * 256 :: Int
>
> is it the correct way to read binary files ?

Depends on the byte order that is used in your file format. Your
expression treats x1 as the low byte, so it is correct if the format is
little-endian (least significant byte first) and not correct if it is
big-endian. (I always tend to confuse big- and little-endian.)
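
To make the byte-order point concrete, here is a small sketch of both
conventions for a 16-bit value. The helper names word16le and word16be are
invented for this example and are not from any library.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)

-- | Combine two bytes stored least significant byte first (little-endian).
-- This matches x1 + x2 * 256 from the quoted code.
word16le :: Word8 -> Word8 -> Word16
word16le lo hi = fromIntegral lo .|. (fromIntegral hi `shiftL` 8)

-- | Combine two bytes stored most significant byte first (big-endian).
word16be :: Word8 -> Word8 -> Word16
word16be hi lo = word16le lo hi
```

In both functions the arguments are given in file order, so the bytes
0x34 0x12 decode to 0x1234 with word16le and to 0x3412 with word16be.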

> 2/ haskell is (i heard that once ... :-) a high level language, so it
> has (must have) good support for abstraction...

Sure. See the above-mentioned libraries.

> but in 1/, i have to choose between different kind of array
> representation (and i dont know which one is better) and it seems to
> me that the resulting code (compiled) would have to be the same.

I strongly recommend using some Array type (IOU or whatever). Ptr is
really just a raw pointer into memory: no protection from out-of-bounds
access, etc. much like in C. Ptr has been invented for interfacing with
C routines, not for regular Haskell programming.
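
A sketch of the array-based approach, assuming the array package that
ships with GHC. The helpers readBytes and word16leAt are hypothetical names
chosen for this example. Unlike the Ptr version, an out-of-range index
passed to readArray raises an error instead of reading arbitrary memory.

```haskell
import Data.Array.IO (IOUArray, hGetArray, newArray_, readArray)
import Data.Word (Word8, Word16)
import System.IO

-- | Read exactly n bytes from the handle into a zero-based unboxed array.
readBytes :: Handle -> Int -> IO (IOUArray Int Word8)
readBytes h n = do
  arr <- newArray_ (0, n - 1)
  got <- hGetArray h arr n          -- returns the number of bytes read
  if got < n
    then ioError (userError "short read")
    else return arr

-- | Decode a little-endian 16-bit value starting at index i.
-- readArray checks both indices against the array bounds.
word16leAt :: IOUArray Int Word8 -> Int -> IO Word16
word16leAt arr i = do
  lo <- readArray arr i
  hi <- readArray arr (i + 1)
  return (fromIntegral lo + 256 * fromIntegral hi)
```

Using zero-based bounds here is a design choice so that array indices
coincide with byte offsets in the file format description.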

HTH,
Ben
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell] reading binary files

Dmitry V'yal
Hello, Bulat

I'm currently working on some kind of program for analysing FAT partitions.
Don't ask why I chose to implement it in Haskell :)  Just for fun.
My program needs to read scattered chunks of binary data from a huge file and to
do a good amount of deserialisation.

I implemented basic functionality using Handles and Ptr's and now I'm starting
to regret it. I have some pieces of code like:

type PtrAdvancer a b = StateT (Ptr a) IO b

peek_one :: Storable b => PtrAdvancer a b
peek_one = do
  p <- get
  res <- lift $ peek $ castPtr p
  put $ plusPtr p $ sizeOf res
  return res

peek_many :: Storable b => Int -> PtrAdvancer a [b]
peek_many 0 = return []
peek_many n = do
  first <- peek_one
  rest <- peek_many $ n-1
  return $ first:rest


data DirEntry = DirEntry
    { name :: String,
      attr :: Word8,
      crt_time_tenth :: Word8,
      crt_time :: Word16,
      crt_data :: Word16,
      lst_acc_data :: Word16,
      wrt_time :: Word16,
      wrt_date :: Word16,
      fst_cluster :: Word32,
      file_size :: Word32
    } deriving Show

instance Storable DirEntry where
    sizeOf _ = 32
    alignment _ = 32
    peek = evalStateT peek_dir_entry

peek_dir_entry = do
  n <- peek_many 11 :: PtrAdvancer a [Word8]
  at <- peek_one
  peek_one :: PtrAdvancer a Word8
  ctt <- peek_one
  ct <- peek_one
  cd <- peek_one
  lad <- peek_one
  fch <- peek_one :: PtrAdvancer a Word16
  wt <- peek_one
  wd <- peek_one
  fcl <- peek_one :: PtrAdvancer a Word16
  fs <- peek_one
  return $ DirEntry (words_to_str n) at ctt ct cd lad wt wd
             ((fromIntegral fch `shiftL` 16) + fromIntegral fcl) fs

or:

read_cluster_chain32 :: Handle -> FatAddress -> Cluster -> IO [Cluster]
read_cluster_chain32 h start cluster = do
  allocaBytes 4 $ \p -> chain32' p cluster True
  where
    chain32' p c need_seek = do
      when need_seek $ hSeek h AbsoluteSeek (fromIntegral $ start + c * 4)
      hGetBuf h p 4
      val <- peek p :: IO Word32
      let val28 = val .&. 0x0fffffff
      case val28 of
        0x0 -> return []
        0x0fffffff -> return [c]
        _ -> do rest <- if c+1 == val28 then chain32' p val28 False
                        else chain32' p val28 True
                return $ c:rest
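
The chain-following logic above can also be separated from the I/O. The
following is only a sketch under that assumption: followChain is a made-up
name, and the FAT lookup is passed in as a plain function so the traversal
can be tested without a file. It keeps the conventions of the code above
(mask to 28 bits, 0 ends the chain with no clusters, 0x0fffffff marks the
last cluster).

```haskell
import Data.Bits ((.&.))
import Data.Word (Word32)

-- | Follow a cluster chain, given a function that returns the raw 32-bit
-- FAT entry for a cluster. Does not terminate on a cyclic (corrupt) table.
followChain :: (Word32 -> Word32) -> Word32 -> [Word32]
followChain entryAt c =
  case entryAt c .&. 0x0fffffff of
    0x0        -> []                          -- free entry: no chain
    0x0fffffff -> [c]                         -- end-of-chain marker
    next       -> c : followChain entryAt next
```

Because the result is a lazy list, a caller can consume clusters one at a
time while the lookup function performs the actual reads.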

It works at a mediocre speed (about 10Mb/s when extracting files), but the
design is ugly IMO. For example, I need to write twice as many lines of
marshalling code as in C: once for the data type declaration and then again
for the Storable instance. Is there a way to avoid it?
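
One possible way to shrink the marshalling code, sketched here as an
assumption rather than as anything proposed in this thread, is a parser
monad from a serialisation library. The example uses Data.Binary.Get from
the binary package and reads every multi-byte field as little-endian, which
is the FAT on-disk byte order. The record mirrors the one above;
getDirEntry and parseDirEntry are names invented for the example.

```haskell
import Data.Binary.Get
  (Get, getByteString, getWord16le, getWord32le, getWord8, runGet, skip)
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word16, Word32)

data DirEntry = DirEntry
    { name :: String,
      attr :: Word8,
      crt_time_tenth :: Word8,
      crt_time :: Word16,
      crt_data :: Word16,
      lst_acc_data :: Word16,
      wrt_time :: Word16,
      wrt_date :: Word16,
      fst_cluster :: Word32,
      file_size :: Word32
    } deriving Show

-- | Parse one 32-byte directory entry; the parser tracks the offset itself.
getDirEntry :: Get DirEntry
getDirEntry = do
  n   <- getByteString 11      -- raw 8.3 name, kept as bytes-as-characters
  at  <- getWord8
  skip 1                       -- byte ignored by the original code too
  ctt <- getWord8
  ct  <- getWord16le
  cd  <- getWord16le
  lad <- getWord16le
  fch <- getWord16le           -- high half of the first cluster
  wt  <- getWord16le
  wd  <- getWord16le
  fcl <- getWord16le           -- low half of the first cluster
  fs  <- getWord32le
  let cluster = (fromIntegral fch `shiftL` 16) .|. fromIntegral fcl :: Word32
  return (DirEntry (BC.unpack n) at ctt ct cd lad wt wd cluster fs)

-- | Run the parser on a lazy ByteString holding at least 32 bytes.
parseDirEntry :: BL.ByteString -> DirEntry
parseDirEntry = runGet getDirEntry
```

The layout is still written out once by hand, but the pointer arithmetic
and the Storable instance disappear, and byte order is explicit instead of
depending on the host machine.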

>
> with my lib, you can either read data directly from file (through
> implicit 512-byte buffer) or read whole file into the automatically
> allocated buffer with `readFromFile`. what is better - depends on what
> you plan to do with rest of file
>

Now I'm going to rewrite my code to make use of io library. So my question is
whether your library is well suited for such application (frequent positioning
and reading small pieces of data).


Re[2]: [Haskell] reading binary files

Bulat Ziganshin-2
Hello Dmitry,

Friday, April 7, 2006, 1:44:23 PM, you wrote:

> I'm currently working on some kind of program for analysing FAT partitions.

i prefer to answer you personally in the language that we both know
slightly better than English ;)

btw, i have plans to create a "Russian Haskell team" that will spread
information about Haskell and its usage in Russian: web sites,
web forums, mailing lists, Russian-commented examples of Haskell code and
so on. i think that this will make it easier for users who don't know
English very well, and who don't have very complex questions, to get
help. this should help in spreading Haskell among a wider range of
programmers

i think the same would be great for other large groups of users - Deutsch,
Français, Español and so on

--
Best regards,
 Bulat                            mailto:[hidden email]


Re: Re: [Haskell] reading binary files

Donald Bruce Stewart
In reply to this post by Dmitry V'yal
akamaus:
> Hello, Bulat
>
> I'm currently working on some kind of program for analysing FAT partitions.
> Don't ask why I chose to implement it in Haskell :)  Just for fun.
> My program needs to read scattered chunks of binary data from a huge file and to
> do a good amount of deserialisation.

You might want to look at the various file system code that has
previously been implemented in Haskell:
    http://haskell.org/haskellwiki/Libraries_and_tools/Operating_system

Links to the various binary IO libs (or some of them) are on the
data structures page.

-- Don