|
Binary: high performance, pure binary serialisation for Haskell
----------------------------------------------------------------------

The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:

    tarball:  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
    darcs:    darcs get http://darcs.haskell.org/binary
    haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html

The 'binary' package provides efficient serialisation of Haskell values to and from lazy ByteStrings. ByteStrings constructed this way may then be written to disk, written to the network, or further processed (e.g. stored in memory directly, or compressed in memory with zlib or bzlib).

Encoding and decoding are achieved by the functions:

    encode :: Binary a => a -> ByteString
    decode :: Binary a => ByteString -> a

which mirror the read/show functions. Convenience functions for serialising to disk are also provided:

    encodeFile :: Binary a => FilePath -> a -> IO ()
    decodeFile :: Binary a => FilePath -> IO a

To serialise your Haskell data, all you need to do is write an instance of Binary for your type. For example, suppose in an interpreter we had the data type:

    import Data.Binary
    import Control.Monad

    data Exp = IntE Int
             | OpE  String Exp Exp

We can serialise this to bytestring form with the following instance:

    instance Binary Exp where
        put (IntE i)      = putWord8 0 >> put i
        put (OpE s e1 e2) = putWord8 1 >> put s >> put e1 >> put e2
        get = do tag <- getWord8
                 case tag of
                     0 -> liftM  IntE get
                     1 -> liftM3 OpE  get get get

The binary library has been heavily tuned for performance, particularly for writing speed. Throughput of up to 160M/s has been achieved in practice, and in general speed is on par with or better than NewBinary, with the advantage of a pure interface. Efforts are underway to improve performance still further.
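A round trip through encode and decode is a quick way to check a hand-written instance like the one above. The sketch below assumes a current release of the binary package. It repeats the Exp instance and makes two additions of our own, not part of the announcement: Eq/Show deriving, and a fallback case for unknown tags. `sample` and `roundTrip` are illustrative names.

```haskell
import Control.Monad (liftM, liftM3)
import Data.Binary
import qualified Data.ByteString.Lazy as BL

-- The interpreter type from the announcement, with Eq/Show added
-- (our addition) so round trips can be compared and printed.
data Exp = IntE Int | OpE String Exp Exp
  deriving (Eq, Show)

instance Binary Exp where
  -- Each constructor writes a one-byte tag, then its fields in order.
  put (IntE i)      = putWord8 0 >> put i
  put (OpE s e1 e2) = putWord8 1 >> put s >> put e1 >> put e2
  get = do
    tag <- getWord8
    case tag of
      0 -> liftM IntE get
      1 -> liftM3 OpE get get get
      -- Our addition: reject corrupt input instead of a pattern-match error.
      _ -> fail ("Exp.get: unknown tag " ++ show tag)

-- The expression (1 + 2).
sample :: Exp
sample = OpE "+" (IntE 1) (IntE 2)

-- Serialise, then immediately deserialise.
roundTrip :: Exp -> Exp
roundTrip = decode . encode
```

The tag is written before the fields, so the first byte of `encode sample` is `1`, the tag of `OpE`.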
Plans are also taking shape for a parser combinator library on top of binary, for bit parsing and foreign structure parsing (e.g. network protocols).

Several projects are using binary already for serialisation:

    lambdabot  : state file serialisation
    hmp3       : mp3 file database
    hpaste.org : pastes are stored in memory as compressed bytestrings, and serialised to disk on MACID checkpoints

Binary was developed by a team of 8 during the Haskell Hackathon, Hac 07, and received 200+ commits over that period. You can see the commit graph here:

    http://www.cse.unsw.edu.au/~dons/images/commits/community/binary-commits.png

The use of QuickCheck was critical to the rapid, safe development of the library. The API was developed in conjunction with the QuickCheck properties that checked the API for sanity. We were thus able to improve performance while maintaining stability. We feel that QuickCheck should be an integral part of the development strategy for all new Haskell libraries. Don't write code without it!

Binary is portable, using the foreign function interface and cpp, and is tested with Hugs and GHC.

Happy hacking!

The Binary Strike Team,
    Lennart Kolmodin
    Duncan Coutts
    Don Stewart
    Spencer Janssen
    David Himmelstrup
    Bjorn Bringert
    Ross Paterson
    Einar Karttunen

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
|
dons:
>
> Binary: high performance, pure binary serialisation for Haskell
> ----------------------------------------------------------------------
>
> The Binary Strike Team is pleased to announce the release of a new,
> pure, efficient binary serialisation library for Haskell, now available
> from Hackage:

Ok, I forgot one point. It is possible to automatically derive instances of Binary for your custom types, if they are instances of Data and Typeable, using an SYB trick. Load tools/derive/BinaryDerive.hs into ghci, bring your type into scope, then run:

    *Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)

to have the source for the Binary instance for the type Drinks derived for you:

    *Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)
    instance Binary Main.Drinks where
      put (Beer a) = putWord8 0 >> put a
      put Coffee = putWord8 1
      put Tea = putWord8 2
      put EnergyDrink = putWord8 3
      put Water = putWord8 4
      put Wine = putWord8 5
      put Whisky = putWord8 6
      get = do
        tag_ <- getWord8
        case tag_ of
          0 -> get >>= \a -> return (Beer a)
          1 -> return Coffee
          2 -> return Tea
          3 -> return EnergyDrink
          4 -> return Water
          5 -> return Wine
          6 -> return Whisky

The use of SYB techniques to provide a 'deriving' script along with a new typeclass seems to be quite handy.

-- Don
|
In reply to this post by Donald Bruce Stewart
Yay! I knew if I waited long enough someone would write this.
Is the binary format portable? I need the produced files to work on both 32 and 64 bit architectures and with big and little endian machines. And of course, between different versions of a compiler or different compilers.

        John

--
John Meacham - ⑆repetae.net⑆john⑈
|
john:
> Yay! I knew if I waited long enough someone would write this.
>
> Is the binary format portable? I need the produced files to work on both
> 32 and 64 bit architectures and with big and little endian machines. And
> of course, between different versions of a compiler or different
> compilers.

We believe so, and it's a bug if this is not the case.

The source documents the encoding format used for each type (we were unable to attach haddocks to instances.. grr.)

All data is encoded in network byte order (big-endian), and word-sized values (like Int) are extended to 64 bits. It should be possible to encode a structure with ghc on x86, and decode it on a sparc64 running hugs.

-- Don
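You can see the format described above by inspecting the bytes that `encode` produces. This sketch assumes a current release of binary. In that release, the `Int` instance widens the value to 64 bits and writes it big-endian, as described here, and the fixed-size `Int32` instance writes exactly 4 bytes. `intBytes` and `int32Bytes` are illustrative names.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int32)
import Data.Word (Word8)

-- A plain Int: widened to 64 bits and written most significant byte first.
intBytes :: [Word8]
intBytes = BL.unpack (encode (1 :: Int))

-- An explicitly sized Int32: 4 bytes, also big-endian (258 = 0x0102).
int32Bytes :: [Word8]
int32Bytes = BL.unpack (encode (258 :: Int32))
```

`intBytes` is `[0,0,0,0,0,0,0,1]` on both 32- and 64-bit hosts. `int32Bytes` is `[0,0,1,2]`. Code that needs a compact, precisely sized layout can use the fixed-width types.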
|
In reply to this post by John Meacham
On Thu, Jan 25, 2007 at 07:11:55PM -0800, John Meacham wrote:
> Is the binary format portable? I need the produced files to work on both
> 32 and 64 bit architectures and with big and little endian machines. And
> of course, between different versions of a compiler or different
> compilers.

Sorry to reply to myself; looking at the code, I see that it is. However, Ints appear to be stored as 64 bits always, which seems like a mistake. The Haskell standard only specifies that Ints must have at least 30 bits of precision, so programs that rely on more than that are not portable anyway. Plus, it is unlikely that any compilers will ever have Ints > 32 bits; ghc does at the moment by accident of design, and it is considered a misfeature that will be fixed at some point. It would be an ugly wart to be stuck with going forward...

        John

--
John Meacham - ⑆repetae.net⑆john⑈
|
john:
> On Thu, Jan 25, 2007 at 07:11:55PM -0800, John Meacham wrote:
> > Is the binary format portable? I need the produced files to work on both
> > 32 and 64 bit architectures and with big and little endian machines. And
> > of course, between different versions of a compiler or different
> > compilers.
>
> Sorry to reply to myself, looking at the code, I see that it is.
> however, Ints appear to be stored as 64 bits always, this seems like a
> mistake. The Haskell standard only specifies Ints must have at least 30
> bits of precision so programs that rely on more than that are not
> portable anyway. Plus, it is unlikely that any compilers ever will have
> Ints > 32 bits, ghc does at the moment by accident of design and it is
> considered a misfeature that will be fixed at some point. It would be an
> ugly wart to be stuck with going forward...

This was perhaps the only issue of contention during development. It was felt that those wishing to serialise precisely would use an explicitly sized type, such as Int64 or Word32, and that having things work correctly on ghc/amd64 right now was critical.

If the Int/Int64 issue is resolved in the future, we can revisit this. It's fairly painless to upgrade files from one version of a Binary instance to another, too (you just read in using the old 'get' method, and write back out using the new 'put' method).

-- Don
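The upgrade path described above decodes with the old `get` and encodes with the new `put`. The sketch below shows it with two hypothetical versions of a type: `CounterV1` stores a plain `Int`, and `CounterV2` stores an explicit `Int32`. These types and the `upgrade*` functions are illustrations, not part of binary. The code assumes a current release of the package.

```haskell
import Data.Binary (Binary (..), decode, decodeFile, encode, encodeFile)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int32)

-- Hypothetical old format: a counter stored as a plain Int (8 bytes on the wire).
newtype CounterV1 = CounterV1 Int deriving (Eq, Show)

instance Binary CounterV1 where
  put (CounterV1 n) = put n
  get = fmap CounterV1 get

-- Hypothetical new format: the same counter as an explicit Int32 (4 bytes).
newtype CounterV2 = CounterV2 Int32 deriving (Eq, Show)

instance Binary CounterV2 where
  put (CounterV2 n) = put n
  get = fmap CounterV2 get

-- Convert the in-memory representation.
upgrade :: CounterV1 -> CounterV2
upgrade (CounterV1 n) = CounterV2 (fromIntegral n)

-- Read with the old 'get', write with the new 'put'.
upgradeBytes :: BL.ByteString -> BL.ByteString
upgradeBytes = encode . upgrade . decode

-- The same conversion for files on disk.
upgradeFile :: FilePath -> FilePath -> IO ()
upgradeFile oldPath newPath = do
  v1 <- decodeFile oldPath
  encodeFile newPath (upgrade v1)
```

`upgradeFile` writes to a separate path, so the original file is left untouched.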
|
In reply to this post by Donald Bruce Stewart
DrIFT 2.2.1 is out and now has support for the Data.Binary module.
The old 'Binary' has been moved to 'BitsBinary', and 'Binary' now refers to the new 'Data.Binary' version of the library.

The homepage is at: http://repetae.net/~john/computer/haskell/DrIFT/

The current list of deriving rules it knows about is:

Binary:
    Binary           Data.Binary binary encoding of terms
    BitsBinary       efficient binary encoding of terms
    GhcBinary        byte sized binary encoding of terms
Debugging:
    Arbitrary        Derive reasonable Arbitrary for QuickCheck
    Observable       HOOD observable
General:
    NFData           provides 'rnf' to reduce to normal form (deepSeq)
    Typeable         derive Typeable for Dynamic
Generics:
    FunctorM         derive reasonable fmapM implementation
    HFoldable        Strafunski hfoldr
    Monoid           derive reasonable Data.Monoid implementation
    RMapM            derive reasonable rmapM implementation
    Term             Strafunski representation via Dynamic
Prelude:
    Bounded Enum Eq Ord Read Show
Representation:
    ATermConvertible encode terms in the ATerm format
    Haskell2Xml      encode terms as XML (HaXml<=1.13)
    XmlContent       encode terms as XML (HaXml>=1.14)
Utility:
    Parse            parse values back from standard 'Show'
    Query            provide a QueryFoo class with 'is', 'has', 'from', and 'get' routines
    from             provides fromFoo for each constructor
    get              for label 'foo' provide foo_g to get it
    has              hasfoo for record types
    is               provides isFoo for each constructor
    test             output raw data for testing
    un               provides unFoo for unary constructors
    update           for label 'foo' provides 'foo_u' to update it and foo_s to set it

        John

--
John Meacham - ⑆repetae.net⑆john⑈
|
In reply to this post by Donald Bruce Stewart
Congratulations, guys! Fast serialisation is one of the things that comes up over and over again, so an easy-to-use fast solution is a great step forward.
(Credit too to earlier pioneers, notably Bulat.)

Simon

| -----Original Message-----
| From: [hidden email] [mailto:[hidden email]] On Behalf Of Donald Bruce Stewart
| Sent: 26 January 2007 02:51
| To: [hidden email]
| Cc: [hidden email]
| Subject: [Haskell-cafe] ANNOUNCE: binary: high performance, pure binary serialisation
|
|
| Binary: high performance, pure binary serialisation for Haskell
| ----------------------------------------------------------------------
|
| The Binary Strike Team is pleased to announce the release of a new,
| pure, efficient binary serialisation library for Haskell, now available
| from Hackage:
|
| tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
| darcs: darcs get http://darcs.haskell.org/binary
| haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
|
In reply to this post by Donald Bruce Stewart
Hello,
Donald Bruce Stewart wrote:
> Ok, I forgot one point. It is possible to automatically derive instances
> of Binary for your custom types, if they inhabit Data and Typeable,
> using an SYB trick. Load tools/derive/BinaryDerive.hs into ghci, and
> bring your type into scope, then run:
>
> *Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)

It would seem that one needs to rerun the script every time the type is changed. That would be unfortunate. Perhaps I could have a go at writing a Template Haskell function to derive those instances?

I also fear that the existing script does not handle types with more than 256 constructors correctly. While uncommon, those are not unrealistic.

Using DrIFT would probably automate the deriving just as well, but in my particular situation TH support is easier to maintain than DrIFT support.

Greetings,

Arie
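The one-byte tag limits a type to 256 constructors. One way around this is to choose the tag width per type from the number of constructors: one byte for up to 256 and two bytes for up to 65536. The sketch below does this for enumeration-like types that are instances of `Bounded` and `Enum`, and assumes a current release of binary. `putTag`, `getTag`, `putEnum`, `getEnum` and `Small` are hypothetical names, not library functions. Types whose constructors have fields would still need a hand-written instance, but could reuse `putTag`/`getTag`.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Binary (Binary (..), Get, Put, getWord8, putWord8)
import Data.Binary.Get (getWord16be, runGet)
import Data.Binary.Put (putWord16be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16)

-- Hypothetical helper: write constructor index i using the smallest
-- fixed width that can hold numCons distinct tags.
putTag :: Int -> Int -> Put
putTag numCons i
  | numCons <= 256   = putWord8 (fromIntegral i)
  | numCons <= 65536 = putWord16be (fromIntegral i)
  | otherwise        = error "putTag: more than 65536 constructors"

-- Hypothetical helper: read a tag written by putTag and range-check it.
getTag :: Int -> Get Int
getTag numCons
  | numCons > 65536 = fail "getTag: more than 65536 constructors"
  | otherwise = do
      i <- if numCons <= 256
             then fromIntegral <$> getWord8
             else fromIntegral <$> getWord16be
      if i < numCons then return i else fail ("getTag: bad tag " ++ show i)

-- Encode any Bounded/Enum type; the tag width depends only on the type.
putEnum :: forall a. (Bounded a, Enum a) => a -> Put
putEnum x = putTag n (fromEnum x - lo)
  where
    lo = fromEnum (minBound :: a)
    n  = fromEnum (maxBound :: a) - lo + 1

getEnum :: forall a. (Bounded a, Enum a) => Get a
getEnum = do
  let lo = fromEnum (minBound :: a)
      n  = fromEnum (maxBound :: a) - lo + 1
  i <- getTag n
  return (toEnum (lo + i))

-- A small enumeration gets one-byte tags.
data Small = A | B | C deriving (Eq, Show, Enum, Bounded)

instance Binary Small where
  put = putEnum
  get = getEnum

-- Word16 stands in for an enumeration with 65536 values: two-byte tags.
wideExample :: BL.ByteString
wideExample = runPut (putEnum (300 :: Word16))
```

The width depends only on the type, not on the constructor being written, so decoding still reads one fixed-size chunk.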
|
On Jan 26, 2007, at 2:40 PM, Arie Peterson wrote:

> Using DrIFT would probably automate the deriving just as well, but in my
> particular situation TH support is easier to maintain than DrIFT support.

May I ask why TH is easier to maintain than DrIFT? I'm not familiar with DrIFT. Why would I prefer one over the other?

Thanks, Joel

--
http://wagerlabs.com/
|
In reply to this post by Arie Peterson
Quoth Arie Peterson, nevermore,
> I also fear that the existing script does not handle types with more than
> 256 constructors correctly. While uncommon, those are not unrealistic.

"256 constructors ought to be enough for anybody"? ;-)

Seriously though, the thought of a type definition that heavyweight quite terrifies me. I would be interested to see whether such a thing could be warranted and not more sensibly broken down into smaller (sets of) units. I like to think of types as being a bit like functions; and there is no way I would ever think about a function with 256+ parameters. For a start, my screen isn't wide enough for that kind of thing...

But, well done to the people responsible for the binary stuff. It looks fab.

D.

--
Dougal Stanton <[hidden email]> <http://brokenhut.livejournal.com>
Word attachments considered harmful.
|
In reply to this post by Joel Reymont
Joel Reymont wrote:
> May I ask why TH is easier to maintain than DrIFT?
>
> I'm not familiar with DrIFT.

The reason is personal, and very silly. I only use ghc, so TH is available automatically. Like you, I have never used DrIFT, so I would have to get to know it, and install it everywhere I want to compile my program. From a very quick look at the DrIFT homepage, installation might be nontrivial on a Windows machine without some Cygwin-like environment. At any rate, *for me* it's more work than using TH, because I'm familiar with the latter and already depend on its presence.

> Why would I prefer one over the other?

I wouldn't know. Please do not let my prejudice influence your preference :-).

Greetings,

Arie
|
In reply to this post by Donald Bruce Stewart
On Fri, 26 Jan 2007, Donald Bruce Stewart wrote:

>
> Binary: high performance, pure binary serialisation for Haskell
> ----------------------------------------------------------------------
>
> The Binary Strike Team is pleased to announce the release of a new,
> pure, efficient binary serialisation library for Haskell, now available
> from Hackage:
>
> tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
> darcs: darcs get http://darcs.haskell.org/binary
> haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html

I want to write out data in the machine's endianness, because that data will be post-processed by sox, which reads data in the machine's endianness. Is this also planned for the package?
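Current releases of binary provide host-endian primitives in `Data.Binary.Put` and `Data.Binary.Get`, such as `putWord16host` and `getWord16host`. The thread does not say whether 0.2 had them. The sketch below writes raw signed 16-bit samples in the machine's native byte order, on the assumption that this is the raw layout the sox step expects. `putSamplesHost`, `getSamplesHost` and `samplesToBytes` are illustrative names.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord16host, runGet)
import Data.Binary.Put (Put, putWord16host, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int16)

-- Write each signed sample as 16 bits in the machine's native byte order.
-- fromIntegral reinterprets the two's-complement bits as a Word16.
putSamplesHost :: [Int16] -> Put
putSamplesHost = mapM_ (putWord16host . fromIntegral)

-- Read n native-order samples back, converting Word16 bits to Int16.
getSamplesHost :: Int -> Get [Int16]
getSamplesHost n = replicateM n (fromIntegral <$> getWord16host)

samplesToBytes :: [Int16] -> BL.ByteString
samplesToBytes = runPut . putSamplesHost
```

You can write the bytes out with `BL.writeFile`. A file written this way is only portable between machines of the same endianness. That is the trade-off against binary's default network-order encoding.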
|
In reply to this post by Dougal Stanton
On Fri, Jan 26, 2007 at 03:12:29PM +0000, Dougal Stanton wrote:
> Quoth Arie Peterson, nevermore,
> > I also fear that the existing script does not handle types with more than
> > 256 constructors correctly. While uncommon, those are not unrealistic.
>
> "256 constructors ought to be enough for anybody"? ;-)
>
> Seriously though, the thought of a type definition that heavyweight
> quite terrifies me.

Think about simple enumerations, e.g. for keywords in a programming language:

    data Keyword = IF | THEN | ELSE | BEGIN | END ...

http://www.cs.vu.nl/grammars/cobol/:

    Number of keywords: 420

Perhaps such examples could be treated differently, but I think it's better to have a more general solution and not have to assume unnecessary restrictions on users' datatypes.

> I would be interested to see if such a thing could be warranted and
> not more sensibly broken down into smaller (sets of) units.

I think in the above example the most sensible thing is to have all the keywords in the same datatype.

Best regards
Tomasz
|
In reply to this post by Donald Bruce Stewart
On Fri, Jan 26, 2007 at 02:16:22PM +1100, Donald Bruce Stewart wrote:
> We believe so, and its a bug if this is not the case.
>
> The src documents the encoding format used for each type (we were unable
> to attach haddocks to instances.. grr.)
>
> All data is encoded in Network order, and extended to 64 bits for word
> sized values (like Int). It should be possible to encode a structure
> with ghc on x86, and decode it on a sparc64 running hugs.

Did you consider using an encoding which uses a variable number of bytes? If yes, I would be interested to know your reason for not choosing such an encoding. Efficiency?

Best regards
Tomasz
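LEB128 is a common variable-length scheme; Protocol Buffers, for example, uses it for its "varint" encoding. Each byte carries 7 payload bits, and the high bit says whether more bytes follow. binary does not use this scheme. The sketch below shows the alternative asked about here for `Word64`, built only on `putWord8`/`getWord8` from a current release of binary. `putVarWord` and `getVarWord` are hypothetical names.

```haskell
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (Put, putWord8, runPut)
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64)

-- LEB128-style encoding: low 7 bits first; the high bit of each byte
-- is set when further bytes follow.
putVarWord :: Word64 -> Put
putVarWord n
  | n < 0x80  = putWord8 (fromIntegral n)
  | otherwise = do
      putWord8 (fromIntegral (n .&. 0x7f) .|. 0x80)
      putVarWord (n `shiftR` 7)

-- Decode by accumulating 7-bit groups until a byte without the high bit.
getVarWord :: Get Word64
getVarWord = go 0 0
  where
    go :: Int -> Word64 -> Get Word64
    go offset acc
      | offset > 63 = fail "getVarWord: encoding longer than 10 bytes"
      | otherwise = do
          b <- getWord8
          let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` offset)
          if testBit b 7 then go (offset + 7) acc' else return acc'
```

Values below 128 take one byte and `maxBound :: Word64` takes ten. For example, 300 encodes as the two bytes 172 and 2. Small numbers shrink, at the cost of a data-dependent loop when decoding. That loop is the efficiency question raised above.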
|
In reply to this post by Henning Thielemann
On Fri, Jan 26, 2007 at 04:31:28PM +0100, Henning Thielemann wrote:
> On Fri, 26 Jan 2007, Donald Bruce Stewart wrote:
>
> >
> > Binary: high performance, pure binary serialisation for Haskell
> > ----------------------------------------------------------------------
> >
> > The Binary Strike Team is pleased to announce the release of a new,
> > pure, efficient binary serialisation library for Haskell, now available
> > from Hackage:
> >
> > tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
> > darcs: darcs get http://darcs.haskell.org/binary
> > haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
>
> I want to write out data in the machine's endianess, because that data
> will be post-processed by sox, which reads data in the machine's
> endianess. Is this also planned for the package?

I also have to use a specific serialisation format. I guess we could both simply use putWord8, but then we'll probably lose most of the benefits of using the library.

Perhaps we could think about introducing some "encoding contexts", with a default encoding that can be automatically derived, but also with the ability to create one's own encodings?

Best regards
Tomasz
|
In reply to this post by Tomasz Zielonka
Tomasz Zielonka <[hidden email]> wrote:
> Did you consider using an encoding which uses variable number of
> bytes? If yes, I would be interested to know your reason for not
> choosing such an encoding. Efficiency?

My Binary implementation (from 1998) used a type-specific number of bits to encode the constructor - exactly as many as needed. (If you were writing custom instances, you could even use a variable number of bits for the constructor, e.g. using Huffman encoding to make the more common constructors have the shortest representation.)

The latter certainly imposes an extra time overhead on decoding, because you cannot just take a fixed-size chunk of bits and have the value. But I would have thought that in the regular case, using a type-specific (but not constructor-specific) size for representing the constructor would be very easy and have no time overhead at all.

Regards,
    Malcolm
|
existing encoding system - both the BER (Basic Encoding Rules) and the PER (Packed Encoding Rules). If you are looking to target a well supported standard - this would be the one.

Neil

On 26/01/07, Malcolm Wallace <[hidden email]> wrote:
> Tomasz Zielonka <[hidden email]> wrote:
>
> > Did you consider using an encoding which uses variable number of
> > bytes? If yes, I would be interested to know your reason for not
> > choosing such an encoding. Efficiency?
>
> My Binary implementation (from 1998) used a type-specific number of bits
> to encode the constructor - exactly as many as needed. (If you were
> writing custom instances, you could even use a variable number of bits
> for the constructor, e.g. using Huffman encoding to make the more common
> constructors have the shortest representation.)
>
> The latter certainly imposes an extra time overhead on decoding, because
> you cannot just take a fixed-size chunk of bits and have the value. But
> I would have thought that in the regular case, using a type-specific
> (but not constructor-specific) size for representing the constructor
> would be very easy and have no time overhead at all.
>
> Regards,
> Malcolm
|
In reply to this post by Tomasz Zielonka
On Fri, Jan 26, 2007 at 04:36:50PM +0100, Tomasz Zielonka wrote:
> Did you consider using an encoding which uses variable number of bytes?
> If yes, I would be interested to know your reason for not choosing such
> an encoding. Efficiency?

I am testing/benchmarking one I wrote for 'Integer' right now; so far, I think it may be better in time _and_ space! Cache effects, no doubt.

A nice thing about it is that for the common case, short ASCII strings, the serialized form takes up exactly as much space as it would in C. Very nice. :)

        John

--
John Meacham - ⑆repetae.net⑆john⑈
|
In reply to this post by Tomasz Zielonka
On Fri, Jan 26, 2007 at 04:42:48PM +0100, Tomasz Zielonka wrote:
> I also have to use a specific serialisation format. I guess we could
> both simply use putWord8, but then we'll probably lose most of the
> benefits of using the library.
>
> Perhaps we could think about introducing some "encoding contexts", with
> a default encoding that can be automatically derived, but also with the
> ability to create one's own encodings?

One can use newtypes; they would be faster in any case. I was thinking something like:

> newtype XDRInt = XDRInt Int
> newtype XDRString sz = ...

and so forth. Now if you build up a structure

> data NfsSattr = NfsSattr {
>     mode  :: XdrUnsigned,  -- protection mode bits
>     uid   :: XdrUnsigned,  -- owner user id
>     gid   :: XdrUnsigned,  -- owner group id
>     size  :: XdrUnsigned,  -- file size in bytes
>     atime :: XdrNfsTime,   -- time of last access
>     mtime :: XdrNfsTime    -- time of last modification
>     }

now you can speak NFS directly by serializing right from and to your socket! :) A whole filesystem implemented in Haskell in not so many lines. Very nice.

Actually, I probably will write Data.Binary.Protocol.Xdr (better location?). I do have an NFS server written in Haskell in a much more clunky way that I could revive.

Now, the only new primitives I would need are:

> alignTo :: Word8 -> Int -> Put
> alignTo _ _ = ...
>
> setAlignment :: Int -> Put
> setAlignment _ = ...

where alignTo would output some number of bytes in order to bring the stream to the next specified alignment boundary, and setAlignment would force the current alignment to be some value, without outputting any bytes. Would these be doable? They would open up a lot of possibilities.

        John

--
John Meacham - ⑆repetae.net⑆john⑈
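Every XDR item (RFC 4506) occupies a multiple of four bytes, so padding can often be handled per item instead of by tracking a global stream offset. The sketch below takes that assumption and applies the newtype approach above, using a current release of binary. `XdrUnsigned` is a 4-byte big-endian word. `XdrString` is a simplification of the size-indexed `XDRString sz` above with no maximum length: it writes the length, then the bytes, then zero padding to the next 4-byte boundary. These names and `xdrPad` are illustrative; they are not an existing Data.Binary.Protocol.Xdr module.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getLazyByteString, getWord32be, skip)
import Data.Binary.Put (putLazyByteString, putWord32be, putWord8)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Word (Word32)

-- XDR unsigned int: 4 bytes, big-endian.
newtype XdrUnsigned = XdrUnsigned Word32 deriving (Eq, Show)

instance Binary XdrUnsigned where
  put (XdrUnsigned w) = putWord32be w
  get = fmap XdrUnsigned getWord32be

-- Zero bytes needed to pad an n-byte item to a 4-byte boundary.
xdrPad :: Int -> Int
xdrPad n = (4 - n `mod` 4) `mod` 4

-- XDR variable-length string/opaque: length, bytes, zero padding.
newtype XdrString = XdrString BL.ByteString deriving (Eq, Show)

instance Binary XdrString where
  put (XdrString bs) = do
    let len = fromIntegral (BL.length bs) :: Int
    putWord32be (fromIntegral len)
    putLazyByteString bs
    mapM_ putWord8 (replicate (xdrPad len) 0)
  get = do
    n <- getWord32be
    let len = fromIntegral n :: Int
    bs <- getLazyByteString (fromIntegral len)
    skip (xdrPad len) -- discard the padding bytes
    return (XdrString bs)

greeting :: XdrString
greeting = XdrString (BLC.pack "abcde")
```

With this encoding the 5-byte string "abcde" occupies 4 + 5 + 3 = 12 bytes. A general `alignTo` would still need the current output position, and as far as we know `Put` does not expose it. That position would have to be tracked separately.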
