...and again today I found myself trying to do something that would be very easy in an imperative language, but I cannot think of a single good way of doing it in Haskell. Hopefully somebody can give me some hints.

I'm writing a whole bunch of data compression programs. In particular, one of them was a Huffman encoder. The encoder scans the input file, computes symbol frequencies, and builds a canonical Huffman table, which it then dumps to file. The actual compressed data goes after that.

The encoder consists of two functions. One takes the Huffman table and serializes it. The other does the actual encoding. The (++) operator joins the two together before going to file.

Getting the data back is mildly more tricky. There's a decoder function that takes the complete data stream, parses out the Huffman table, and returns a pair containing the Huffman table and the remaining data. Then a second decoder function actually does the Huffman decoding on this data. So far, it's all pretty easy. The size of the Huffman table determines how many bytes the decoder needs to read, so it's easy to figure out where the table ends.

And then I had a brainwave. I altered the encoder so that the output from the Huffman table serialize function goes through a simple RLE encoder before going to file. Then I went and altered the decoder to... uh... oops.

Now I have a problem. It's easy enough to pass the entire data stream through an RLE decoder and feed that to the Huffman table deserialize function, and it will give me back the table. But I now have *no clue* where the table ends in the original stream!

If this was Java, you would write all these compression and decompression stages as stream wrappers. So you wrap the raw input stream with an RLE decoder, have the function read the Huffman table, and then take off the RLE decoder and process the rest of the stream.

However, in Haskell, reading and deserializing the Huffman table doesn't do anything detectable to the original data stream, so you can't tell how much of it the RLE decoder actually processed.

Um... help!

(What I did in the end was to write the size of the RLE-encoded Huffman table to the file and read it back in the decoder. But this shouldn't actually be necessary...)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
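The shape of the pre-RLE decoder described above can be sketched as a toy model (all names invented for illustration): here a leading count byte stands in for the real property that the table's encoding determines its own size, and the key point is that the deserializer hands back the unconsumed remainder of the stream alongside the table.

```haskell
type Byte = Int

-- Toy stand-in for the Huffman-table deserializer: it returns both the
-- parsed "table" and the unconsumed remainder of the stream. Here the
-- first byte plays the role of the table's self-describing size.
deserializeTable :: [Byte] -> ([Byte], [Byte])
deserializeTable []       = ([], [])
deserializeTable (n:rest) = splitAt n rest

main :: IO ()
main = print (deserializeTable [3, 10, 20, 30, 99, 98])
-- prints ([10,20,30],[99,98]): the table bytes, then the payload
```

The second component of the pair is exactly the "rest of the stream" that the RLE layer obscures.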
Andrew Coppin wrote,
> If this was Java, you would write all these compression and
> decompression stages as stream wrappers. So you wrap the raw input
> stream with an RLE decoder, have the function read the Huffman table,
> and then take off the RLE decoder and process the rest of the stream.

Except that if the RLE decoding stream wrapper contains any internal buffering, then stripping it off would very likely result in data loss. What you actually have to do is have the RLE decoding stream wrapper build and return you a stream wrapper which delivers the remainder of the stream.

Which I think shows that the abstraction isn't leaky... where "the remainder" starts is very much dependent on the precise encoding of the prefix of the stream.

Cheers,

Miles
Miles Sabin wrote:
> Andrew Coppin wrote,
>
>> If this was Java, you would write all these compression and
>> decompression stages as stream wrappers. So you wrap the raw input
>> stream with an RLE decoder, have the function read the Huffman table,
>> and then take off the RLE decoder and process the rest of the stream.
>
> Except that if the RLE decoding stream wrapper contains any internal
> buffering, then stripping it off would very likely result in data loss.

...which is why you design your RLE decoding wrapper to not buffer anything. ;-)

You'd have more fun with bit-oriented things. But then, you'd design an object representing a stream of bits instead of a stream of bytes/characters/whatever, and it'd be fine.

> What you actually have to do is have the RLE decoding stream wrapper
> build and return you a stream wrapper which delivers the remainder of
> the stream.
>
> Which I think shows that the abstraction isn't leaky ... where "the
> remainder" starts is very much dependent on the precise encoding of the
> prefix of the stream.

It's just that at present, both the Huffman table load and save functions have no idea that they're operating through an RLE layer. Unfortunately, to be able to get the rest of the data, I have to expose to those functions the fact that there's this extra layer happening. (And then what if I want to use something else instead of RLE? What if I want to put *several* layers in? It all gets very messy...)
On Fri, Jun 29, 2007 at 07:39:28PM +0100, Andrew Coppin wrote:
> Now I have a problem. It's easy enough to pass the entire data stream
> through an RLE decoder and feed that to the Huffman table deserialize
> function, and it will give me back the table. But I now have *no clue*
> where the table ends in the original stream!

Sounds to me like you want a parsing monad. Generally, when you want state, you want a monad, and the field of parsing monads is pretty mature. You can either write up a monad of your own, or use one of the existing ones (parsec, frisby, read).

--
David Roundy
Department of Physics, Oregon State University
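The kind of parsing monad being suggested can be sketched in a few lines. This is not parsec's actual API, just a minimal illustration of why a parser naturally knows where the remainder of the input starts: the unconsumed input is threaded through as state, and running a parser yields it back.

```haskell
-- A minimal parsing monad: a parser is a function from the input to
-- Maybe (result, unconsumed rest). All names here are illustrative.
newtype Parser s a = Parser { runParser :: [s] -> Maybe (a, [s]) }

instance Functor (Parser s) where
  fmap f p = Parser $ \xs ->
    case runParser p xs of
      Nothing        -> Nothing
      Just (a, rest) -> Just (f a, rest)

instance Applicative (Parser s) where
  pure x = Parser $ \xs -> Just (x, xs)
  pf <*> px = Parser $ \xs ->
    case runParser pf xs of
      Nothing        -> Nothing
      Just (f, rest) -> runParser (fmap f px) rest

instance Monad (Parser s) where
  p >>= f = Parser $ \xs ->
    case runParser p xs of
      Nothing        -> Nothing
      Just (a, rest) -> runParser (f a) rest

-- Consume one token.
item :: Parser s s
item = Parser $ \xs -> case xs of
  []       -> Nothing
  (y : ys) -> Just (y, ys)

-- Consume exactly n tokens.
count :: Int -> Parser s [s]
count n = sequence (replicate n item)

main :: IO ()
main = print (runParser (count 2) [1, 2, 3, 4 :: Int])
-- prints Just ([1,2],[3,4]): the parsed prefix and the untouched rest
```

Whatever parses the Huffman table would run in this monad, so the "where does the table end" question is answered for free by the second component of the result.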
On Friday 29 June 2007, Andrew Coppin wrote:
> ...and again today I found myself trying to do something that would be
> very easy in an imperative language, but I cannot think of a single good
> way of doing it in Haskell. Hopefully somebody can give me some hints.
> <snip long and helpful explanation>

Here's my solution (drawn from a library I'll be posting Real Soon Now):

    import Control.Monad
    import Control.Monad.Trans

    data SPMT iota omicron m alpha
      = ReturnSP alpha
      | LiftSP (m (SPMT iota omicron m alpha))
      | GetSP (iota -> SPMT iota omicron m alpha)
      | PutSP omicron (SPMT iota omicron m alpha)

    instance Monad m => Monad (SPMT iota omicron m) where
      return x = ReturnSP x
      ReturnSP x >>= f = f x
      LiftSP a   >>= f = LiftSP (liftM (>>= f) a)
      GetSP a    >>= f = GetSP (\ x -> a x >>= f)
      PutSP x a  >>= f = PutSP x (a >>= f)

    instance MonadTrans (SPMT iota omicron) where
      lift a = LiftSP (liftM ReturnSP a)

    getSP :: SPMT iota omicron m iota
    getSP = GetSP ReturnSP

    putSP :: omicron -> SPMT iota omicron m ()
    putSP x = PutSP x (ReturnSP ())

    (^>^) :: Monad m
          => SPMT iota omicron m alpha
          -> SPMT omicron omicron' m beta
          -> SPMT iota omicron' m beta
    a         ^>^ ReturnSP x = ReturnSP x
    a         ^>^ LiftSP b   = LiftSP (liftM (a ^>^) b)
    a         ^>^ PutSP x b  = PutSP x (a ^>^ b)
    LiftSP a  ^>^ GetSP b    = LiftSP (liftM (^>^ GetSP b) a)
    GetSP a   ^>^ GetSP b    = GetSP (\ x -> a x ^>^ GetSP b)
    PutSP x a ^>^ GetSP b    = a ^>^ b x

If the signature of SPMT suffices to write decodeRLE and decodeHeader, the task of applying RLE decoding just to the header can be implemented by using

    decodeRLE ^>^ decodeHeader

in place of just decodeHeader. Extension to situations left un-implemented above I leave for your ingenuity and/or release of my library.

HTH.

Jonathan Cast
http://sourceforge.net/projects/fid-core
http://sourceforge.net/projects/fid-emacs
On Friday 29 June 2007, Jon Cast wrote:
> Here's my solution (drawn from a library I'll be posting Real Soon Now):
> <snip solution>

I forgot to point out that this is 75-90% drawn from a library called Fudgets [1], which is probably the most extended practical meditation to date on programming with lazy streams in Haskell. Embedding that approach in a monadic interface seems to be my own idea, though.

Jonathan Cast
http://sourceforge.net/projects/fid-core
http://sourceforge.net/projects/fid-emacs

[1] http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/
David Roundy wrote:
> On Fri, Jun 29, 2007 at 07:39:28PM +0100, Andrew Coppin wrote:
>
>> Now I have a problem. It's easy enough to pass the entire data stream
>> through an RLE decoder and feed that to the Huffman table deserialize
>> function, and it will give me back the table. But I now have *no clue*
>> where the table ends in the original stream!
>
> Sounds to me like you want a parsing monad. Generally, when you want
> state, you want a monad, and the field of parsing monads is pretty mature.
> You can either write up a monad of your own, or use one of the existing
> ones (parsec, frisby, read).

Perhaps. But how do you run a parser on top of another parser?

More importantly, how do you stack several parsers one on top of the other, get the top-most one to return the thing it parsed, and then make a completely different stack of parsers process the remainder?

I'm sure it can be done, but... I'm having trouble wrapping my mind around that at this time of night... :-S
Hello Andrew,
Friday, June 29, 2007, 10:39:28 PM, you wrote:

> I'm writing a whole bunch of data compression programs.

Me too :) But I never used Haskell for compression itself, only for managing archives. Fast compression routines are written in C++.

http://www.haskell.org/bz

--
Best regards,
Bulat                          mailto:[hidden email]
Bulat Ziganshin wrote:
> Hello Andrew,
>
> Friday, June 29, 2007, 10:39:28 PM, you wrote:
>
>> I'm writing a whole bunch of data compression programs.
>
> Me too :) But I never used Haskell for compression itself, only for
> managing archives. Fast compression routines are written in C++.

What, you're telling me that "fast" software cannot be written in Haskell? :-P

Well anyway, speed is not my aim. My aim is to play with various textbook algorithms and examine how well they work for various types of data. As long as the code isn't *absurdly* slow, that'll be just fine.

(I forget who it was, but a while back somebody pointed out the wisdom of using Data.Map. It certainly makes the LZW implementation about 400% faster! To say nothing of Huffman...)

BTW, how do the pros do Huffman coding? Presumably not by traversing trees of pointer-linked nodes to process things 1 bit at a time...
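The Data.Map win mentioned above is easy to see in a minimal LZW encoder sketch (a hypothetical illustration, not the wiki code in question): the dictionary of seen strings gets O(log n) lookup and insert instead of the O(n) of an association list.

```haskell
import qualified Data.Map as M

-- Minimal LZW encoder: the dictionary maps strings to output codes.
-- Single-byte strings are seeded with codes 0-255; new codes follow.
lzwEncode :: String -> [Int]
lzwEncode = go initial (M.size initial) ""
  where
    initial = M.fromList [([c], fromEnum c) | c <- ['\0' .. '\255']]
    -- end of input: flush the pending string, if any
    go dict _    cur []       = [dict M.! cur | not (null cur)]
    go dict next cur (c : cs) =
      case M.lookup (cur ++ [c]) dict of
        Just _  -> go dict next (cur ++ [c]) cs          -- keep extending
        Nothing -> dict M.! cur                          -- emit code for cur
                   : go (M.insert (cur ++ [c]) next dict) (next + 1) [c] cs

main :: IO ()
main = print (lzwEncode "aaa")
-- prints [97,256]: 'a', then the newly created code for "aa"
```

Swapping `M.fromList`/`M.lookup`/`M.insert` for an association list gives the slow variant being compared against.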
David Roundy wrote:
> On Fri, Jun 29, 2007 at 07:39:28PM +0100, Andrew Coppin wrote:
>> Now I have a problem. It's easy enough to pass the entire data stream
>> through an RLE decoder and feed that to the Huffman table deserialize
>> function, and it will give me back the table. But I now have *no clue*
>> where the table ends in the original stream!
>
> Sounds to me like you want a parsing monad. Generally, when you want
> state, you want a monad, and the field of parsing monads is pretty mature.
> You can either write up a monad of your own, or use one of the existing
> ones (parsec, frisby, read).

Am I missing something, or why wouldn't

    encode, decode :: String -> String
    encode = encodeRLE . encodeHuffman
    decode = decodeHuffman . decodeRLE

do the job? This is probably what Andrew intends to do in his Java version. Note that this not only RLE-encodes the Huffman table but also (needlessly) the data stream. In case you only want to RLE the table, a simple Word32 field tracking the size of the Huffman table should be enough.

Regards,
apfelmus
Hello Andrew,
Saturday, June 30, 2007, 11:48:19 AM, you wrote:

>> Me too :) But I never used Haskell for compression itself, only for
>> managing archives. Fast compression routines are written in C++.
>
> What, you're telling me that "fast" software cannot be written in
> Haskell? :-P

In my program, managing archives isn't the time-critical part. Compression requires 90-99% of total execution time.

> Well anyway, speed is not my aim. My aim is to play with various
> textbook algorithms and examine how well they work for various types of
> data. As long as the code isn't *absurdly* slow, that'll be just fine.

Yes, for studying algorithms Haskell is a perfect tool.

> (I forget who it was, but a while back somebody pointed out the wisdom
> of using Data.Map. It certainly makes the LZW implementation about 400%
> faster! To say nothing of Huffman...)

It seems that you've done the first implementation with lists or immutable arrays? :)

> BTW, how do the pros do Huffman coding? Presumably not by traversing
> trees of pointer-linked nodes to process things 1 bit at a time...

Encoding is simple - make this traversal one time and store the bit sequence for each symbol. For decoding, you can find the length of the longest symbol, MaxBits, and build a table of 2^MaxBits elements which allows you to find the next symbol by direct lookup with the next MaxBits of input. Faster algorithms use two-level lookup; a first lookup with 9-10 bits is best when your encoded symbols are bytes.

--
Best regards,
Bulat                          mailto:[hidden email]
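The single-level lookup scheme described above can be made concrete with a tiny hand-built example. The codes and table here are invented, not derived from real frequencies: with codes a=0, b=10, c=11, MaxBits is 2, so a 4-entry table maps every 2-bit window to a symbol and its true code length.

```haskell
import Data.Array

maxBitsV :: Int
maxBitsV = 2

-- Every possible maxBitsV-bit window maps to (symbol, bits consumed).
-- Codes assumed here: 'a' = 0, 'b' = 10, 'c' = 11.
decodeTable :: Array Int (Char, Int)
decodeTable = listArray (0, 3) [('a', 1), ('a', 1), ('b', 2), ('c', 2)]

-- Peek maxBitsV bits (zero-padded at end of stream), look up the symbol,
-- then advance only by that symbol's actual code length.
decodeBits :: [Int] -> String
decodeBits [] = ""
decodeBits bs = sym : decodeBits (drop len bs)
  where
    window     = take maxBitsV (bs ++ repeat 0)
    n          = foldl (\acc b -> acc * 2 + b) 0 window
    (sym, len) = decodeTable ! n

main :: IO ()
main = putStrLn (decodeBits [0, 1, 0, 1, 1])
-- prints "abc"
```

The two-level variant mentioned in the message would replace `decodeTable` with a small first table whose entries either decode directly or point into a second table for the longer codes.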
Bulat Ziganshin wrote:
> Hello Andrew,
>
> Saturday, June 30, 2007, 11:48:19 AM, you wrote:
>
>> Well anyway, speed is not my aim. My aim is to play with various
>> textbook algorithms and examine how well they work for various types of
>> data. As long as the code isn't *absurdly* slow, that'll be just fine.
>
> Yes, for studying algorithms Haskell is a perfect tool.
>
>> (I forget who it was, but a while back somebody pointed out the wisdom
>> of using Data.Map. It certainly makes the LZW implementation about 400%
>> faster! To say nothing of Huffman...)
>
> It seems that you've done the first implementation with lists or
> immutable arrays? :)

Lists. In fact, I *believe* the code is still on the Wiki somewhere...

>> BTW, how do the pros do Huffman coding? Presumably not by traversing
>> trees of pointer-linked nodes to process things 1 bit at a time...
>
> Encoding is simple - make this traversal one time and store the bit
> sequence for each symbol. For decoding, you can find the length of the
> longest symbol, MaxBits, and build a table of 2^MaxBits elements which
> allows you to find the next symbol by direct lookup with the next MaxBits
> of input. Faster algorithms use two-level lookup; a first lookup with
> 9-10 bits is best when your encoded symbols are bytes.

I see. So build a table of codes and bitmasks and test against that...
apfelmus wrote:
> Am I missing something, or why wouldn't
>
>     encode, decode :: String -> String
>     encode = encodeRLE . encodeHuffman
>     decode = decodeHuffman . decodeRLE
>
> do the job? This is probably what Andrew intends to do in his Java
> version. Note that this not only RLE-encodes the Huffman table but also
> (needlessly) the data stream. In case you only want to RLE the table, a
> simple Word32 field tracking the size of the Huffman table should be
> enough.

It is enough. But given that the whole purpose of compression algorithms is to squeeze data into the tiniest possible space, I wanted to avoid having a size field. And mathematically it's perfectly possible to do... I just can't find a convenient way to do it in Haskell. :-(
Andrew Coppin wrote:
> apfelmus wrote:
>
>> It is enough. But given that the whole purpose of compression algorithms
>> is to squeeze data into the tiniest possible space, I wanted to avoid
>> having a size field. And mathematically it's perfectly possible to do...
>> I just can't find a convenient way to do it in Haskell. :-(

Well, those 4 bytes won't kill you. But you can of course stop RLE-decoding as soon as it has read as many bytes as there are in the Huffman table. A systematic way to do this is with parser combinators.

Regards,
apfelmus
apfelmus wrote:
> Andrew Coppin wrote:
>
>> It is enough. But given that the whole purpose of compression algorithms
>> is to squeeze data into the tiniest possible space, I wanted to avoid
>> having a size field. And mathematically it's perfectly possible to do...
>> I just can't find a convenient way to do it in Haskell. :-(
>
> Well, those 4 bytes won't kill you. But you can of course stop
> RLE-decoding as soon as it has read as many bytes as there are in the
> Huffman table. A systematic way to do this is with parser combinators.

Yeah... I'm fuzzy on how to do this. I can write parsers to do the various stages, and I can run one parser on top of another. But how do you swap whole "stacks" of parsers when the top-most one reaches a given stage?
OK, well I don't know the Parsec types and function names off the top of my head, but suppose I have the following:

    runParser :: Parser a b -> [a] -> Either ParseError b

    parseHuffmanTable :: [x] -> Parser Word8 (HuffmanTable x)

    parseHuffmanPayload :: HuffmanTable x -> Parser Word8 [x]

    parseRLE :: Parser Word8 [Word8]

Now, if I want to just do without the RLE bit, I can do

    parseHuffman :: [x] -> Parser Word8 [x]
    parseHuffman xs = do
      table <- parseHuffmanTable xs
      parseHuffmanPayload table

But if I want to add an RLE layer to just the Huffman table... erm... OK, I'm stuck now. :-S

1. How do I run the input through parseRLE and *then* through parseHuffmanTable?

2. How do I get parseHuffmanPayload to continue from where parseRLE left off? (How do I get parseRLE to not parse the entire input, for that matter...)
Andrew Coppin wrote:
> OK, well I don't know the Parsec types and function names off the top of
> my head, but suppose I have the following:
>
>     runParser :: Parser a b -> [a] -> Either ParseError b
>
>     parseHuffmanTable :: [x] -> Parser Word8 (HuffmanTable x)
>
>     parseHuffmanPayload :: HuffmanTable x -> Parser Word8 [x]
>
>     parseRLE :: Parser Word8 [Word8]
>
> But if I want to add an RLE layer to just the Huffman table... erm...
> OK, I'm stuck now. :-S
>
> 1. How do I run the input through parseRLE and *then* through
> parseHuffmanTable?
>
> 2. How do I get parseHuffmanPayload to continue from where parseRLE left
> off? (How do I get parseRLE to not parse the entire input, for that
> matter...)

Well, there's no way to do that with a monolithic parseRLE, since it will parse the input to the bitter end. But assuming that

    parseRLE = concat `liftM` many parseRLEBlock

    parseRLEBlock :: Parser Word8 [Word8]

you can write an RLE parser that repeatedly parses RLE blocks until it has read equal to or more than n Word8s:

    parseRLEAmount n
      | n <= 0    = return []
      | otherwise = do
          xs  <- parseRLEBlock
          xss <- parseRLEAmount (n - length xs)
          return (xs ++ xss)

To parse a Huffman table, run the actual parser on the result of parseRLEAmount:

    parseHuffmanHeader =
      runParser parseHuffmanTable `liftM` parseRLEAmount (2^8)

Regards,
apfelmus
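The same idea can be seen without any parser library, on plain lists, with an invented RLE format where each block is a count byte followed by the byte to repeat. The crucial feature, as with parseRLEAmount, is that decoding stops once at least n output bytes have been produced and the untouched remainder of the raw stream is handed back for the next stage.

```haskell
-- Decode RLE blocks until at least n output bytes have been produced,
-- returning (decoded bytes, unconsumed remainder of the raw stream).
-- RLE format assumed here: a count byte, then the byte to repeat.
rleAmount :: Int -> [Int] -> ([Int], [Int])
rleAmount n input
  | n <= 0 = ([], input)
rleAmount n (count : byte : rest) =
  let xs           = replicate count byte
      (xss, rest') = rleAmount (n - length xs) rest
  in  (xs ++ xss, rest')
rleAmount _ input = ([], input)  -- truncated input: stop gracefully

main :: IO ()
main = print (rleAmount 4 [3, 7, 2, 9, 5, 1])
-- prints ([7,7,7,9,9],[5,1]): block [5,1] is left for the next stage
```

The first component would be fed to the table deserializer; the second is exactly the "where does the rest start" answer the thread is after.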
apfelmus wrote:
> Andrew Coppin wrote:
>
>> OK, I'm stuck now. :-S
>>
>> 1. How do I run the input through parseRLE and *then* through
>> parseHuffmanTable?
>>
>> 2. How do I get parseHuffmanPayload to continue from where parseRLE left
>> off? (How do I get parseRLE to not parse the entire input, for that
>> matter...)
>
> Well, there's no way to do that with a monolithic parseRLE, since it will
> parse the input to the bitter end. But assuming that
>
>     parseRLE = concat `liftM` many parseRLEBlock
>
>     parseRLEBlock :: Parser Word8 [Word8]
>
> you can write an RLE parser that repeatedly parses RLE blocks until it
> has read equal to or more than n Word8s:
>
>     parseRLEAmount n
>       | n <= 0    = return []
>       | otherwise = do
>           xs  <- parseRLEBlock
>           xss <- parseRLEAmount (n - length xs)
>           return (xs ++ xss)
>
> To parse a Huffman table, run the actual parser on the result of
> parseRLEAmount:
>
>     parseHuffmanHeader =
>       runParser parseHuffmanTable `liftM` parseRLEAmount (2^8)

The use of liftM is *inspired* - I would never have thought of writing it infix!

Still, while this does solve the issue at hand, I'm still left wondering whether there's any way to make this work for some arbitrary parser. parseHuffmanTable always uses exactly N bytes of input, but what if it didn't? What if it was impossible to tell how many bytes it should read without actually running it? How would you implement that? Does a nice, general solution exist?
Hello Andrew,
Sunday, July 1, 2007, 1:18:16 PM, you wrote:

>> Encoding is simple - make this traversal one time and store the bit
>> sequence for each symbol. For decoding, you can find the length of the
>> longest symbol, MaxBits, and build a table of 2^MaxBits elements which
>> allows you to find the next symbol by direct lookup with the next MaxBits
>> of input. Faster algorithms use two-level lookup; a first lookup with
>> 9-10 bits is best when your encoded symbols are bytes.
>
> I see. So build a table of codes and bitmasks and test against that...

    decodeSymbol = do
      n <- returnNextNBits MaxBits  -- this operation doesn't advance the input pointer!
      symbol <- table1 ! n
      bits <- table2 ! symbol
      skipNBits bits
      return symbol

--
Best regards,
Bulat                          mailto:[hidden email]
Bulat Ziganshin wrote:
> Hello Andrew,
>
>> I see. So build a table of codes and bitmasks and test against that...
>
>     decodeSymbol = do
>       n <- returnNextNBits MaxBits  -- doesn't advance the input pointer!
>       symbol <- table1 ! n
>       bits <- table2 ! symbol
>       skipNBits bits
>       return symbol

I see.

While we're on the subject... am I the first person to notice that Haskell doesn't appear to have much support for fiddling with streams of bits?
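For what it's worth, a poor man's bit stream is only a couple of lines with Data.Bits: explode each byte into a list of Bools and process that. (A sketch, and hardly efficient, but it makes bit-at-a-time code trivial to write.)

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- Explode each byte into its bits, most significant bit first.
toBits :: [Word8] -> [Bool]
toBits = concatMap (\w -> [testBit w i | i <- [7, 6 .. 0]])

main :: IO ()
main = print (toBits [0xA0])
-- prints [True,False,True,False,False,False,False,False]
```

A lazy [Bool] produced this way is exactly the sort of stream the table-lookup Huffman decoder upthread could consume.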