I am trying to write some code to read flat files from a mainframe system.
This includes some character fields. This is a fixed width file, so each field will have a consistent length between records, but there are fields of different length within a record. For example, I might have a "name" field length 20 and an eye color field length 5. I am trying to use the binary library to read in this file. I've written a binary type, MFChar2, for reading in a 2-length character field. It is defined as such (you can safely ignore the ebcdicToAscii piece, it is just doing character conversion): data MFChar2 = MFChar2 [Word8] instance Binary MFChar2 where put = undefined get = do ebcdic <- replicateM 2 getWord8 return $ MFChar2 $ map ebcdicToAscii ebcdic What I would like to do is have some kind of generic "MFChar" data type which could take any character length, but I can't figure out how to do it. Any help would be appreciated. Thanks, Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.haskell.org/pipermail/beginners/attachments/20081118/3349841f/attachment.htm |
On Tue, Nov 18, 2008 at 10:02:20AM -0800, Michael Snoyman wrote:
> I am trying to write some code to read flat files from a mainframe system. > This includes some character fields. This is a fixed width file, so each > field will have a consistent length between records, but there are fields of > different length within a record. For example, I might have a "name" field > length 20 and an eye color field length 5. > > I am trying to use the binary library to read in this file. I've written a > binary type, MFChar2, for reading in a 2-length character field. It is > defined as such (you can safely ignore the ebcdicToAscii piece, it is just > doing character conversion): > > data MFChar2 = MFChar2 [Word8] > instance Binary MFChar2 where > put = undefined > get = do ebcdic <- replicateM 2 getWord8 > return $ MFChar2 $ map ebcdicToAscii ebcdic > > What I would like to do is have some kind of generic "MFChar" data type > which could take any character length, but I can't figure out how to do it. > Any help would be appreciated. Hm, interesting! The problem is that 'get' does not take any arguments, so must determine what to do from the type at which it is called. So the number of words to be read needs to be in the type. We can't put actual Int values in a type -- but there is actually a way to do what you want, by encoding natural numbers at the type level! I don't know whether this really belongs on a 'beginners' list but I couldn't resist. =) data Z -- the type representing zero data S n -- the type representing the successor of another natural -- for example, Z, S Z, and S (S Z) are types representing -- zero, one, and two. -- the n is for a type-level natural representing the length of the list. data MFChar n = MFChar [Word8] -- add a Word8 to the beginning of an MFChar, resulting in an MFChar -- one word longer mfCons :: Word8 -> MFChar n -> MFChar (S n) mfCons w (MFChar ws) = MFChar (w:ws) instance Binary (MFChar Z) where get = return $ MFChar [] instance (Binary (MFChar n)) => Binary (MFChar (S n)) where get = do ebcdic <- getWord8 rest <- get -- the correct type of get is -- inferred due to the use of mfCons below return $ mfCons (ebcdicToAscii ebcdic) rest Now if you wanted to read a field with 20 chars, you can use get :: Get (MFChar (S (S (S ... 20 S's ... Z)))) Ugly, I know. You could make it slightly more bearable by defining some type synonyms at the top of your program like type Five = S (S (S (S (S Z)))) type Ten = S (S (S (S (S Five)))) and so on. Then you can just say get :: Get (MFChar Ten) or whatever. This is untested but it (or something close to it) ought to work. Of course, you may well ask yourself whether this contortion is really worth it. Maybe it is, maybe it isn't, but I can't think of a better way to do it in Haskell. In a dependently typed language such as Agda, we could just put regular old natural numbers in the types, instead of going through contortions to encode natural numbers as types as we have to do here. So I guess the real answer to your question is "use a dependently typed language". =) If you have problems getting this to work or more questions, feel free to ask! -Brent |
On Tue, Nov 18, 2008 at 11:18 AM, Brent Yorgey <[hidden email]>wrote:
> Hm, interesting! The problem is that 'get' does not take any > arguments, so must determine what to do from the type at which it is > called. So the number of words to be read needs to be in the type. > We can't put actual Int values in a type -- but there is actually a > way to do what you want, by encoding natural numbers at the type > level! I don't know whether this really belongs on a 'beginners' list > but I couldn't resist. =) > > > data Z -- the type representing zero > data S n -- the type representing the successor of another natural > > -- for example, Z, S Z, and S (S Z) are types representing > -- zero, one, and two. > > -- the n is for a type-level natural representing the length of the list. > data MFChar n = MFChar [Word8] > > -- add a Word8 to the beginning of an MFChar, resulting in an MFChar > -- one word longer > mfCons :: Word8 -> MFChar n -> MFChar (S n) > mfCons w (MFChar ws) = MFChar (w:ws) > > instance Binary (MFChar Z) where > get = return $ MFChar [] > > instance (Binary (MFChar n)) => Binary (MFChar (S n)) where > get = do ebcdic <- getWord8 > rest <- get -- the correct type of get is > -- inferred due to the use of mfCons below > return $ mfCons (ebcdicToAscii ebcdic) rest > > > Now if you wanted to read a field with 20 chars, you can use > > get :: Get (MFChar (S (S (S ... 20 S's ... Z)))) > > Ugly, I know. You could make it slightly more bearable by defining > some type synonyms at the top of your program like > > type Five = S (S (S (S (S Z)))) > type Ten = S (S (S (S (S Five)))) > > and so on. Then you can just say get :: Get (MFChar Ten) or whatever. > > This is untested but it (or something close to it) ought to work. Of > course, you may well ask yourself whether this contortion is really > worth it. Maybe it is, maybe it isn't, but I can't think of a better > way to do it in Haskell. In a dependently typed language such as > Agda, we could just put regular old natural numbers in the types, > instead of going through contortions to encode natural numbers as > types as we have to do here. So I guess the real answer to your > question is "use a dependently typed language". =) > > If you have problems getting this to work or more questions, feel free > to ask! > Very interesting solution to the problem. I tried it out and it works perfectly... but it's just too much of a hack for my tastes (no offense; I think it was very cool). I thought about it a bit and realized what I really want is a way to deal with tuples of the same type, which led to this kind of implementation. class RepTuple a b | a -> b where toList :: a -> [b] tMap :: (b -> b) -> a -> a instance RepTuple (a, a) a where toList (a, b) = [a, b] tMap f (a, b) = (f a, f b) And so on and so forth for every kind of tuple. Of course, this runs into the issue of the single case, for which I used the OneTuple library (actually, I wrote my own right now, but I intend to just use the OneTuple library). I can then do something like this (which I have tested and works): data MFChar w = MFChar w deriving Eq instance (RepTuple w a, Integral a) => Show (MFChar w) where show (MFChar ws) = map (chr . fromIntegral) $ toList ws instance (Integral a, Binary w, RepTuple w a) => Binary (MFChar w) where put = undefined get = do ebcdic <- get let ascii = tMap ebcdicToAscii ebcdic return $ MFChar ascii type MFChar1 = MFChar (OneTuple Word8) type MFChar2 = MFChar (Word8, Word8) type MFChar4 = MFChar (Word8, Word8, Word8, Word8) type MFChar5 = MFChar (Word8, Word8, Word8, Word8, Word8) type MFChar10 = MFChar (Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8) If I wanted, I could do away with the tMap function and just include the ebcdicToAscii step in the show instance. Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.haskell.org/pipermail/beginners/attachments/20081118/8a75bca1/attachment-0001.htm |
On Tue, Nov 18, 2008 at 02:18:46PM -0800, Michael Snoyman wrote:
> > Very interesting solution to the problem. I tried it out and it works > perfectly... but it's just too much of a hack for my tastes (no offense; I None taken. =) > > class RepTuple a b | a -> b where > toList :: a -> [b] > tMap :: (b -> b) -> a -> a > > instance RepTuple (a, a) a where > toList (a, b) = [a, b] > tMap f (a, b) = (f a, f b) > > And so on and so forth for every kind of tuple. Of course, this runs into > the issue of the single case, for which I used the OneTuple library > (actually, I wrote my own right now, but I intend to just use the OneTuple > library). > This is reasonable too. It's just a tradeoff of hackishness vs. code length/tediousness. I.e. in the solution with type-level naturals, you don't need a separate instance like this for every number you're going to use. And no one really likes writing things like tMap f (a,b,c,d,e,g,h,i,j) = ... =) -Brent |
In reply to this post by Michael Snoyman
On 11/18/2008, "Michael Snoyman" <[hidden email]> wrote:
>I am trying to write some code to read flat files from a mainframe system. >This includes some character fields. This is a fixed width file, so each >field will have a consistent length between records, but there are fields of >different length within a record. For example, I might have a "name" field >length 20 and an eye color field length 5. > >I am trying to use the binary library to read in this file. I've written a >binary type, MFChar2, for reading in a 2-length character field. It is >defined as such (you can safely ignore the ebcdicToAscii piece, it is just >doing character conversion): > >data MFChar2 = MFChar2 [Word8] >instance Binary MFChar2 where > put = undefined > get = do ebcdic <- replicateM 2 getWord8 > return $ MFChar2 $ map ebcdicToAscii ebcdic > >What I would like to do is have some kind of generic "MFChar" data type >which could take any character length, but I can't figure out how to do it. >Any help would be appreciated. > >Thanks, >Michael Is this something that could be accomplished using template Haskell? I don't know anything about template Haskell, but maybe someone else on the list can comment on this suggestion. |
In reply to this post by Brent Yorgey-2
On Tue, Nov 18, 2008 at 5:16 PM, Brent Yorgey <[hidden email]>wrote:
> > class RepTuple a b | a -> b where > > toList :: a -> [b] > > tMap :: (b -> b) -> a -> a > > > > instance RepTuple (a, a) a where > > toList (a, b) = [a, b] > > tMap f (a, b) = (f a, f b) > > > > And so on and so forth for every kind of tuple. Of course, this runs into > > the issue of the single case, for which I used the OneTuple library > > (actually, I wrote my own right now, but I intend to just use the > OneTuple > > library). > > > > This is reasonable too. It's just a tradeoff of hackishness vs. code > length/tediousness. I.e. in the solution with type-level naturals, you > don't need a separate instance like this for every number you're going > to use. And no one really likes writing things like > tMap f (a,b,c,d,e,g,h,i,j) = ... =) > > -Brent > I agree that it's not something people want to code; I was just thinking that a RepTuple kind of library might be useful for other purposes (ensuring lists of certain length essentially). Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.haskell.org/pipermail/beginners/attachments/20081118/f8e54b53/attachment.htm |
In reply to this post by David Frey
On Tue, Nov 18, 2008 at 07:58:24PM -0800, David Frey wrote:
> On 11/18/2008, "Michael Snoyman" <[hidden email]> wrote: > > >I am trying to write some code to read flat files from a mainframe system. > >This includes some character fields. This is a fixed width file, so each > >field will have a consistent length between records, but there are fields of > >different length within a record. For example, I might have a "name" field > >length 20 and an eye color field length 5. > > > >I am trying to use the binary library to read in this file. I've written a > >binary type, MFChar2, for reading in a 2-length character field. It is > >defined as such (you can safely ignore the ebcdicToAscii piece, it is just > >doing character conversion): > > > >data MFChar2 = MFChar2 [Word8] > >instance Binary MFChar2 where > > put = undefined > > get = do ebcdic <- replicateM 2 getWord8 > > return $ MFChar2 $ map ebcdicToAscii ebcdic > > > >What I would like to do is have some kind of generic "MFChar" data type > >which could take any character length, but I can't figure out how to do it. > >Any help would be appreciated. > > > >Thanks, > >Michael > > > Is this something that could be accomplished using template Haskell? I > don't know anything about template Haskell, but maybe someone else on > the list can comment on this suggestion. Yes, in the sense that you could use Template Haskell to generate all the repetitive boilerplate code that you'd rather not write yourself, e.g. one instance of Binary for each number of Words. In this particular case it seems sort of overkill, in the sense that it probably wouldn't be worth the pain of learning how to use it. (For those who don't know, Template Haskell [1] is a GHC feature + library that lets you do compile-time metaprogramming/code generation.) -Brent [1] http://www.haskell.org/haskellwiki/Template_Haskell |
Free forum by Nabble | Edit this page |