Type polymorphism with size

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

Michael Snoyman
I am trying to write some code to read flat files from a mainframe system.
This includes some character fields. This is a fixed width file, so each
field will have a consistent length between records, but there are fields of
different length within a record. For example, I might have a "name" field
length 20 and an eye color field length 5.

I am trying to use the binary library to read in this file. I've written a
binary type, MFChar2, for reading in a 2-length character field. It is
defined as such (you can safely ignore the ebcdicToAscii piece, it is just
doing character conversion):

data MFChar2 = MFChar2 [Word8]
instance Binary MFChar2 where
    put = undefined
    get = do ebcdic <- replicateM 2 getWord8
             return $ MFChar2 $ map ebcdicToAscii ebcdic

What I would like to do is have some kind of generic "MFChar" data type
which could take any character length, but I can't figure out how to do it.
Any help would be appreciated.

Thanks,
Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20081118/3349841f/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

Brent Yorgey-2
On Tue, Nov 18, 2008 at 10:02:20AM -0800, Michael Snoyman wrote:

> I am trying to write some code to read flat files from a mainframe system.
> This includes some character fields. This is a fixed width file, so each
> field will have a consistent length between records, but there are fields of
> different length within a record. For example, I might have a "name" field
> length 20 and an eye color field length 5.
>
> I am trying to use the binary library to read in this file. I've written a
> binary type, MFChar2, for reading in a 2-length character field. It is
> defined as such (you can safely ignore the ebcdicToAscii piece, it is just
> doing character conversion):
>
> data MFChar2 = MFChar2 [Word8]
> instance Binary MFChar2 where
>     put = undefined
>     get = do ebcdic <- replicateM 2 getWord8
>              return $ MFChar2 $ map ebcdicToAscii ebcdic
>
> What I would like to do is have some kind of generic "MFChar" data type
> which could take any character length, but I can't figure out how to do it.
> Any help would be appreciated.

Hm, interesting!  The problem is that 'get' does not take any
arguments, so must determine what to do from the type at which it is
called.  So the number of words to be read needs to be in the type.
We can't put actual Int values in a type -- but there is actually a
way to do what you want, by encoding natural numbers at the type
level!  I don't know whether this really belongs on a 'beginners' list
but I couldn't resist. =)


data Z      -- the type representing zero
data S n    -- the type representing the successor of another natural

-- for example,  Z,  S Z,  and S (S Z)  are types representing
--   zero, one, and two.

-- the n is for a type-level natural representing the length of the list.
data MFChar n = MFChar [Word8]

-- add a Word8 to the beginning of an MFChar, resulting in an MFChar
--   one word longer
mfCons :: Word8 -> MFChar n -> MFChar (S n)
mfCons w (MFChar ws) = MFChar (w:ws)

instance Binary (MFChar Z) where
  get = return $ MFChar []

instance (Binary (MFChar n)) => Binary (MFChar (S n)) where
  get = do ebcdic <- getWord8
           rest   <- get    -- the correct type of get is
                            -- inferred due to the use of mfCons below
           return $ mfCons (ebcdicToAscii ebcdic) rest


Now if you wanted to read a field with 20 chars, you can use

  get :: Get (MFChar (S (S (S ... 20 S's ... Z))))

Ugly, I know. You could make it slightly more bearable by defining
some type synonyms at the top of your program like

  type Five = S (S (S (S (S Z))))
  type Ten = S (S (S (S (S Five))))

and so on.  Then you can just say   get :: Get (MFChar Ten) or whatever.

This is untested but it (or something close to it) ought to work.  Of
course, you may well ask yourself whether this contortion is really
worth it.  Maybe it is, maybe it isn't, but I can't think of a better
way to do it in Haskell.  In a dependently typed language such as
Agda, we could just put regular old natural numbers in the types,
instead of going through contortions to encode natural numbers as
types as we have to do here.  So I guess the real answer to your
question is "use a dependently typed language". =)

If you have problems getting this to work or more questions, feel free
to ask!

-Brent
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

Michael Snoyman
On Tue, Nov 18, 2008 at 11:18 AM, Brent Yorgey <[hidden email]>wrote:

> Hm, interesting!  The problem is that 'get' does not take any
> arguments, so must determine what to do from the type at which it is
> called.  So the number of words to be read needs to be in the type.
> We can't put actual Int values in a type -- but there is actually a
> way to do what you want, by encoding natural numbers at the type
> level!  I don't know whether this really belongs on a 'beginners' list
> but I couldn't resist. =)
>
>
> data Z      -- the type representing zero
> data S n    -- the type representing the successor of another natural
>
> -- for example,  Z,  S Z,  and S (S Z)  are types representing
> --   zero, one, and two.
>
> -- the n is for a type-level natural representing the length of the list.
> data MFChar n = MFChar [Word8]
>
> -- add a Word8 to the beginning of an MFChar, resulting in an MFChar
> --   one word longer
> mfCons :: Word8 -> MFChar n -> MFChar (S n)
> mfCons w (MFChar ws) = MFChar (w:ws)
>
> instance Binary (MFChar Z) where
>  get = return $ MFChar []
>
> instance (Binary (MFChar n)) => Binary (MFChar (S n)) where
>  get = do ebcdic <- getWord8
>           rest   <- get    -- the correct type of get is
>                            -- inferred due to the use of mfCons below
>           return $ mfCons (ebcdicToAscii ebcdic) rest
>
>
> Now if you wanted to read a field with 20 chars, you can use
>
>  get :: Get (MFChar (S (S (S ... 20 S's ... Z))))
>
> Ugly, I know. You could make it slightly more bearable by defining
> some type synonyms at the top of your program like
>
>  type Five = S (S (S (S (S Z))))
>  type Ten = S (S (S (S (S Five))))
>
> and so on.  Then you can just say   get :: Get (MFChar Ten) or whatever.
>
> This is untested but it (or something close to it) ought to work.  Of
> course, you may well ask yourself whether this contortion is really
> worth it.  Maybe it is, maybe it isn't, but I can't think of a better
> way to do it in Haskell.  In a dependently typed language such as
> Agda, we could just put regular old natural numbers in the types,
> instead of going through contortions to encode natural numbers as
> types as we have to do here.  So I guess the real answer to your
> question is "use a dependently typed language". =)
>
> If you have problems getting this to work or more questions, feel free
> to ask!
>

Very interesting solution to the problem. I tried it out and it works
perfectly... but it's just too much of a hack for my tastes (no offense; I
think it was very cool). I thought about it a bit and realized what I really
want is a way to deal with tuples of the same type, which led to this kind
of implementation.

class RepTuple a b | a -> b where
    toList :: a -> [b]
    tMap :: (b -> b) -> a -> a

instance RepTuple (a, a) a where
    toList (a, b) = [a, b]
    tMap f (a, b) = (f a, f b)

And so on and so forth for every kind of tuple. Of course, this runs into
the issue of the single case, for which I used the OneTuple library
(actually, I wrote my own right now, but I intend to just use the OneTuple
library).

I can then do something like this (which I have tested and works):

data MFChar w = MFChar w
    deriving Eq
instance (RepTuple w a, Integral a) => Show (MFChar w) where
    show (MFChar ws) = map (chr . fromIntegral) $ toList ws
instance (Integral a, Binary w, RepTuple w a) => Binary (MFChar w) where
    put = undefined
    get = do ebcdic <- get
             let ascii = tMap ebcdicToAscii ebcdic
             return $ MFChar ascii

type MFChar1 = MFChar (OneTuple Word8)
type MFChar2 = MFChar (Word8, Word8)
type MFChar4 = MFChar (Word8, Word8, Word8, Word8)
type MFChar5 = MFChar (Word8, Word8, Word8, Word8, Word8)
type MFChar10 = MFChar (Word8, Word8, Word8, Word8, Word8,
                        Word8, Word8, Word8, Word8, Word8)

If I wanted, I could do away with the tMap function and just include the
ebcdicToAscii step in the show instance.

Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20081118/8a75bca1/attachment-0001.htm
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

Brent Yorgey-2
On Tue, Nov 18, 2008 at 02:18:46PM -0800, Michael Snoyman wrote:
>
> Very interesting solution to the problem. I tried it out and it works
> perfectly... but it's just too much of a hack for my tastes (no offense; I

None taken. =)

>
> class RepTuple a b | a -> b where
>     toList :: a -> [b]
>     tMap :: (b -> b) -> a -> a
>
> instance RepTuple (a, a) a where
>     toList (a, b) = [a, b]
>     tMap f (a, b) = (f a, f b)
>
> And so on and so forth for every kind of tuple. Of course, this runs into
> the issue of the single case, for which I used the OneTuple library
> (actually, I wrote my own right now, but I intend to just use the OneTuple
> library).
>

This is reasonable too.  It's just a tradeoff of hackishness vs. code
length/tediousness. I.e. in the solution with type-level naturals, you
don't need a separate instance like this for every number you're going
to use.  And no one really likes writing things like
tMap f (a,b,c,d,e,g,h,i,j) = ...  =)

-Brent
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

David Frey
In reply to this post by Michael Snoyman
On 11/18/2008, "Michael Snoyman" <[hidden email]> wrote:

>I am trying to write some code to read flat files from a mainframe system.
>This includes some character fields. This is a fixed width file, so each
>field will have a consistent length between records, but there are fields of
>different length within a record. For example, I might have a "name" field
>length 20 and an eye color field length 5.
>
>I am trying to use the binary library to read in this file. I've written a
>binary type, MFChar2, for reading in a 2-length character field. It is
>defined as such (you can safely ignore the ebcdicToAscii piece, it is just
>doing character conversion):
>
>data MFChar2 = MFChar2 [Word8]
>instance Binary MFChar2 where
>    put = undefined
>    get = do ebcdic <- replicateM 2 getWord8
>             return $ MFChar2 $ map ebcdicToAscii ebcdic
>
>What I would like to do is have some kind of generic "MFChar" data type
>which could take any character length, but I can't figure out how to do it.
>Any help would be appreciated.
>
>Thanks,
>Michael


Is this something that could be accomplished using template Haskell? I
don't know anything about template Haskell, but maybe someone else on
the list can comment on this suggestion.
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

Michael Snoyman
In reply to this post by Brent Yorgey-2
On Tue, Nov 18, 2008 at 5:16 PM, Brent Yorgey <[hidden email]>wrote:

> > class RepTuple a b | a -> b where
> >     toList :: a -> [b]
> >     tMap :: (b -> b) -> a -> a
> >
> > instance RepTuple (a, a) a where
> >     toList (a, b) = [a, b]
> >     tMap f (a, b) = (f a, f b)
> >
> > And so on and so forth for every kind of tuple. Of course, this runs into
> > the issue of the single case, for which I used the OneTuple library
> > (actually, I wrote my own right now, but I intend to just use the
> OneTuple
> > library).
> >
>
> This is reasonable too.  It's just a tradeoff of hackishness vs. code
> length/tediousness. I.e. in the solution with type-level naturals, you
> don't need a separate instance like this for every number you're going
> to use.  And no one really likes writing things like
> tMap f (a,b,c,d,e,g,h,i,j) = ...  =)
>
> -Brent
>

I agree that it's not something people want to code; I was just thinking
that a RepTuple kind of library might be useful for other purposes (ensuring
lists of certain length essentially).

Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20081118/f8e54b53/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Type polymorphism with size

Brent Yorgey-2
In reply to this post by David Frey
On Tue, Nov 18, 2008 at 07:58:24PM -0800, David Frey wrote:

> On 11/18/2008, "Michael Snoyman" <[hidden email]> wrote:
>
> >I am trying to write some code to read flat files from a mainframe system.
> >This includes some character fields. This is a fixed width file, so each
> >field will have a consistent length between records, but there are fields of
> >different length within a record. For example, I might have a "name" field
> >length 20 and an eye color field length 5.
> >
> >I am trying to use the binary library to read in this file. I've written a
> >binary type, MFChar2, for reading in a 2-length character field. It is
> >defined as such (you can safely ignore the ebcdicToAscii piece, it is just
> >doing character conversion):
> >
> >data MFChar2 = MFChar2 [Word8]
> >instance Binary MFChar2 where
> >    put = undefined
> >    get = do ebcdic <- replicateM 2 getWord8
> >             return $ MFChar2 $ map ebcdicToAscii ebcdic
> >
> >What I would like to do is have some kind of generic "MFChar" data type
> >which could take any character length, but I can't figure out how to do it.
> >Any help would be appreciated.
> >
> >Thanks,
> >Michael
>
>
> Is this something that could be accomplished using template Haskell? I
> don't know anything about template Haskell, but maybe someone else on
> the list can comment on this suggestion.

Yes, in the sense that you could use Template Haskell to generate all
the repetitive boilerplate code that you'd rather not write yourself,
e.g. one instance of Binary for each number of Words.  In this
particular case it seems sort of overkill, in the sense that it
probably wouldn't be worth the pain of learning how to use it.  (For
those who don't know, Template Haskell [1] is a GHC feature + library
that lets you do compile-time metaprogramming/code generation.)

-Brent

[1] http://www.haskell.org/haskellwiki/Template_Haskell