Parsing a file with data divided into sections

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Parsing a file with data divided into sections

Patrick LeBoutillier
Hi all,

I wanted to parse a file that looks like this:

MONDAY
        JOHN
        JIM
        LINDA
TUESDAY
        BILL
        BOB
WEDNESDAY
THURSDAY
        SAM
        TODD
        LARRY
        LUKE
FRIDAY
        TED

in order to count the number of people for each day. After a (very)
long time and a lot of trial and
error, I managed to do it with this program:

import Char (isSpace)

main = interact (unlines . countSections . lines)
   where countSections = map (show) . snd . foldr compileSections (0, [])
         compileSections line (n, acc) =
             if isSection line
             then (0, (line, n) : acc)
             else (n + 1, acc)
         isSection line = not . isSpace . head $ line

which outputs:

("MONDAY",3)
("TUESDAY",2)
("WEDNESDAY",0)
("THURSDAY",4)
("FRIDAY",1)

I had quite a hard time figuring out how to keep count of the number
of records in each sections.
Is there a more obvious way to handle these types of problems? Are
there some builtins that could
of made it easier?

In Perl I would probably have used a hash and a variable to keep count
of the current day, incrementing
the hash value for each person until I got to the next day, but it's
not obvious to me how to transpose this
technique to functional programming.


Thanks a lot,

Patrick
--
=====================
Patrick LeBoutillier
Rosem?re, Qu?bec, Canada
Reply | Threaded
Open this post in threaded view
|

Parsing a file with data divided into sections

Magnus Therning
On Wed, Feb 4, 2009 at 1:58 AM, Patrick LeBoutillier
<[hidden email]> wrote:

> Hi all,
>
> I wanted to parse a file that looks like this:
>
> MONDAY
>        JOHN
>        JIM
>        LINDA
> TUESDAY
>        BILL
>        BOB
> WEDNESDAY
> THURSDAY
>        SAM
>        TODD
>        LARRY
>        LUKE
> FRIDAY
>        TED
>
> in order to count the number of people for each day. After a (very)
> long time and a lot of trial and
> error, I managed to do it with this program:
>
> import Char (isSpace)
>
> main = interact (unlines . countSections . lines)
>   where countSections = map (show) . snd . foldr compileSections (0, [])
>         compileSections line (n, acc) =
>             if isSection line
>             then (0, (line, n) : acc)
>             else (n + 1, acc)
>         isSection line = not . isSpace . head $ line
>
> which outputs:
>
> ("MONDAY",3)
> ("TUESDAY",2)
> ("WEDNESDAY",0)
> ("THURSDAY",4)
> ("FRIDAY",1)
>
> I had quite a hard time figuring out how to keep count of the number
> of records in each sections.
> Is there a more obvious way to handle these types of problems? Are
> there some builtins that could
> of made it easier?

I think that you've pretty much used accumulators in the way they most
often are used.  In many cases you don't _have_ to keep count though.
Here's one way to get the same result, but without keeping count:

countDays [] = []
countDays ls = let
        day = head ls
        count = length $ takeWhile (isSpace . head) $ tail ls
    in (day, count) : countDays (drop (1 + count) ls)

main = interact (unlines . map show . countDays . lines)

> In Perl I would probably have used a hash and a variable to keep count
> of the current day, incrementing
> the hash value for each person until I got to the next day, but it's
> not obvious to me how to transpose this
> technique to functional programming.

Often transposing from imperative to functional isn't what you want to
do.  One of the arguments for learning both imperative and functional
languages is that they approach problems differently, resulting in
different solutions.  I suspect you will find Haskell, and indeed any
functional language, difficult to use if you try to "think
imperative".  It takes time to learn new ways to think about problems,
but in the end you'll never look at things the same way again :-)

/M

--
Magnus Therning                        (OpenPGP: 0xAB4DFBA4)
magnus?therning?org          Jabber: magnus?therning?org
http://therning.org/magnus         identi.ca|twitter: magthe
Reply | Threaded
Open this post in threaded view
|

Re: Parsing a file with data divided into sections

Heinrich Apfelmus
Magnus Therning wrote:

> Patrick LeBoutillier wrote:
>>
>> I wanted to parse a file that looks like this:
>>
>> MONDAY
>>        JOHN
>>        JIM
>>        LINDA
>> TUESDAY
>>        BILL
>>        BOB
>> WEDNESDAY
>> THURSDAY
>>        SAM
>>        TODD
>>        LARRY
>>        LUKE
>> FRIDAY
>>        TED
>>
>> in order to count the number of people for each day. After a (very)
>> long time and a lot of trial and error,
>> I managed to do it with this program:

Nice program, especially your use of function composition is good style.

> I think that you've pretty much used accumulators in the way they most
> often are used.  In many cases you don't _have_ to keep count though.
> Here's one way to get the same result, but without keeping count:
>
> countDays [] = []
> countDays ls = let
>         day = head ls
>         count = length $ takeWhile (isSpace . head) $ tail ls
>     in (day, count) : countDays (drop (1 + count) ls)
>
> main = interact (unlines . map show . countDays . lines)

Here's a version using  span  from the Prelude:

  main = interact $ unlines . map show . countDays . lines

  countDays []       = []
  countDays (day:xs) = (day, length people) : countDays xs'
     where (people, xs') = span (isSpace . head) xs


Note that this file format is very simple and it's ok to use  lines  and
 isSpace  to parse it. But the tool of choice are parser combinators
like  Text.Parsec  or  Text.ParserCombinators.ReadP .


--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: Parsing a file with data divided into sections

Magnus Therning
On Wed, Feb 4, 2009 at 11:10 AM, Heinrich Apfelmus
<[hidden email]> wrote:
[..]
> Here's a version using  span  from the Prelude:
>
>  main = interact $ unlines . map show . countDays . lines
>
>  countDays []       = []
>  countDays (day:xs) = (day, length people) : countDays xs'
>     where (people, xs') = span (isSpace . head) xs

Ah, yes, `span` is the function I was looking for!  I would have found
it if I had bothered to go to Hoogle :-(

/M

--
Magnus Therning                        (OpenPGP: 0xAB4DFBA4)
magnus?therning?org          Jabber: magnus?therning?org
http://therning.org/magnus         identi.ca|twitter: magthe
Reply | Threaded
Open this post in threaded view
|

Re: Parsing a file with data divided into sections

Patrick LeBoutillier
In reply to this post by Heinrich Apfelmus
> Nice program, especially your use of function composition is good style.

Thanks, I must admit I put alot of time into refactoring it. There's so many
different ways of doing stuff (and then simplifying them) in Haskell!

>> I think that you've pretty much used accumulators in the way they most
>> often are used.  In many cases you don't _have_ to keep count though.
>> Here's one way to get the same result, but without keeping count:
>>
>> countDays [] = []
>> countDays ls = let
>>         day = head ls
>>         count = length $ takeWhile (isSpace . head) $ tail ls
>>     in (day, count) : countDays (drop (1 + count) ls)
>>
>> main = interact (unlines . map show . countDays . lines)
>
> Here's a version using  span  from the Prelude:
>
>  main = interact $ unlines . map show . countDays . lines
>
>  countDays []       = []
>  countDays (day:xs) = (day, length people) : countDays xs'
>     where (people, xs') = span (isSpace . head) xs
>

Both of these examples are great and exactly what I was looking for: a
different approach to the problem. I guess the step that was missing in my
"thought process" is that recursion doesn't have to imply processing the list
elements 1 by 1 (1 recursive call for each element). Of course it makes
perfect sense once you see it....

Thanks a lot to everyone, as usual the people on this list are always very
kind and helpful.


Patrick

>
> Note that this file format is very simple and it's ok to use  lines  and
>  isSpace  to parse it. But the tool of choice are parser combinators
> like  Text.Parsec  or  Text.ParserCombinators.ReadP .
>
>
> --
> http://apfelmus.nfshost.com
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners
>



--
=====================
Patrick LeBoutillier
Rosem?re, Qu?bec, Canada
Reply | Threaded
Open this post in threaded view
|

Re: Parsing a file with data divided into sections

Brent Yorgey-2
In reply to this post by Heinrich Apfelmus
On Wed, Feb 04, 2009 at 12:10:02PM +0100, Heinrich Apfelmus wrote:

> Magnus Therning wrote:
>
> Here's a version using  span  from the Prelude:
>
>   main = interact $ unlines . map show . countDays . lines
>
>   countDays []       = []
>   countDays (day:xs) = (day, length people) : countDays xs'
>      where (people, xs') = span (isSpace . head) xs
>
>
> Note that this file format is very simple and it's ok to use  lines  and
>  isSpace  to parse it. But the tool of choice are parser combinators
> like  Text.Parsec  or  Text.ParserCombinators.ReadP .

Let me also point out that using the 'split' package from Hackage [1], we
can implement this nicely without any icky manual recursion, and
without resorting to a full-blown parsing library:

  import Data.List.Split
  import Data.Char

  main = interact (unlines . map show . countDays . lines)

  countDays = map length . tail . splitWhen isDay
    where isDay = not . isSpace . head

'splitWhen' splits the list of lines into chunks using things that
satisfy the predicate as delimiters; then we just 'map length' over
those chunks to count the people for each day.  The 'tail' is
necessary because the split actually produces a blank list of people
*before* the first day, but otherwise this is entirely
straightforward.

-Brent

[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/split
Reply | Threaded
Open this post in threaded view
|

mind your head, was: Parsing a file with data divided into sections

Christian Maeder-2
In reply to this post by Magnus Therning
Magnus Therning wrote:
>>         isSection line = not . isSpace . head $ line

I'ld be more careful with "head"! Imagine your binary fails with:

Prelude.head: empty list

> countDays [] = []
> countDays ls = let
>         day = head ls
>         count = length $ takeWhile (isSpace . head) $ tail ls
>     in (day, count) : countDays (drop (1 + count) ls)

here too, and better use a pattern like: ls@(hd, tl)

Cheers Christian




Reply | Threaded
Open this post in threaded view
|

Re: mind your head, was: Parsing a file with data divided into sections

Christian Maeder-2
I wrote:
> Magnus Therning wrote:
>>>         isSection line = not . isSpace . head $ line
>
> I'ld be more careful with "head"! Imagine your binary fails with:
>
> Prelude.head: empty list

see also
http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_like_fromJust_and_head

>> countDays [] = []
>> countDays ls = let
>>         day = head ls
>>         count = length $ takeWhile (isSpace . head) $ tail ls
>>     in (day, count) : countDays (drop (1 + count) ls)
>
> here too, and better use a pattern like: ls@(hd, tl)

oops, the pattern should be: ls@(hd : tl)

Cheers again
Christian

Reply | Threaded
Open this post in threaded view
|

Re: mind your head, was: Parsing a file with data divided into sections

Magnus Therning
On Fri, Feb 6, 2009 at 8:47 AM, Christian Maeder
<[hidden email]> wrote:

> I wrote:
>> Magnus Therning wrote:
>>>>         isSection line = not . isSpace . head $ line
>>
>> I'ld be more careful with "head"! Imagine your binary fails with:
>>
>> Prelude.head: empty list
>
> see also
> http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_like_fromJust_and_head
>
>>> countDays [] = []
>>> countDays ls = let
>>>         day = head ls
>>>         count = length $ takeWhile (isSpace . head) $ tail ls
>>>     in (day, count) : countDays (drop (1 + count) ls)
>>
>> here too, and better use a pattern like: ls@(hd, tl)
>
> oops, the pattern should be: ls@(hd : tl)

Another course of action would be to add exception handling; if head
throws an exception then the input file isn't following the accepted
syntax.

/M

--
Magnus Therning                        (OpenPGP: 0xAB4DFBA4)
magnus?therning?org          Jabber: magnus?therning?org
http://therning.org/magnus         identi.ca|twitter: magthe
Reply | Threaded
Open this post in threaded view
|

Re: mind your head, was: Parsing a file with data divided into sections

Christian Maeder-2
Magnus Therning wrote:
[...]
>>>>>         isSection line = not . isSpace . head $ line

In this particular case I would use:
  isSection = null . takeWhile isSpace

>>> Prelude.head: empty list
>> see also
>> http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_like_fromJust_and_head
>
> Another course of action would be to add exception handling; if head
> throws an exception then the input file isn't following the accepted
> syntax.

I'ld rather check all lines before (or filter non empty ones) than
trying to catch Prelude.head: empty list!

C.

Reply | Threaded
Open this post in threaded view
|

Re: mind your head, was: Parsing a file with data divided into sections

Magnus Therning
On Fri, Feb 6, 2009 at 10:30 AM, Christian Maeder
<[hidden email]> wrote:
[..]
>> Another course of action would be to add exception handling; if head
>> throws an exception then the input file isn't following the accepted
>> syntax.
>
> I'ld rather check all lines before (or filter non empty ones) than
> trying to catch Prelude.head: empty list!

Yes, that's how I modified my own "solution" once I tracked down what
sort of input it would fail on:

    main = interact (unlines . map show . countDays . dropWhile null . lines)

/M

--
Magnus Therning                        (OpenPGP: 0xAB4DFBA4)
magnus?therning?org          Jabber: magnus?therning?org
http://therning.org/magnus         identi.ca|twitter: magthe