Splitting a string into chunks


Adam Turoff
Hi,

I'm trying to split a string into a list of substrings, where substrings
are delimited by blank lines.

This feels like it *should* be a primitive operation, but I can't seem
to find one that works.  It's neither a fold nor a partition, since each
chunk is separated by a 2-character sequence.  It's also not a grouping
operation, since ghc's Data.List.groupBy examines the first element in a
sequence with each candidate member of the same sequence, as
demonstrated by:

    Prelude> :module + Data.List
    Prelude Data.List> let t = "asdfjkl;"
    Prelude Data.List> groupBy (\a _ -> a == 's') t
    ["a","sdfjkl;"]
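
For reference, a small sketch of the behaviour described above: groupBy tests each candidate against the first element of the current group, not against its immediate neighbour. The names runsFromFirst and splitOnS are made up for illustration and are not part of Data.List.

```haskell
import Data.List (groupBy)

-- Hypothetical helper: grouping with (<=) shows that every candidate is
-- compared with the FIRST element of the current group, so a group may
-- contain descending steps as long as nothing drops below its head.
runsFromFirst :: [Int] -> [[Int]]
runsFromFirst = groupBy (<=)

-- The GHCi session from the post, written as a definition.
splitOnS :: String -> [String]
splitOnS = groupBy (\a _ -> a == 's')
```

With these definitions, runsFromFirst [1,2,2,3,1,2,0,4,5,2] gives [[1,2,2,3,1,2],[0,4,5,2]], and splitOnS "asdfjkl;" gives ["a","sdfjkl;"] as in the session.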

As a result, I've wound up with this:

    -- Convert a file into blocks separated by blank lines (two
    -- consecutive \n characters.) NB: Requires UNIX linefeeds

    blocks :: String -> [String]
    blocks s = f "" s
      where
        f "" [] = []
        f s [] = [s]
        f s ('\n':'\n':rest) = (s:f "" rest)
        f s (a:rest) = f (s ++ [a]) rest

Which somehow feels ugly.  This feels like it should be a fold, a group
or something, where the test is something like:

    (\a b -> (a /= '\n') && (b /= '\n'))

Any thoughts?

Thanks,

-- Adam
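
A possible tidy-up of the hand-written blocks function in this post, offered only as a sketch: it should split at the same places, but it conses onto a reversed accumulator instead of appending with s ++ [a], which avoids re-copying the chunk for every character. The names blocksAcc and go are assumptions for illustration.

```haskell
-- Sketch: same splitting rule as the hand-written version (split on two
-- consecutive '\n' characters), with a reversed accumulator.
blocksAcc :: String -> [String]
blocksAcc = go []
  where
    -- Nothing accumulated and nothing left: no trailing empty chunk.
    go [] [] = []
    -- Input exhausted: emit what has been accumulated.
    go acc [] = [reverse acc]
    -- Blank line found: close the current chunk and start a new one.
    go acc ('\n':'\n':rest) = reverse acc : go [] rest
    -- Any other character is pushed onto the (reversed) accumulator.
    go acc (c:rest) = go (c:acc) rest
```

For example, blocksAcc "a\nb\n\nc\nd" gives ["a\nb","c\nd"].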
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Splitting a string into chunks

Sebastian Sylvan
On 1/13/06, Adam Turoff <[hidden email]> wrote:

> Which somehow feels ugly.  This feels like it should be a fold, a group
> or something, where the test is something like:
>
>     (\a b -> (a /= '\n') && (b /= '\n'))

Off the top of my head:

blocks = map concat . groupBy (const null) . lines

The lines function splits the input into lines; groupBy then groups
that list into a list of lists, splitting wherever the second of two
adjacent elements is null (which is what lines gives you for an empty
line); finally, a concat on each element of that list "undoes" the
redundant line-splitting that lines performed...

/S

Re: Splitting a string into chunks

Sebastian Sylvan
On 1/13/06, Sebastian Sylvan <[hidden email]> wrote:

> Off the top of my head:
>
> blocks = map concat . groupBy (const null) . lines

Sorry, I got the meaning of groupBy mixed up, it should be

blocks = map concat . groupBy (const (not . null)) . lines
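
To make the difference between the two one-liners concrete, here is a small sketch that gives each its own name (blocksFirst and blocksFixed are made-up names; both posts simply call the function blocks).

```haskell
import Data.List (groupBy)

-- First attempt: a group is one line followed by any EMPTY lines after
-- it, so ordinary lines are never joined together.
blocksFirst :: String -> [String]
blocksFirst = map concat . groupBy (const null) . lines

-- Corrected version: a group is one line followed by the NON-empty lines
-- after it, so a new group starts at each empty line.
blocksFixed :: String -> [String]
blocksFixed = map concat . groupBy (const (not . null)) . lines
```

On the input "a\nb\n\nc\nd", blocksFirst gives ["a","b","c","d"], while blocksFixed gives ["ab","cd"].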

/S


Re: Splitting a string into chunks

Jared Updike
That works except it loses single newline characters.

Prelude> let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
Prelude> blocks s
["12345678","abcdefghijklmnopq",",,.,.,."]

  Jared.


Re: Splitting a string into chunks

Jon Fairbairn
On 2006-01-13 at 13:32PST Jared Updike wrote:
> That works except it loses single newline characters.
>
> let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
> Prelude> blocks s
> ["12345678","abcdefghijklmnopq",",,.,.,."]

Also the argument to groupBy ought to be some sort of
equivalence relation.

blocks = map unlines
         . filter (all $ not . null)
         . groupBy (\a b -> not (null b || null a))
         . lines

... but that suffers from the somewhat questionable
properties of lines and unlines.
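
One reading of the "questionable properties" remark, sketched with made-up names (roundTrip, blocksBothNonEmpty): unlines always terminates every line with a newline, so a round trip through lines and unlines is not the identity on input that lacks a final newline, and every block produced above ends in "\n" whether or not the input did.

```haskell
import Data.List (groupBy)

-- Round-trip a string through lines and unlines.
roundTrip :: String -> String
roundTrip = unlines . lines

-- The definition from this post under a hypothetical name: runs of
-- non-empty lines form groups, each empty line is a singleton group,
-- and the filter then discards the empty-line groups.
blocksBothNonEmpty :: String -> [String]
blocksBothNonEmpty =
  map unlines
    . filter (all (not . null))
    . groupBy (\a b -> not (null b || null a))
    . lines
```

Here roundTrip "a\nb" gives "a\nb\n", and blocksBothNonEmpty applied to the test string from the quoted message gives ["1234\n5678\n","abcdefghijklmnopq\n",",,.,.,.\n"].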


Re: Splitting a string into chunks

Robert Dockins

On Jan 13, 2006, at 4:35 PM, Jon Fairbairn wrote:

> On 2006-01-13 at 13:32PST Jared Updike wrote:
>> That works except it loses single newline characters.
>>
>> let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
>> Prelude> blocks s
>> ["12345678","abcdefghijklmnopq",",,.,.,."]
>
> Also the argument to groupBy ought to be some sort of
> equivalence relation.

Humm, still not reflexive.  You need xor.

> blocks = map unlines
>          . filter (all $ not . null)
>          . groupBy (\a b -> not (null b|| null a))
>          . lines
>
> ... but that suffers from the somewhat questionable
> properties of lines and unlines.

Rob Dockins


Re: Splitting a string into chunks

Adam Turoff
On 1/13/06, Sebastian Sylvan <[hidden email]> wrote:
> blocks = map concat . groupBy (const (not . null)) . lines

Thanks.  That's a little more involved than I was looking for, but that
certainly looks better than pattern matching on ('\n':'\n':rest).  ;-)

For the record, lines strips the newline from every line it returns, so
once concat glues the lines back together, a string like:

    a
    b

    c
    d

becomes ["ab", "cd"], which can interfere with processing if the whitespace
is significant.  Changing this to

   blocks = map unlines . groupBy (const (not . null)) . lines

re-adds all of the newlines, thus re-adding the significant whitespace,
while still chunking everything into blocks:   ["a\nb\n","\nc\nd\n"]
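
As the result above shows, every block after the first keeps its separating empty line as a leading "\n". If that is unwanted, one possible variation (a sketch only; blocksUnlines and blocksTrimmed are made-up names) drops the leading empty lines from each group and discards groups that end up empty.

```haskell
import Data.List (groupBy)

-- The unlines-based version from this post, under a hypothetical name.
blocksUnlines :: String -> [String]
blocksUnlines = map unlines . groupBy (const (not . null)) . lines

-- Variation: strip the empty separator line(s) from the front of each
-- group, then drop groups that consisted only of empty lines.
blocksTrimmed :: String -> [String]
blocksTrimmed =
  map unlines
    . filter (not . null)
    . map (dropWhile null)
    . groupBy (const (not . null))
    . lines
```

For "a\nb\n\nc\nd\n", blocksUnlines gives ["a\nb\n","\nc\nd\n"] and blocksTrimmed gives ["a\nb\n","c\nd\n"]; for "a\n\n\nb", blocksTrimmed gives ["a\n","b\n"].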

Thanks again,

-- Adam

Re: Splitting a string into chunks

Jon Fairbairn
On 2006-01-13 at 16:50EST Robert Dockins wrote:

> Humm, still not reflexive.  You need xor.

ugh, yes. How about

blocks = map unlines
         . filter (all $ not . null)
         . groupBy (\a b -> null b == null a)
         . lines

?
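
A quick sketch of why the revised predicate answers the reflexivity objection; sameKind, bothNonEmpty and blocksEq are made-up names. Comparing null a with null b puts lines into two classes (empty and non-empty), which is an equivalence relation, whereas the earlier "both non-empty" test fails for an empty line compared with itself.

```haskell
import Data.List (groupBy)

-- Earlier predicate: false for ("", ""), so it is not reflexive.
bothNonEmpty :: String -> String -> Bool
bothNonEmpty a b = not (null b || null a)

-- Revised predicate: two lines are related when both are empty or both
-- are non-empty.
sameKind :: String -> String -> Bool
sameKind a b = null b == null a

-- Runs of non-empty lines and runs of empty lines alternate; the filter
-- keeps only the non-empty runs.
blocksEq :: String -> [String]
blocksEq = map unlines . filter (all (not . null)) . groupBy sameKind . lines
```

Here bothNonEmpty "" "" is False while sameKind "" "" is True, and blocksEq "a\nb\n\n\nc" gives ["a\nb\n","c\n"], so a run of several blank lines acts as a single separator.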