Hi,
I'm trying to split a string into a list of substrings, where substrings
are delimited by blank lines.

This feels like it *should* be a primitive operation, but I can't seem
to find one that works. It's neither a fold nor a partition, since each
chunk is separated by a 2-character sequence. It's also not a grouping
operation, since ghc's Data.List.groupBy compares the first element in a
sequence with each candidate member of the same sequence, as
demonstrated by:

    Prelude> :module + Data.List
    Prelude Data.List> let t = "asdfjkl;"
    Prelude Data.List> groupBy (\a _ -> a == 's') t
    ["a","sdfjkl;"]

As a result, I've wound up with this:

    -- Convert a file into blocks separated by blank lines (two
    -- consecutive \n characters.) NB: Requires UNIX linefeeds

    blocks :: String -> [String]
    blocks s = f "" s
      where
        f "" [] = []
        f s  [] = [s]
        f s ('\n':'\n':rest) = (s:f "" rest)
        f s (a:rest) = f (s ++ [a]) rest

Which somehow feels ugly. This feels like it should be a fold, a group
or something, where the test is something like:

    (\a b -> (a /= '\n') && (b /= '\n'))

Any thoughts?

Thanks,
-- Adam

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
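The groupBy behaviour described above follows from how the function is specified: the Haskell 98 Report defines it with span, so every candidate is tested against the first element of the current group, not against its immediate neighbour. The sketch below restates that definition under the hypothetical name groupBy', so it can be loaded next to Data.List without a name clash.

```haskell
-- groupBy', restated from the Haskell 98 Report's definition of
-- Data.List.groupBy. The name groupBy' is hypothetical and only
-- avoids a clash with the library function.
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' eq (x:xs) = (x : ys) : groupBy' eq zs
  where
    -- span keeps taking elements while (eq x) holds, so each candidate
    -- is compared with x, the first element of the group, never with
    -- the element just before it.
    (ys, zs) = span (eq x) xs
```

In GHCi, groupBy' (\a _ -> a == 's') "asdfjkl;" gives ["a","sdfjkl;"], as in the transcript. The difference from an adjacent-pair test shows up with groupBy' (<) [1,3,2]: it yields [[1,3,2]], because 2 is compared with 1 rather than with 3.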
On 1/13/06, Adam Turoff <[hidden email]> wrote:
> Which somehow feels ugly. This feels like it should be a fold, a group
> or something, where the test is something like:
>
>     (\a b -> (a /= '\n') && (b /= '\n'))

Off the top of my head:

    blocks = map concat . groupBy (const null) . lines

The lines function splits it into lines, the groupBy will group the
list into lists of lists and split when the second of two adjacent
elements is null (which is what an empty line passed to lines will
give you) and then a concat on each of the elements of this list will
"undo" the redundant lines-splitting that lines performed...

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
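Running each stage of such a pipeline separately on a small input is a quick way to see what the predicate actually does. The sketch below traces the const null version exactly as written; the sample input and the names stage1, stage2 and stage3 are assumptions made for illustration.

```haskell
import Data.List (groupBy)

-- Hypothetical sample input: two blocks separated by one blank line.
sample :: String
sample = "a\nb\n\nc\nd\n"

-- Stage 1: lines splits on '\n'; the blank line becomes "".
stage1 :: [String]
stage1 = lines sample

-- Stage 2: const null ignores the group's first line and admits a
-- candidate line into the current group only when it is empty.
stage2 :: [[String]]
stage2 = groupBy (const null) stage1

-- Stage 3: join the lines of each group back together.
stage3 :: [String]
stage3 = map concat stage2
```

Here stage1 is ["a","b","","c","d"], stage2 is [["a"],["b",""],["c"],["d"]] and stage3 is ["a","b","c","d"]: every non-empty line starts a new group, so the predicate is the wrong way round.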
On 1/13/06, Sebastian Sylvan <[hidden email]> wrote:
> Off the top of my head:
>
>     blocks = map concat . groupBy (const null) . lines

Sorry, I got the meaning of groupBy mixed up, it should be

    blocks = map concat . groupBy (const (not . null)) . lines

/S
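The same check applied to the corrected predicate: const (not . null) keeps adding lines to the current group until it meets an empty line, so each blank line now opens a new group. In this sketch, grouped is a hypothetical helper that exposes the intermediate grouping, and the test input is assumed.

```haskell
import Data.List (groupBy)

-- The corrected one-liner from the message above.
blocks :: String -> [String]
blocks = map concat . groupBy (const (not . null)) . lines

-- The grouping stage on its own, so it can be inspected.
grouped :: String -> [[String]]
grouped = groupBy (const (not . null)) . lines
```

For "a\nb\n\nc\nd\n", grouped gives [["a","b"],["","c","d"]], with the empty line at the head of the second group, and blocks gives ["ab","cd"].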
That works except it loses single newline characters.
    let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
    Prelude> blocks s
    ["12345678","abcdefghijklmnopq",",,.,.,."]

Jared.

On 1/13/06, Sebastian Sylvan <[hidden email]> wrote:
> Sorry, I got the meaning of groupBy mixed up, it should be
>
>     blocks = map concat . groupBy (const (not . null)) . lines

--
[hidden email]
http://www.updike.org/~jared/
reverse ")-:"
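Jared's observation holds in general: lines consumes every '\n', and concat puts none of them back, so concat . lines is the same as deleting all newlines. The sketch below checks this on his string; stripNewlines and jared are hypothetical names.

```haskell
-- concat . lines drops every newline, blank lines included.
stripNewlines :: String -> String
stripNewlines = concat . lines

-- Jared's example string from the message above.
jared :: String
jared = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
```

stripNewlines jared equals filter (/= '\n') jared, namely "12345678abcdefghijklmnopq,,.,.,.", which is why the single newlines inside each block vanish.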
On 2006-01-13 at 13:32PST Jared Updike wrote:
> That works except it loses single newline characters.
>
> let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
> Prelude> blocks s
> ["12345678","abcdefghijklmnopq",",,.,.,."]

Also the argument to groupBy ought to be some sort of
equivalence relation.

    blocks = map unlines
           . filter (all $ not . null)
           . groupBy (\a b -> not (null b || null a))
           . lines

... but that suffers from the somewhat questionable
properties of lines and unlines.

--
Jón Fairbairn                              Jon.Fairbairn at cl.cam.ac.uk
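A sketch of Jon's version run on Jared's string, together with one of the lines/unlines quirks he alludes to: unlines appends '\n' after every line, so unlines . lines gives back its input only when that input is empty or ends in a newline. The names blocksJ and roundTrips are hypothetical.

```haskell
import Data.List (groupBy)

-- Jon's first version: a line joins the current group only if
-- neither it nor the group's first line is empty.
blocksJ :: String -> [String]
blocksJ = map unlines
        . filter (all $ not . null)
        . groupBy (\a b -> not (null b || null a))
        . lines

-- True when unlines . lines returns exactly its input.
roundTrips :: String -> Bool
roundTrips s = unlines (lines s) == s
```

On Jared's string, blocksJ gives ["1234\n5678\n","abcdefghijklmnopq\n",",,.,.,.\n"]: the single newlines survive, but the last block gains a trailing newline the input did not have, because roundTrips ",,.,.,." is False.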
On Jan 13, 2006, at 4:35 PM, Jon Fairbairn wrote:

> Also the argument to groupBy ought to be some sort of
> equivalence relation.

Humm, still not reflexive. You need xor.

> blocks = map unlines
>        . filter (all $ not . null)
>        . groupBy (\a b -> not (null b || null a))
>        . lines
>
> ... but that suffers from the somewhat questionable
> properties of lines and unlines.

Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
          -- TMBG
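Rob's objection can be checked mechanically: an equivalence relation must relate every value to itself, and Jon's predicate fails that for the empty line. The sketch tests reflexivity over a few sample lines; jonRel, sameKind, isReflexiveOn and samples are hypothetical names. sameKind compares the two null tests with ==, which on Booleans is the negation of exclusive or.

```haskell
-- Jon's predicate from the message quoted above.
jonRel :: String -> String -> Bool
jonRel a b = not (null b || null a)

-- A candidate that relates lines when both are empty or both are not.
sameKind :: String -> String -> Bool
sameKind a b = null a == null b

-- Reflexive on a sample: every element is related to itself.
isReflexiveOn :: (a -> a -> Bool) -> [a] -> Bool
isReflexiveOn rel = all (\x -> rel x x)

samples :: [String]
samples = ["", "abc", "1234"]
```

isReflexiveOn jonRel samples is False, since jonRel "" "" is False, while isReflexiveOn sameKind samples is True.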
On 1/13/06, Sebastian Sylvan <[hidden email]> wrote:
> blocks = map concat . groupBy (const (not . null)) . lines

Thanks. That's a little more involved than I was looking for, but that
certainly looks better than pattern matching on ('\n':'\n':rest). ;-)

For the record, lines removes the trailing newline, so a string like:

    a
    b

    c
    d

becomes ["ab", "cd"], which can interfere with processing if the
whitespace is significant. Changing this to

    blocks = map unlines . groupBy (const (not . null)) . lines

re-adds all of the newlines, thus re-adding the significant whitespace,
while still chunking everything into blocks:

    ["a\nb\n","\nc\nd\n"]

Thanks again,
-- Adam
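A small check of Adam's revision, named blocksU here only to keep it apart from the other variants. Because every line keeps its '\n', concatenating the blocks gives back the original text, at least when the input ends in a newline; otherwise unlines adds one. The sample input is an assumption: the "a b / c d" example written with explicit newlines.

```haskell
import Data.List (groupBy)

-- Adam's revision: unlines instead of concat, so each line keeps its '\n'.
blocksU :: String -> [String]
blocksU = map unlines . groupBy (const (not . null)) . lines

-- Hypothetical input: the example from the message above.
sample :: String
sample = "a\nb\n\nc\nd\n"
```

blocksU sample is ["a\nb\n","\nc\nd\n"], and concat (blocksU sample) == sample. The blank line reappears as the leading "\n" of the second block, since it opens that group.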
On 2006-01-13 at 16:50EST Robert Dockins wrote:
> On Jan 13, 2006, at 4:35 PM, Jon Fairbairn wrote:
>
> > Also the argument to groupBy ought to be some sort of
> > equivalence relation.
>
> Humm, still not reflexive. You need xor.

ugh, yes. How about

> > blocks = map unlines
> >        . filter (all $ not . null)
> >        . groupBy (\a b -> null b == null a)
> >        . lines

?

-- Jón Fairbairn
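A sketch of Jon's final version, run on Jared's string and on input with several consecutive blank lines; blocksEq is a hypothetical name. Because null b == null a is an equivalence relation, each run of blank lines forms its own group, which filter (all $ not . null) then discards.

```haskell
import Data.List (groupBy)

-- Jon's final version: group lines by whether they are empty,
-- drop the all-blank groups, and re-terminate each remaining line.
blocksEq :: String -> [String]
blocksEq = map unlines
         . filter (all $ not . null)
         . groupBy (\a b -> null b == null a)
         . lines
```

On Jared's string this gives ["1234\n5678\n","abcdefghijklmnopq\n",",,.,.,.\n"], and "a\n\n\n\nb\n" gives ["a\n","b\n"], so extra blank lines do not produce empty blocks.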