
How to split this string.

29 messages

How to split this string.

Комар Максим
I want to write a function whose behavior is as follows:

foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
"string2\r\nstring3", "string4"]

Note the sequence "\r\n", which is ignored. How can I do this?

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: How to split this string.

emacsray
On Mon, Jan 02, 2012 at 12:44:23PM +0300, max wrote:

> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

A short solution, though it requires regular expressions (Text.Regex.PCRE):

  > import Text.Regex.PCRE
  > match (makeRegex "(?:[^\r\n]+|\r\n)+" :: Regex) "b\nc\r\n\n\r\n\nd" :: [[String]]

Re: How to split this string.

Yves Parès
In reply to this post by Комар Максим
Doesn't the function "lines" handle different line-endings?
(In the Prelude and in Data.List)

If not, doing this with parsec would be easy (yet maybe slightly overkill...)
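A minimal sketch of what the parsec route could look like, assuming parsec 3 (the Text.Parsec modules). The name foo is taken from the question; lineP and linesP are hypothetical helpers, not part of parsec. A line is any mix of "\r\n" pairs and characters other than '\n', and lines are separated by a bare '\n':

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One line: "\r\n" pairs (kept intact) or any character except '\n'.
-- `try` lets a lone '\r' not followed by '\n' fall through to `noneOf`.
lineP :: Parser String
lineP = concat <$> many (try (string "\r\n") <|> fmap (: []) (noneOf "\n"))

-- Lines are separated by bare '\n' characters.
linesP :: Parser [String]
linesP = lineP `sepBy` char '\n'

-- Parse the whole input; a ParseError should not occur, since every
-- character is accepted either by lineP or as a separator.
foo :: String -> Either ParseError [String]
foo = parse (linesP <* eof) ""
```

The `try` matters: parsec's `string` consumes input as it matches, so without `try` a '\r' followed by anything other than '\n' would make the alternative fail after consuming input, and `<|>` would not fall back to `noneOf`.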


2012/1/2 max <[hidden email]>
> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

Re: How to split this string.

Комар Максим
On Mon, 2 Jan 2012 10:45:18 +0100
Yves Parès <[hidden email]> wrote:

Prelude> lines "string1\nstring2\r\nstring3\nstring4"
["string1","string2\r","string3","string4"]

> Doesn't the function "lines" handle different line-endings?
> (In the Prelude and in Data.List)
>
> If not, doing this with parsec would be easy (yet maybe slightly
> overkill...)
>
>
> 2012/1/2 max <[hidden email]>
>
> > I want to write a function whose behavior is as follows:
> >
> > foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> > "string2\r\nstring3", "string4"]
> >
> > Note the sequence "\r\n", which is ignored. How can I do this?

Re: How to split this string.

Simon Hengel
In reply to this post by Yves Parès
> Doesn't the function "lines" handle different line-endings?
> (In the Prelude and in Data.List)
It does not ignore "\r\n".

Cheers,
Simon

Re: How to split this string.

Christian Maeder-2
In reply to this post by Комар Максим
On 02.01.2012 10:44, max wrote:
> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

Replace the sequence with something unique first, e.g. a single "\r" (and
revert the change afterwards).

(Replacing a single character is easier using concatMap.)

HTH Christian

import Data.List (stripPrefix, unfoldr)

-- | Replace every occurrence of the first (non-empty) list argument
-- with the second one in the third argument list.
replace :: Eq a => [a] -> [a] -> [a] -> [a]
replace sl r = case sl of
   [] -> error "replace: empty list"
   _ -> concat . unfoldr (\ l -> case l of
     [] -> Nothing
     hd : tl -> Just $ case stripPrefix sl l of
       Nothing -> ([hd], tl)
       Just rt -> (r, rt))
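Put together, the suggestion might look like the sketch below. It assumes the input never contains a '\r' outside a "\r\n" pair (otherwise the restore step would wrongly turn it into "\r\n"); foo and restore are hypothetical names. The replace function is repeated, restructured with a where helper, so the block compiles on its own:

```haskell
import Data.List (stripPrefix, unfoldr)

-- Same behaviour as the replace above: substitute every occurrence
-- of the (non-empty) list sl with r.
replace :: Eq a => [a] -> [a] -> [a] -> [a]
replace sl r
  | null sl   = error "replace: empty list"
  | otherwise = concat . unfoldr step
  where
    step [] = Nothing
    step l@(hd : tl) = Just $ case stripPrefix sl l of
      Nothing -> ([hd], tl)
      Just rt -> (r, rt)

-- Collapse each "\r\n" to a lone '\r', split with lines, then restore.
-- ASSUMES the input contains no '\r' outside of "\r\n" pairs.
foo :: String -> [String]
foo = map (concatMap restore) . lines . replace "\r\n" "\r"
  where
    restore '\r' = "\r\n"
    restore c    = [c]
```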

Re: How to split this string.

Steve Horne
In reply to this post by Комар Максим
On 02/01/2012 09:44, max wrote:
> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?
Doing it probably the hard way (and getting it wrong) looks like the
following...

--  Function to accept (normally) a single character. Special-cases
--  \r\n. Refuses to accept \n. Result is either an empty list, or
--  an (accepted, remaining) pair.
parseTok :: String -> [(String, String)]

parseTok "" = []
parseTok (c1:c2:cs) | ((c1 == '\r') && (c2 == '\n')) = [(c1:c2:[], cs)]
parseTok (c:cs)     | (c /= '\n')                    = [(c:[], cs)]
                     | True                           = []

--  Accept a sequence of those (mostly single) characters
parseItem :: String -> [(String, String)]

parseItem "" = [("","")]
parseItem cs = [(j1s ++ j2s, k2s)
                  | (j1s,k1s) <- parseTok  cs
                  , (j2s,k2s) <- parseItem k1s
                ]

--  Accept a whole list of strings
parseAll :: String -> [([String], String)]

parseAll [] = [([],"")]
parseAll cs = [(j1s:j2s,k2s)
                 | (j1s,k1s) <- parseItem cs
                 , (j2s,k2s) <- parseAll  k1s
               ]

--  Get the first valid result, which should have consumed the
--  whole string but this isn't checked. No check for existence either.
parse :: String -> [String]
parse cs = fst (head (parseAll cs))

I got it wrong in that this never consumes the \n between items, so
it'll all go horribly wrong. There's a good chance there's a typo or two
as well. The basic idea should be clear, though - maybe I should fix it
but I've got some other things to do at the moment. Think of the \n as a
separator, or as a prefix to every "item" but the first. Alternatively,
treat it as a prefix to *every* item, and artificially add an initial
one to the string in the top-level parse function. Then use tail etc. to
remove that from the first item.

See http://channel9.msdn.com/Tags/haskell - there's a series of 13
videos by Dr. Erik Meijer. The eighth in the series covers this basic
technique. It calls the parsers monadic and uses do notation, which
confused me slightly at first: it's the *list* type that is monadic in
this case, and (as you can see) I prefer to use list comprehensions
rather than do notation.

There may be a simpler way, though - there's still a fair bit of Haskell
and its ecosystem I need to figure out. There's a tool called alex, for
instance, but I've not used it.
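A corrected sketch along the lines suggested above (treat '\n' as a separator). It is not Steve's code, though the names mirror his. Besides consuming the '\n' between items, parseItem also needs an alternative that stops early: without one, an item can only succeed by consuming the rest of the input, so any '\n' makes the whole parse fail.

```haskell
-- Accept one token: "\r\n" as a unit, or any single character except '\n'.
parseTok :: String -> [(String, String)]
parseTok ('\r':'\n':cs)     = [("\r\n", cs)]
parseTok (c:cs) | c /= '\n' = [([c], cs)]
parseTok _                  = []

-- Accept a run of tokens. Every prefix is a possible result, longest
-- first, so an item can stop in front of a '\n'.
parseItem :: String -> [(String, String)]
parseItem cs = [ (t ++ more, rest')
               | (t, rest)     <- parseTok cs
               , (more, rest') <- parseItem rest ]
               ++ [("", cs)]

-- Accept items separated by '\n'; the separator is consumed here.
parseAll :: String -> [([String], String)]
parseAll cs = [ (item : items, rest')
              | (item, rest)   <- parseItem cs
              , (items, rest') <- parseSep rest ]
  where
    parseSep ('\n':more) = parseAll more
    parseSep rest        = [([], rest)]

-- Take the first result that consumed the whole input.
parse :: String -> [String]
parse cs = case [ items | (items, "") <- parseAll cs ] of
  (items : _) -> items
  []          -> error "parse: no complete parse"
```

Because parseItem lists its longest result first, the first complete parse is the greedy one, and laziness means the shorter alternatives are not explored when only the head is taken.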

Re: How to split this string.

Anupam Jain
In reply to this post by Комар Максим
On Mon, Jan 2, 2012 at 3:14 PM, max <[hidden email]> wrote:
> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

Here's a simple way (may not be the most efficient) -

import Data.List (isSuffixOf)

split = reverse . foldl f [] . lines
  where
    f [] w = [w]
    f (x:xs) w = if "\r" `isSuffixOf` x then ((x++"\n"++w):xs) else (w:x:xs)

Testing -

ghci> split "ab\r\ncd\nefgh\nhijk"
["ab\r\ncd","efgh","hijk"]


-- Anupam

Re: How to split this string.

emacsray
In reply to this post by Комар Максим
On Mon, Jan 02, 2012 at 12:44:23PM +0300, max wrote:

> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

unixLines :: String -> [String]
unixLines xs = reverse . map reverse $ go xs "" []
  where
    go [] l ls = l:ls
    go ('\r':'\n':xs) l ls = go xs ('\n':'\r':l) ls
    go ('\n':xs) l ls = go xs "" (l:ls)
    go (x:xs) l ls = go xs (x:l) ls

Re: How to split this string.

Yves Parès
In reply to this post by Комар Максим
Okay, so it doesn't handle different line-endings.

I have a more general solution (statefulSplit)
http://hpaste.org/55980

I cannot test it as I don't have an interpreter at hand, but if someone does, I'd be glad to have comments.
(It might be more readable by using the State monad)

2012/1/2 max <[hidden email]>
> On Mon, 2 Jan 2012 10:45:18 +0100
> Yves Parès <[hidden email]> wrote:
>
> Prelude> lines "string1\nstring2\r\nstring3\nstring4"
> ["string1","string2\r","string3","string4"]
>
> > Doesn't the function "lines" handle different line-endings?
> > (In the Prelude and in Data.List)
> >
> > If not, doing this with parsec would be easy (yet maybe slightly
> > overkill...)
> >
> >
> > 2012/1/2 max <[hidden email]>
> >
> > > I want to write a function whose behavior is as follows:
> > >
> > > foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> > > "string2\r\nstring3", "string4"]
> > >
> > > Note the sequence "\r\n", which is ignored. How can I do this?

Re: How to split this string.

Jon Fairbairn
In reply to this post by Комар Максим
max <[hidden email]> writes:

> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

cabal install split

then do something like

   import Data.List (groupBy)
   import Data.List.Split (splitOn)

   rn '\r' '\n' = True
   rn _ _ = False

   required_function = fmap concat . splitOn ["\n"] . groupBy rn

(though that might be an abuse of groupBy)

--
Jón Fairbairn                                 [hidden email]

Re: How to split this string.

Комар Максим
On Mon, 02 Jan 2012 11:12:49 +0000
Jon Fairbairn <[hidden email]> wrote:

> max <[hidden email]> writes:
>
> > I want to write a function whose behavior is as follows:
> >
> > foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> > "string2\r\nstring3", "string4"]
> >
> > Note the sequence "\r\n", which is ignored. How can I do this?
>
> cabal install split
>
> then do something like
>
>    import Data.List (groupBy)
>    import Data.List.Split (splitOn)
>
>    rn '\r' '\n' = True
>    rn _ _ = False
>
>    required_function = fmap concat . splitOn ["\n"] . groupBy rn
>
> (though that might be an abuse of groupBy)
>

This is the simplest of the proposed solutions, in my opinion. Thank you
very much.

Re: How to split this string.

Felipe Lessa
On Mon, Jan 2, 2012 at 10:12 AM, max <[hidden email]> wrote:
> This is the simplest solution of the proposed, in my opinion. Thank you
> very much.

Better yet, don't use String and use Text.  Then you just need
T.splitOn "\r\n" [1].

Cheers,

[1] http://hackage.haskell.org/packages/archive/text/0.11.1.12/doc/html/Data-Text.html#v:splitOn

--
Felipe.

Re: How to split this string.

Anupam Jain
On Mon, Jan 2, 2012 at 5:52 PM, Felipe Almeida Lessa
<[hidden email]> wrote:
> On Mon, Jan 2, 2012 at 10:12 AM, max <[hidden email]> wrote:
>> This is the simplest solution of the proposed, in my opinion. Thank you
>> very much.
>
> Better yet, don't use String and use Text.  Then you just need
> T.splitOn "\r\n" [1].

That is actually the opposite of what the OP wants; however, it's
interesting that Text has a function like that while the String
functions in the standard library do not.

-- Anupam
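For completeness, a sketch of the behaviour the OP actually asked for, using Data.Text from the text package (splitLF is a hypothetical name). It splits on every "\n" with T.splitOn and then glues a piece back onto the next one whenever it ends in '\r', much like Anupam's String version earlier in the thread:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Split on every "\n", then rejoin a piece with its successor whenever
-- it ends in '\r' (that "\n" was really the second half of "\r\n").
splitLF :: Text -> [Text]
splitLF = foldr merge [] . T.splitOn "\n"
  where
    merge piece (next : rest)
      | "\r" `T.isSuffixOf` piece = (piece <> "\n" <> next) : rest
    merge piece acc = piece : acc
```

Unlike a lines-based version, a trailing "\r\n" is kept intact, because T.splitOn "\n" yields a final empty piece for it to attach to.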

Re: How to split this string.

Markus Läll-2
String is really for small strings. Text is more efficient and also has
more functionality, including most, if not all, of the functions
defined for String.

On Mon, Jan 2, 2012 at 3:12 PM, Anupam Jain <[hidden email]> wrote:

> On Mon, Jan 2, 2012 at 5:52 PM, Felipe Almeida Lessa
> <[hidden email]> wrote:
>> On Mon, Jan 2, 2012 at 10:12 AM, max <[hidden email]> wrote:
>>> This is the simplest solution of the proposed, in my opinion. Thank you
>>> very much.
>>
>> Better yet, don't use String and use Text.  Then you just need
>> T.splitOn "\r\n" [1].
>
> That is actually the opposite of what the OP wants, however it's
> interesting that Text has a function like that and not the String
> functions in the standard
> library.
>
> -- Anupam



--
Markus Läll

Re: How to split this string.

anonymous
In reply to this post by Комар Максим
If you're interested in learning parsec, RWH (Real World Haskell) covers this topic in depth in Chapter 16, in the section "Choices and Errors": http://book.realworldhaskell.org/read/using-parsec.html.

On Mon, Jan 2, 2012 at 3:44 AM, max <[hidden email]> wrote:
> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?

Re: How to split this string.

Steve Horne
In reply to this post by Jon Fairbairn
On 02/01/2012 11:12, Jon Fairbairn wrote:

> max<[hidden email]>  writes:
>
>> I want to write a function whose behavior is as follows:
>>
>> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
>> "string2\r\nstring3", "string4"]
>>
>> Note the sequence "\r\n", which is ignored. How can I do this?
> cabal install split
>
> then do something like
>
>     import Data.List (groupBy)
>     import Data.List.Split (splitOn)
>
>     rn '\r' '\n' = True
>     rn _ _ = False
>
>     required_function = fmap concat . splitOn ["\n"] . groupBy rn
>
> (though that might be an abuse of groupBy)
>
Sadly, it turns out that not only is this an abuse of groupBy, but it
has (I think) a subtle bug as a result.

I was inspired by this to try some other groupBy stuff, and it didn't
work. After scratching my head a bit, I tried the following...

Prelude> import Data.List
Prelude Data.List> groupBy (<) [1,2,3,2,1,2,3,2,1]
[[1,2,3,2],[1,2,3,2],[1]]

That wasn't exactly the result I was expecting :-(

Explanation (best guess) - the function passed to groupBy, according to
the docs, is meant to test whether two values are 'equal'. I'm guessing
the assumption is that the function will effectively treat values as
belonging to equivalence classes. That implies some rules such as...

   (a == a)
   reflexivity : (a == b) => (b == a)
   transitivity : (a == b) && (b == c) => (a == c)

I'm not quite certain I got those names right, and I can't remember the
name of the first rule at all, sorry.

The third rule is probably to blame here. By the rules, groupBy doesn't
need to compare adjacent items. When it starts a new group, it seems to
always use the first item in that new group until it finds a mismatch.
In my test, that means it's always comparing with 1 - the second 2 is
included in each group because although (3 < 2) is False, groupBy isn't
testing that - it's testing (1 < 2).

In the context of this \r\n test function, this behaviour will, I guess,
result in \r\n\n being combined into one group. The second \n will
therefore not be seen as a valid splitting point.


Personally, I think this is a tad disappointing. Given that groupBy
cannot check or enforce that its test respects equivalence classes, it
should ideally give results that make as much sense as possible either
way. That said, even if the test was always given adjacent elements,
there's still room for a different order of processing the list
(left-to-right or right-to-left) to give different results - and in any
case, maybe it's more efficient the way it is.
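The behaviour described above follows from how groupBy is defined in base: each group is built with span (eq x), so every element is compared with the group's first element x. A sketch of a variant that compares neighbours instead, plugged into Jon's pipeline. groupByAdjacent, splitOnNewline and requiredFunction are hypothetical names; splitOnNewline stands in for splitOn ["\n"] from the split package so that the block needs only base:

```haskell
import Data.List (groupBy)

-- Like Data.List.groupBy, but each element is compared with its
-- immediate predecessor rather than with the first element of the group.
-- Hypothetical helper; not part of base.
groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent _ [] = []
groupByAdjacent eq (x:xs) = (x : ys) : groupByAdjacent eq zs
  where
    (ys, zs) = spanAdj x xs
    -- Keep taking elements while each one is "equal" to the one before it.
    spanAdj prev (y:rest)
      | eq prev y = let (same, other) = spanAdj y rest in (y : same, other)
    spanAdj _ rest = ([], rest)

-- Jon's predicate: a '\r' belongs with a '\n' that directly follows it.
rn :: Char -> Char -> Bool
rn '\r' '\n' = True
rn _    _    = False

-- Stand-in for splitOn ["\n"]: split a list of groups at every group
-- that is exactly "\n", dropping the separators.
splitOnNewline :: [String] -> [[String]]
splitOnNewline gs = case break (== "\n") gs of
  (before, [])       -> [before]
  (before, _ : rest) -> before : splitOnNewline rest

-- Jon's pipeline with the neighbour-comparing grouping swapped in.
requiredFunction :: String -> [String]
requiredFunction = map concat . splitOnNewline . groupByAdjacent rn

-- For comparison:
--   groupBy         (<) [1,2,3,2,1,2,3,2,1] == [[1,2,3,2],[1,2,3,2],[1]]
--   groupByAdjacent (<) [1,2,3,2,1,2,3,2,1] == [[1,2,3],[2],[1,2,3],[2],[1]]
--   groupBy         rn "a\r\n\nb" == ["a","\r\n\n","b"]
--   groupByAdjacent rn "a\r\n\nb" == ["a","\r\n","\n","b"]
```

With adjacent comparison a group under rn is at most "\r\n", so the second '\n' in "\r\n\n" stays a separator.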

Re: How to split this string.

Steve Horne
On 04/01/2012 16:47, Steve Horne wrote:
>
>   (a == a)
>   reflexivity : (a == b) => (b == a)
>   transitivity : (a == b) && (b == c) => (a == c)
>
Oops - that's...

reflexivity :  (a == a)
symmetry : (a == b) => (b == a)
transitivity : (a == b) && (b == c) => (a == c)

An equivalence relation is a relation that meets all these conditions.

Re: How to split this string.

Christian Maeder-2
In reply to this post by Steve Horne
On 04.01.2012 17:47, Steve Horne wrote:
> On 02/01/2012 11:12, Jon Fairbairn wrote:
>> max<[hidden email]> writes:
>>
>>> I want to write a function whose behavior is as follows:
>>>
>>> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
>>> "string2\r\nstring3", "string4"]
>>>
>>> Note the sequence "\r\n", which is ignored. How can I do this?

Why do you have these (unhealthy) different kinds of line breaks (Unix
and Windows style) in your string in the first place?

I hope they weren't produced by something calling "unlines" (or intercalate "\n") earlier.

Cheers Christian

Re: How to split this string.

AUGER Cédric
In reply to this post by Steve Horne
On Wed, 04 Jan 2012 17:49:15 +0000
Steve Horne <[hidden email]> wrote:

> On 04/01/2012 16:47, Steve Horne wrote:
> >
> >   (a == a)
> >   reflexivity : (a == b) => (b == a)
> >   transitivity : (a == b) && (b == c) => (a == c)
> >
> Oops - that's...
>
> reflexivity :  (a == a)
> symmetry : (a == b) => (b == a)
> transitivity : (a == b) && (b == c) => (a == c)
>
> An equivalence relation is a relation that meets all these conditions.
>
>

I prefer to use "transymmetry" (although I guess it is not a regular
word):

reflexivity: a ≃ a
transymmetry: ∀ a b. b≃a ⇒ ∀ c. c≃a ⇒ b≃c

so I only have 2 rules.
transymmetry is trivially derived from transitivity and symmetry.
symmetry is trivially derived from reflexivity and transymmetry.
transitivity is trivially derived from symmetry and transymmetry
 (and thus from transymmetry and reflexivity)
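The three derivations can be checked mechanically. A small Lean 4 sketch (no Mathlib needed), stating transymmetry for an arbitrary relation r; the theorem names are made up for illustration:

```lean
-- transymmetry: ∀ a b, b ≃ a → ∀ c, c ≃ a → b ≃ c, for a relation r
variable {α : Type} (r : α → α → Prop)

-- symmetry from reflexivity and transymmetry (take a := y, b := y, c := x)
theorem symm_of_transymm (refl : ∀ a, r a a)
    (ts : ∀ a b, r b a → ∀ c, r c a → r b c) :
    ∀ x y, r x y → r y x :=
  fun x y hxy => ts y y (refl y) x hxy

-- transitivity from reflexivity and transymmetry, going through symmetry
theorem trans_of_transymm (refl : ∀ a, r a a)
    (ts : ∀ a b, r b a → ∀ c, r c a → r b c) :
    ∀ x y z, r x y → r y z → r x z :=
  fun x y z hxy hyz => ts y x hxy z (symm_of_transymm r refl ts y z hyz)

-- transymmetry from symmetry and transitivity
theorem transymm_of_symm_trans (symm : ∀ a b, r a b → r b a)
    (trans : ∀ a b c, r a b → r b c → r a c) :
    ∀ a b, r b a → ∀ c, r c a → r b c :=
  fun a b hba c hca => trans b a c hba (symm c a hca)
```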