String != [Char]

String != [Char]

Greg Weber
The text library and its Text data type have shown their worth in
real-world Haskell usage with GHC.
I try to avoid String whenever possible, but I still have to deal with
conversions and other issues.
There is a lot of real work to be done to convert away from [Char],
but I think we need to take it out of the language definition as a
first step.

I can only see one issue with the proposal: it can be convenient to
operate on a list of characters.
But I think there are plenty of solutions at our disposal. A simple
conversion from Text to a list of characters might suffice. In GHC,
OverloadedStrings means users would still be free to use String the
same way they are now.

_______________________________________________
Haskell-prime mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-prime

Re: String != [Char]

Tony Morris-4
On 17/03/12 11:44, Greg Weber wrote:

> The text library and its Text data type have shown their worth in
> real-world Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.
>
> I can only see one issue with the proposal: it can be convenient to
> operate on a list of characters.
> But I think there are plenty of solutions at our disposal. A simple
> conversion from Text to a list of characters might suffice. In GHC,
> OverloadedStrings means users would still be free to use String the
> same way they are now.
Do you know if there is a good write-up of the benefits of Data.Text
over String? I'm aware of the advantages just from my own usage; I'm
hoping someone has documented them rather than leaving them in our heads.

--
Tony Morris
http://tmorris.net/




Re: String != [Char]

Christopher Done
On 17 March 2012 05:30, Tony Morris <[hidden email]> wrote:
> Do you know if there is a good write-up of the benefits of Data.Text
> over String? I'm aware of the advantages just from my own usage; I'm
> hoping someone has documented them rather than leaving them in our heads.

Good point, it would be good to collate the experience and wisdom of
this decision with some benchmark results on the HaskellWiki as The
Place to link to when justifying it.


Re: String != [Char]

ARJANEN Loïc Jean David
In reply to this post by Greg Weber
On 17/03/2012 02:44, Greg Weber wrote:

> The text library and its Text data type have shown their worth in
> real-world Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.
>
> I can only see one issue with the proposal: it can be convenient to
> operate on a list of characters.
> But I think there are plenty of solutions at our disposal. A simple
> conversion from Text to a list of characters might suffice. In GHC,
> OverloadedStrings means users would still be free to use String the
> same way they are now.
>
Good point, but rather than specifying in the standard that the new
string type should be the Text datatype, maybe the new definition should
be that String is a newtype with suitable operations defined on it, and
perhaps a typeclass to convert to and from this newtype. The reason for
my remark is that, although most implementations compile to native code, an
implementation compiling to, for example, JavaScript might wish to use
JavaScript's string type rather than forcing its users to have a native
library installed.
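
A minimal sketch of what such a definition might look like, under stated assumptions: the names Str, StringLike, toStr, fromStr, append and size are hypothetical and are not taken from any standard or library. Here the newtype wraps [Char]; an implementation targeting JavaScript could wrap its native string type behind the same interface.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

-- Hypothetical abstract string type. This implementation wraps [Char];
-- another implementation could wrap a packed or foreign representation.
newtype Str = Str [Char]
  deriving (Eq, Ord)

-- Hypothetical class for converting to and from the abstract type.
class StringLike a where
  toStr   :: a -> Str
  fromStr :: Str -> a

instance StringLike Str where
  toStr   = id
  fromStr = id

-- Lists of characters stay available through an explicit conversion.
instance StringLike [Char] where
  toStr            = Str
  fromStr (Str cs) = cs

-- Two of the "suitable operations", defined without exposing [Char]
-- in their types.
append :: Str -> Str -> Str
append (Str a) (Str b) = Str (a ++ b)

size :: Str -> Int
size (Str cs) = length cs

main :: IO ()
main = do
  let s = toStr "String" `append` toStr " /= [Char]"
  putStrLn (fromStr s)
  print (size s)
```

Client code written against append, size and the class never sees the representation, so swapping the body of the newtype would not break it.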

Regards,
ARJANEN Loïc


Re: String != [Char]

Greg Weber
In reply to this post by Christopher Done
I actually was not able to successfully google for Text vs. String
benchmarks. If someone can point one out that would be very helpful.

On Sat, Mar 17, 2012 at 1:52 AM, Christopher Done
<[hidden email]> wrote:

> On 17 March 2012 05:30, Tony Morris <[hidden email]> wrote:
>> Do you know if there is a good write-up of the benefits of Data.Text
>> over String? I'm aware of the advantages just from my own usage; I'm
>> hoping someone has documented them rather than leaving them in our heads.
>
> Good point, it would be good to collate the experience and wisdom of
> this decision with some benchmark results on the HaskellWiki as The
> Place to link to when justifying it.


Re: String != [Char]

Johan Tibell-2
Hi Greg,

There are a few blog posts on Bryan's blog. Here are two of them:

    http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
    http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

Unfortunately the blog seems partly broken. Images are missing and
some articles are missing altogether (i.e. the article is there but
the actual body text is gone.)

-- Johan


Re: String != [Char]

Duncan Coutts-4
In reply to this post by Greg Weber
On 17 March 2012 01:44, Greg Weber <[hidden email]> wrote:
> The text library and its Text data type have shown their worth in
> real-world Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.

I'm pretty sure the majority of people would agree that if we were
making the Haskell standard nowadays we'd make the String type abstract.

Unfortunately I fear making the change now will be quite disruptive,
though I don't think we've collectively put much effort yet into
working out just how disruptive.

In principle I'd support changing to reduce the number of string types
used in interfaces. From painful professional experience, I think that
one of the biggest things where C++ went wrong was not having a single
string type that everyone would use (I once had to write a C++
component integrating code that used 5 different string types). Like
Python 3, we should have two common string types used in interfaces:
string and bytes (with implementations like our current Text and
ByteString).

BTW, I don't think taking it out of the language would be a helpful
step. We actually want to tell people "use *this* string type in
interfaces", not leave everyone to make their own choice. I think
taking it out of the language would tend to encourage everyone to make
their own choice.

Duncan


Re: String != [Char]

Thomas Schilling-2
In reply to this post by ARJANEN Loïc Jean David
On 18 March 2012 19:29, ARJANEN Loïc Jean David <[hidden email]> wrote:

> Good point, but rather than specifying in the standard that the new string
> type should be the Text datatype, maybe the new definition should be that
> String is a newtype with suitable operations defined on it, and perhaps a
> typeclass to convert to and from this newtype. The reason for my remark is
> that, although most implementations compile to native code, an implementation
> compiling to, for example, JavaScript might wish to use JavaScript's string
> type rather than forcing its users to have a native library installed.

I agree that the language standard should not prescribe the
implementation of a Text datatype.  It should instead require an
abstract data type (which may just be a newtype wrapper for [Char] in
some implementations) and a (minimal) set of operations on it.

Regarding the type class for converting to and from that type, there
is a perhaps more complicated question: The current fromString method
uses String as the source type which causes unnecessary overhead. This
is unfortunate since GHC's built-in mechanism actually uses
unpackCString[Utf8]# which constructs the inefficient String
representation from a compact memory representation.  I think it would
be best if the new fromString/fromText class allowed an efficient
mechanism like that.  unpackCString# has type Addr# -> [Char] which is
obviously GHC-specific.


--
Push the envelope. Watch it bend.


Re: String != [Char]

Johan Tibell-2
On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling
<[hidden email]> wrote:
> Regarding the type class for converting to and from that type, there
> is a perhaps more complicated question: The current fromString method
> uses String as the source type which causes unnecessary overhead. This
> is unfortunate since GHC's built-in mechanism actually uses
> unpackCString[Utf8]# which constructs the inefficient String
> representation from a compact memory representation.  I think it would
> be best if the new fromString/fromText class allowed an efficient
> mechanism like that.  unpackCString# has type Addr# -> [Char] which is
> obviously GHC-specific.

I've been thinking about this question as well. How about

class IsString s where
    unpackCString :: Ptr Word8 -> CSize -> s

It's the moral equivalent of unpackCString#, but uses standard Haskell types.

-- Johan


Re: String != [Char]

Christian Siefkes
On 03/19/2012 04:53 PM, Johan Tibell wrote:
> I've been thinking about this question as well. How about
>
> class IsString s where
>     unpackCString :: Ptr Word8 -> CSize -> s

What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Best regards
        Christian

--
|------- Dr. Christian Siefkes ------- [hidden email] -------
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|    Peer Production Everywhere:       http://peerconomy.org/wiki/
|---------------------------------- OpenPGP Key ID: 0x346452D8 --
A choice of masters is not freedom.
        -- Bradley M. Kuhn and Richard M. Stallman, Freedom Or Power?



Re: String != [Char]

Johan Tibell-2
On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes
<[hidden email]> wrote:
> On 03/19/2012 04:53 PM, Johan Tibell wrote:
>> I've been thinking about this question as well. How about
>>
>> class IsString s where
>>     unpackCString :: Ptr Word8 -> CSize -> s
>
> What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Yes.

We could make a distinction between byte and Unicode literals and have:

class IsBytes a where
    unpackBytes :: Ptr Word8 -> Int -> a

class IsText a where
    unpackText :: Ptr Word8 -> Int -> a

In the latter the caller guarantees that the passed-in pointer points
to well-formed UTF-8 data.
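
To make the distinction concrete, here is a rough sketch of the two classes with one instance each. Everything beyond the two class declarations is an assumption made for illustration: the instances, the withLiteral helper (which stands in for the compiler handing over a literal's bytes), and the use of unsafePerformIO to keep the methods pure. Decoding relies on GHC.Foreign from base, so the sketch is GHC-specific.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Control.Exception (evaluate)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr, castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (utf8)
import System.IO.Unsafe (unsafePerformIO)

-- Byte literals: the buffer is arbitrary binary data.
class IsBytes a where
  unpackBytes :: Ptr Word8 -> Int -> a

-- Text literals: the caller promises well-formed UTF-8.
class IsText a where
  unpackText :: Ptr Word8 -> Int -> a

-- Illustrative instance: copy the bytes out unchanged.
instance IsBytes [Word8] where
  unpackBytes p n = unsafePerformIO (peekArray n p)

-- Illustrative instance: decode the buffer as UTF-8 into a list of Char.
instance IsText [Char] where
  unpackText p n = unsafePerformIO (GHC.peekCStringLen utf8 (castPtr p, n))

-- Hypothetical stand-in for the compiler: put the literal's bytes in a
-- temporary buffer and force the result before the buffer is released.
withLiteral :: [Word8] -> (Ptr Word8 -> Int -> a) -> IO a
withLiteral bytes unpack =
  withArrayLen bytes $ \n p -> evaluate (unpack p n)

main :: IO ()
main = do
  let raw = [0x63, 0x61, 0x66, 0xC3, 0xA9]  -- "café" encoded as UTF-8
  bytes <- withLiteral raw unpackBytes :: IO [Word8]
  text  <- withLiteral raw unpackText  :: IO String
  print bytes  -- [99,97,102,195,169]
  print text   -- "caf\233"
```

The same five bytes yield five elements through IsBytes and four characters through IsText, which is the difference the two classes are meant to capture.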

-- Johan


Re: String != [Char]

Greg Weber
In reply to this post by Johan Tibell-2
This is the best I can do with Bryan's blog posts, but none of the
graphs (which contain all the information) show up:
http://web.archive.org/web/20100222031602/http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

If someone has some benchmarks that can be run, that would be helpful.

On Mon, Mar 19, 2012 at 7:51 AM, Johan Tibell <[hidden email]> wrote:

> Hi Greg,
>
> There are a few blog posts on Bryan's blog. Here are two of them:
>
>    http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
>    http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/
>
> Unfortunately the blog seems partly broken. Images are missing and
> some articles are missing altogether (i.e. the article is there but
> the actual body text is gone.)
>
> -- Johan


RE: String != [Char]

Simon Peyton Jones
In reply to this post by Johan Tibell-2
Don't forget that with -XOverloadedStrings we already have an IsString class.  (That's not a Haskell Prime extension though.)

class IsString a where
    fromString :: String -> a
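
For readers who have not used the extension, a small sketch of how the existing class behaves. The Shout type and its instance are made up for illustration; IsString and fromString are the real ones from Data.String in base.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (toUpper)
import Data.String (IsString (..))

-- Hypothetical string-like type, for illustration only.
newtype Shout = Shout String
  deriving (Eq, Show)

-- The literal reaches the instance as an ordinary String, i.e. [Char].
instance IsString Shout where
  fromString = Shout . map toUpper

-- With OverloadedStrings, this literal means: fromString "string /= [char]".
banner :: Shout
banner = "string /= [char]"

main :: IO ()
main = print banner  -- Shout "STRING /= [CHAR]"
```

Note that the instance only ever sees a [Char], so any more compact target type has to be built from the list at runtime.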

Simon

|  -----Original Message-----
|  From: [hidden email] [mailto:haskell-prime-
|  [hidden email]] On Behalf Of Johan Tibell
|  Sent: 19 March 2012 15:54
|  To: Thomas Schilling
|  Cc: [hidden email]
|  Subject: Re: String != [Char]
|  
|  On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling
|  <[hidden email]> wrote:
|  > Regarding the type class for converting to and from that type, there
|  > is a perhaps more complicated question: The current fromString method
|  > uses String as the source type which causes unnecessary overhead. This
|  > is unfortunate since GHC's built-in mechanism actually uses
|  > unpackCString[Utf8]# which constructs the inefficient String
|  > representation from a compact memory representation.  I think it would
|  > be best if the new fromString/fromText class allowed an efficient
|  > mechanism like that.  unpackCString# has type Addr# -> [Char] which is
|  > obviously GHC-specific.
|  
|  I've been thinking about this question as well. How about
|  
|  class IsString s where
|      unpackCString :: Ptr Word8 -> CSize -> s
|  
|  It's the moral equivalent of unpackCString#, but uses standard Haskell types.
|  
|  -- Johan




Re: String != [Char]

Brandon Allbery
On Mon, Mar 19, 2012 at 15:39, Simon Peyton-Jones <[hidden email]> wrote:
> Don't forget that with -XOverloadedStrings we already have an IsString class.  (That's not a Haskell Prime extension though.)

I think that's exactly the point; currently it uses [Char] as the initial format and converts at runtime, which is rather unfortunate given the inefficiency of [Char].  If it has to be done at runtime, it would be nice to at least do it from a more efficient initial format.

--
brandon s allbery                                      [hidden email]
wandering unix systems administrator (available)     (412) 475-9364 vm/sms



Re: String != [Char]

Daniel Peebles
In reply to this post by Johan Tibell-2
If the input is specified to be UTF-8, wouldn't it be better to call the method unpackUTF8 or something like that?

On Mon, Mar 19, 2012 at 12:59 PM, Johan Tibell <[hidden email]> wrote:
> On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes
> <[hidden email]> wrote:
>> On 03/19/2012 04:53 PM, Johan Tibell wrote:
>>> I've been thinking about this question as well. How about
>>>
>>> class IsString s where
>>>     unpackCString :: Ptr Word8 -> CSize -> s
>>
>> What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?
>
> Yes.
>
> We could make a distinction between byte and Unicode literals and have:
>
> class IsBytes a where
>    unpackBytes :: Ptr Word8 -> Int -> a
>
> class IsText a where
>    unpackText :: Ptr Word8 -> Int -> a
>
> In the latter the caller guarantees that the passed-in pointer points
> to well-formed UTF-8 data.
>
> -- Johan


Re: String != [Char]

Johan Tibell-2
On Mon, Mar 19, 2012 at 2:55 PM, Daniel Peebles <[hidden email]> wrote:
> If the input is specified to be UTF-8, wouldn't it be better to call the
> method unpackUTF8 or something like that?

Sure.

-- Johan


RE: String != [Char]

Simon Marlow
In reply to this post by Johan Tibell-2
> On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes <[hidden email]>
> wrote:
> > On 03/19/2012 04:53 PM, Johan Tibell wrote:
> >> I've been thinking about this question as well. How about
> >>
> >> class IsString s where
> >>     unpackCString :: Ptr Word8 -> CSize -> s
> >
> > What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?
>
> Yes.
>
> We could make a distinction between byte and Unicode literals and have:
>
> class IsBytes a where
>     unpackBytes :: Ptr Word8 -> Int -> a
>
> class IsText a where
>     unpackText :: Ptr Word8 -> Int -> a
>
> In the latter the caller guarantees that the passed-in pointer points to
> well-formed UTF-8 data.

Is there a reason not to put all these methods in the IsString class, with appropriate default definitions?  You would need a UTF-8 encoder (& decoder) of course, but it would reduce the burden on clients and improve backwards compatibility.
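
A rough sketch of how default definitions could keep existing instances working, under several assumptions: the class is renamed IsString' (with fromString') so it does not clash with Data.String, the method name unpackUTF8 follows the suggestion made earlier in the thread, and the types Name and Utf8 plus the withLiteral helper are invented for the example. Only the decoding default is shown; a default for fromString' in terms of unpackUTF8 would need the encoder direction. The decoder comes from GHC.Foreign in base, so this is GHC-specific.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr, castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (utf8)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical extended class; primed names avoid clashing with Data.String.
class IsString' a where
  fromString' :: String -> a

  -- New method with a default: decode the UTF-8 buffer to a String,
  -- then reuse fromString'. Existing instances get it for free.
  unpackUTF8 :: Ptr Word8 -> Int -> a
  unpackUTF8 p n =
    fromString' (unsafePerformIO (GHC.peekCStringLen utf8 (castPtr p, n)))

-- An existing-style instance: defines only fromString'.
newtype Name = Name String
  deriving (Eq, Show)

instance IsString' Name where
  fromString' = Name

-- An instance that overrides the default and skips the [Char] detour.
newtype Utf8 = Utf8 [Word8]
  deriving (Eq, Show)

instance IsString' Utf8 where
  fromString' s = Utf8 (unsafePerformIO (GHC.withCStringLen utf8 s copy))
    where copy (p, n) = peekArray n (castPtr p)
  unpackUTF8 p n = Utf8 (unsafePerformIO (peekArray n p))

-- Hypothetical stand-in for the compiler: pass the literal's bytes in a
-- temporary buffer and force the result before the buffer is released.
withLiteral :: [Word8] -> (Ptr Word8 -> Int -> a) -> IO a
withLiteral bytes unpack =
  withArrayLen bytes $ \n p -> evaluate (unpack p n)

main :: IO ()
main = do
  let raw = [0x63, 0x61, 0x66, 0xC3, 0xA9]  -- "café" encoded as UTF-8
  name      <- withLiteral raw unpackUTF8 :: IO Name
  utf8Bytes <- withLiteral raw unpackUTF8 :: IO Utf8
  print name       -- Name "caf\233"
  print utf8Bytes  -- Utf8 [99,97,102,195,169]
```

Name never mentions unpackUTF8 and still works, which is the backwards-compatibility benefit; Utf8 shows how a packed type could opt in to the cheaper path.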

Cheers,
        Simon




Re: String != [Char]

Johan Tibell-2
On Tue, Mar 20, 2012 at 2:25 AM, Simon Marlow <[hidden email]> wrote:
> Is there a reason not to put all these methods in the IsString class, with appropriate default definitions?  You would need a UTF-8 encoder (& decoder) of course, but it would reduce the burden on clients and improve backwards compatibility.

That sounds fine to me. I'm leaning towards only having
unpackUTF8String (in addition to the existing method), as in the
absence of proper byte literals we would have literals which change
types, depending on which bytes they contain*. Ugh!

* Is it even possible to create non-UTF8 literals without using
escape sequences?

-- Johan


Re: String != [Char]

Tillmann Rendel-5
In reply to this post by Thomas Schilling-2
Hi,

Thomas Schilling wrote:
> I agree that the language standard should not prescribe the
> implementation of a Text datatype.  It should instead require an
> abstract data type (which may just be a newtype wrapper for [Char] in
> some implementations) and a (minimal) set of operations on it.
>
> Regarding the type class for converting to and from that type, there
> is a perhaps more complicated question: The current fromString method
> uses String as the source type which causes unnecessary overhead.

Is this still a problem if String were replaced by an
implementation-dependent newtype? Presumably, GHC would use a more
efficient representation behind the newtype, so the following would be
efficient in practice (or not?)

   newtype String
     = ...

   class IsString a where
     fromString :: String -> a

The standard could even prescribe that an instance for [Char] exists:

   explode :: String -> [Char]
   explode = ...

   instance IsString [Char] where
     fromString = explode

Tillmann


Re: String != [Char]

ARJANEN Loïc Jean David
On 20/03/2012 16:29, Tillmann Rendel wrote:

> Hi,
>
> Thomas Schilling wrote:
>> I agree that the language standard should not prescribe the
>> implementation of a Text datatype.  It should instead require an
>> abstract data type (which may just be a newtype wrapper for [Char] in
>> some implementations) and a (minimal) set of operations on it.
>>
>> Regarding the type class for converting to and from that type, there
>> is a perhaps more complicated question: The current fromString method
>> uses String as the source type which causes unnecessary overhead.
>
> Is this still a problem if String were replaced by an
> implementation-dependent newtype? Presumably, GHC would use a more
> efficient representation behind the newtype, so the following would be
> efficient in practice (or not?)
>
>   newtype String
>     = ...
>
>   class IsString a where
>     fromString :: String -> a
>
> The standard could even prescribe that an instance for [Char] exists:
>
>   explode :: String -> [Char]
>   explode = ...
>
>   instance IsString [Char] where
>     fromString = explode
>
> Tillmann

A recent message on Haskell-café made me think that if the standard
mandates that any instance exists, it should mandate that an instance
exists for CString and CWString (C's strings and wide strings) or, more
generally, for any foreign string type defined in the FFIs implemented.
That is to say, if you implement an FFI for .NET and you expose .NET's
string type, you should implement conversions between that string type
and Haskell's.
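
As a sketch of what such mandated conversions might look like for the C case: the class ForeignString and its methods are hypothetical, as are the viaCString and viaCWString helpers, while the marshalling functions are the existing ones from Foreign.C.String in base. Since both foreign types are pointers, the conversions live in IO and the caller has to release the memory.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Control.Exception (bracket)
import Foreign.C.String
  ( CString, CWString
  , newCString, newCWString
  , peekCString, peekCWString )
import Foreign.Marshal.Alloc (free)

-- Hypothetical class: conversions between a foreign string type and String.
class ForeignString f where
  toForeign   :: String -> IO f  -- allocates; the caller must free the result
  fromForeign :: f -> IO String

-- C strings; newCString/peekCString use the current locale's encoding.
instance ForeignString CString where
  toForeign   = newCString
  fromForeign = peekCString

-- C wide strings.
instance ForeignString CWString where
  toForeign   = newCWString
  fromForeign = peekCWString

-- Round trips that release the foreign buffer even if decoding fails.
viaCString :: String -> IO String
viaCString s = bracket (toForeign s :: IO CString) free fromForeign

viaCWString :: String -> IO String
viaCWString s = bracket (toForeign s :: IO CWString) free fromForeign

main :: IO ()
main = do
  a <- viaCString "hello"
  b <- viaCWString "hello"
  print (a, b)  -- ("hello","hello")
```

The example sticks to ASCII because the CString conversion depends on the locale encoding; an FFI for another platform would supply its own instance for that platform's string type.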

Regards,
ARJANEN Loïc

