|
The text library and the Text data type have shown their worth in real-world Haskell usage with GHC. I try to avoid String whenever possible, but I still have to deal with conversions and other issues. There is a lot of real work to be done to convert away from [Char], but I think we need to take it out of the language definition as a first step.

I can only see one issue with the proposal: it can be convenient to operate on a list of characters. But I think there are plenty of solutions at our disposal. A simple conversion from Text to a list of characters might suffice. In GHC, OverloadedStrings means users would still be free to use String the same way they are now.

_______________________________________________
Haskell-prime mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-prime |
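The two escape hatches mentioned in the message above (converting Text to a list of characters, and OverloadedStrings) might look roughly like this. This is only a sketch, assuming GHC with the text package installed; the names greeting, legacy and shout are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Char (toUpper)
import qualified Data.Text as T

-- A literal typed as Text: OverloadedStrings makes the literal polymorphic.
greeting :: T.Text
greeting = "hello, world"

-- The same literal syntax still works at type String.
legacy :: String
legacy = "hello, world"

-- Drop down to a list of characters only where list functions are handy,
-- then pack the result back into Text.
shout :: T.Text -> T.Text
shout = T.pack . map toUpper . T.unpack

main :: IO ()
main = do
  print (T.unpack greeting == legacy)
  putStrLn (T.unpack (shout greeting))
```

The round trip through T.unpack and T.pack copies the data, so in practice it would probably be reserved for the places where list operations are genuinely more convenient.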
|
On 17/03/12 11:44, Greg Weber wrote:
> The text library and the Text data type have shown their worth in
> real-world Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.
>
> I can only see one issue with the proposal: it can be convenient to
> operate on a list of characters.
> But I think there are plenty of solutions at our disposal. A simple
> conversion from Text to a list of characters might suffice. In GHC,
> OverloadedStrings means users would still be free to use String the
> same way they are now.

Do you know if there is a good write-up of the benefits of Data.Text over String? I'm aware of the advantages just by my own usage; hoping someone has documented it rather than in our heads.

-- Tony Morris
http://tmorris.net/ |
|
On 17 March 2012 05:30, Tony Morris <[hidden email]> wrote:
> Do you know if there is a good write-up of the benefits of Data.Text
> over String? I'm aware of the advantages just by my own usage; hoping
> someone has documented it rather than in our heads.

Good point. It would be good to collate the experience and wisdom behind this decision, along with some benchmark results, on the HaskellWiki as The Place to link to when justifying it. |
|
In reply to this post by Greg Weber
On 17/03/2012 02:44, Greg Weber wrote:
> The text library and the Text data type have shown their worth in
> real-world Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.
>
> I can only see one issue with the proposal: it can be convenient to
> operate on a list of characters.
> But I think there are plenty of solutions at our disposal. A simple
> conversion from Text to a list of characters might suffice. In GHC,
> OverloadedStrings means users would still be free to use String the
> same way they are now.

Good point, but rather than specifying in the standard that the new string type should be the Text datatype, maybe the new definition should be that String is a newtype with suitable operations defined on it, and perhaps a typeclass to convert to and from this newtype. The reason for my remark is that, although most implementations compile to native code, an implementation compiling to, for example, JavaScript might wish to use JavaScript's string type rather than forcing its users to have a native library installed.

Regards,
ARJANEN Loïc |
|
In reply to this post by Christopher Done
I actually was not able to successfully google for Text vs. String benchmarks. If someone can point one out, that would be very helpful.

On Sat, Mar 17, 2012 at 1:52 AM, Christopher Done <[hidden email]> wrote:
> On 17 March 2012 05:30, Tony Morris <[hidden email]> wrote:
>> Do you know if there is a good write-up of the benefits of Data.Text
>> over String? I'm aware of the advantages just by my own usage; hoping
>> someone has documented it rather than in our heads.
>
> Good point, it would be good to collate the experience and wisdom of
> this decision with some benchmark results on the HaskellWiki as The
> Place to link to when justifying it. |
|
Hi Greg,
There are a few blog posts on Bryan's blog. Here are two of them:

http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

Unfortunately the blog seems partly broken. Images are missing, and some articles have lost their content altogether (i.e. the article page is there but the actual body text is gone).

-- Johan |
|
In reply to this post by Greg Weber
On 17 March 2012 01:44, Greg Weber <[hidden email]> wrote:
> The text library and the Text data type have shown their worth in
> real-world Haskell usage with GHC.
> I try to avoid String whenever possible, but I still have to deal with
> conversions and other issues.
> There is a lot of real work to be done to convert away from [Char],
> but I think we need to take it out of the language definition as a
> first step.

I'm pretty sure the majority of people would agree that if we were writing the Haskell standard today we'd make the String type abstract.

Unfortunately I fear making the change now will be quite disruptive, though I don't think we've collectively put much effort yet into working out just how disruptive.

In principle I'd support the change, to reduce the number of string types used in interfaces. From painful professional experience, I think that one of the biggest things C++ got wrong was not having a single string type that everyone would use (I once had to write a C++ component integrating code that used 5 different string types). Like Python 3, we should have two common string types used in interfaces: string and bytes (with implementations like our current Text and ByteString).

BTW, I don't think taking it out of the language would be a helpful step. We actually want to tell people "use *this* string type in interfaces", not leave everyone to make their own choice. I think taking it out of the language would tend to encourage everyone to make their own choice.

Duncan |
|
In reply to this post by ARJANEN Loïc Jean David
On 18 March 2012 19:29, ARJANEN Loïc Jean David <[hidden email]> wrote:
> Good point, but rather than specifying in the standard that the new string
> type should be the Text datatype, maybe the new definition should be that
> String is a newtype with suitable operations defined on it, and perhaps a
> typeclass to convert to and from this newtype. The reason for my remark is
> that, although most implementations compile to native code, an implementation
> compiling to, for example, JavaScript might wish to use JavaScript's string
> type rather than forcing its users to have a native library installed.

I agree that the language standard should not prescribe the implementation of a Text datatype. It should instead require an abstract data type (which may just be a newtype wrapper for [Char] in some implementations) and a (minimal) set of operations on it.

Regarding the type class for converting to and from that type, there is a perhaps more complicated question: The current fromString method uses String as the source type, which causes unnecessary overhead. This is unfortunate since GHC's built-in mechanism actually uses unpackCString[Utf8]#, which constructs the inefficient String representation from a compact memory representation. I think it would be best if the new fromString/fromText class allowed an efficient mechanism like that. unpackCString# has type Addr# -> [Char], which is obviously GHC-specific.

-- Push the envelope. Watch it bend. |
|
On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling <[hidden email]> wrote:
> Regarding the type class for converting to and from that type, there
> is a perhaps more complicated question: The current fromString method
> uses String as the source type which causes unnecessary overhead. This
> is unfortunate since GHC's built-in mechanism actually uses
> unpackCString[Utf8]# which constructs the inefficient String
> representation from a compact memory representation. I think it would
> be best if the new fromString/fromText class allowed an efficient
> mechanism like that. unpackCString# has type Addr# -> [Char] which is
> obviously GHC-specific.

I've been thinking about this question as well. How about

    class IsString s where
      unpackCString :: Ptr Word8 -> CSize -> s

It's morally equivalent to unpackCString#, but uses standard Haskell types.

-- Johan |
|
On 03/19/2012 04:53 PM, Johan Tibell wrote:
> I've been thinking about this question as well. How about
>
> class IsString s where
>   unpackCString :: Ptr Word8 -> CSize -> s

What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Best regards
Christian

--
Dr. Christian Siefkes, [hidden email]
Homepage: http://www.siefkes.net/
Blog: http://www.keimform.de/
Peer Production Everywhere: http://peerconomy.org/wiki/
OpenPGP Key ID: 0x346452D8
A choice of masters is not freedom.
-- Bradley M. Kuhn and Richard M. Stallman, Freedom Or Power? |
|
On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes <[hidden email]> wrote:
> On 03/19/2012 04:53 PM, Johan Tibell wrote:
>> I've been thinking about this question as well. How about
>>
>> class IsString s where
>>   unpackCString :: Ptr Word8 -> CSize -> s
>
> What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Yes.

We could make a distinction between byte and Unicode literals and have:

    class IsBytes a where
      unpackBytes :: Ptr Word8 -> Int -> a

    class IsText a where
      unpackText :: Ptr Word8 -> Int -> a

In the latter, the caller guarantees that the passed-in pointer points to well-formed UTF-8 data.

-- Johan |
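To make the IsBytes/IsText split above a little more concrete, here is one way instances for ByteString and Text might be written. This is a sketch only, assuming the bytestring and text packages; the withLiteral helper is invented for the demonstration and stands in for the static buffer a compiler would emit for a literal. The use of unsafePerformIO is likewise an assumption about how a pure method could read from a pointer.

```haskell
module Main (main) where

import           Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import           Data.Word (Word8)
import           Foreign.Marshal.Array (withArrayLen)
import           Foreign.Ptr (Ptr, castPtr)
import           System.IO.Unsafe (unsafePerformIO)

-- The two classes as proposed in the message above.
class IsBytes a where
  unpackBytes :: Ptr Word8 -> Int -> a

class IsText a where
  unpackText :: Ptr Word8 -> Int -> a

-- Copy the literal's bytes into a fresh ByteString.
instance IsBytes B.ByteString where
  unpackBytes p n = unsafePerformIO (B.packCStringLen (castPtr p, n))

-- The caller promises well-formed UTF-8; decodeUtf8 throws otherwise.
instance IsText T.Text where
  unpackText p n = TE.decodeUtf8 (unpackBytes p n)

-- Hypothetical helper: lends a temporary buffer to an unpack function and
-- forces the result before the buffer is released.
withLiteral :: [Word8] -> (Ptr Word8 -> Int -> a) -> IO a
withLiteral bytes unpack =
  withArrayLen bytes $ \n p -> evaluate (unpack p n)

main :: IO ()
main = do
  let bytes = [0x68, 0xC3, 0xA9] :: [Word8]   -- "hé" encoded as UTF-8
  t <- withLiteral bytes unpackText  :: IO T.Text
  b <- withLiteral bytes unpackBytes :: IO B.ByteString
  print (T.length t, B.length b)
```

The same three bytes count as two characters when read as text and three when read as bytes, which is the distinction the two classes are meant to capture.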
|
In reply to this post by Johan Tibell-2
This is the best I can do with Bryan's blog posts, but none of the graphs (which contain all the information) show up:

http://web.archive.org/web/20100222031602/http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

If someone has some benchmarks that can be run, that would be helpful.

On Mon, Mar 19, 2012 at 7:51 AM, Johan Tibell <[hidden email]> wrote:
> Hi Greg,
>
> There are a few blog posts on Bryan's blog. Here are two of them:
>
> http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
> http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/
>
> Unfortunately the blog seems partly broken. Images are missing and
> some articles are missing altogether (i.e. the article is there but
> the actual body text is gone.)
>
> -- Johan |
|
In reply to this post by Johan Tibell-2
Don't forget that with -XOverloadedStrings we already have an IsString class. (That's not a Haskell Prime extension though.)

    class IsString a where
      fromString :: String -> a

Simon |
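A small experiment can show that, with the existing IsString class, an overloaded literal reaches the instance as an ordinary String. This is a sketch assuming GHC's OverloadedStrings extension; the CharCount type is invented purely to observe what the instance receives.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.String (IsString (..))

-- Hypothetical type used only to observe the argument of fromString.
newtype CharCount = CharCount Int
  deriving (Show, Eq)

instance IsString CharCount where
  -- The literal arrives here as a plain String, i.e. a [Char].
  fromString = CharCount . length

main :: IO ()
main = print ("hello" :: CharCount)
```

Whatever representation the target type uses internally, the only interface the class offers is a conversion from [Char], which is the overhead the surrounding messages are concerned with.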
|
On Mon, Mar 19, 2012 at 15:39, Simon Peyton-Jones <[hidden email]> wrote:
> Don't forget that with -XOverloadedStrings we already have an IsString class. (That's not a Haskell Prime extension though.)

I think that's exactly the point: currently it uses [Char] as the initial format and converts at runtime, which is rather unfortunate given the inefficiency of [Char]. If it has to be done at runtime, it would be nice to at least do it from a more efficient initial format.

--
brandon s allbery [hidden email]
wandering unix systems administrator (available) (412) 475-9364 vm/sms |
|
In reply to this post by Johan Tibell-2
If the input is specified to be UTF-8, wouldn't it be better to call the method unpackUTF8 or something like that? |
|
On Mon, Mar 19, 2012 at 2:55 PM, Daniel Peebles <[hidden email]> wrote:
> If the input is specified to be UTF-8, wouldn't it be better to call the
> method unpackUTF8 or something like that?

Sure.

-- Johan |
|
In reply to this post by Johan Tibell-2
> On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes <[hidden email]> wrote:
> > On 03/19/2012 04:53 PM, Johan Tibell wrote:
> >> I've been thinking about this question as well. How about
> >>
> >> class IsString s where
> >>   unpackCString :: Ptr Word8 -> CSize -> s
> >
> > What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?
>
> Yes.
>
> We could make a distinction between byte and Unicode literals and have:
>
> class IsBytes a where
>   unpackBytes :: Ptr Word8 -> Int -> a
>
> class IsText a where
>   unpackText :: Ptr Word8 -> Int -> a
>
> In the latter the caller guarantees that the passed-in pointer points to
> well-formed UTF-8 data.

Is there a reason not to put all these methods in the IsString class, with appropriate default definitions? You would need a UTF-8 encoder (& decoder) of course, but it would reduce the burden on clients and improve backwards compatibility.

Cheers,
Simon |
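The suggestion above, a single class whose new method has a default definition, could be sketched as follows. Everything here is an assumption made for illustration: the class is named IsString' to avoid clashing with Data.String, the method name unpackUTF8 follows the naming discussed in the thread, Legacy is an invented example type, and the default decodes through GHC.Foreign.peekCStringLen, which is GHC-specific.

```haskell
module Main (main) where

import           Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import           Data.Word (Word8)
import           Foreign.Marshal.Array (withArrayLen)
import           Foreign.Ptr (Ptr, castPtr)
import qualified GHC.Foreign as GHC
import           System.IO (utf8)
import           System.IO.Unsafe (unsafePerformIO)

-- Hypothetical extended class.
class IsString' a where
  fromString' :: String -> a

  -- Default: decode the UTF-8 bytes to a String and reuse fromString',
  -- so instances written against the old interface keep working.
  unpackUTF8 :: Ptr Word8 -> Int -> a
  unpackUTF8 p n =
    fromString' (unsafePerformIO (GHC.peekCStringLen utf8 (castPtr p, n)))

-- An "old style" instance that only defines fromString'.
newtype Legacy = Legacy String
  deriving (Eq, Show)

instance IsString' Legacy where
  fromString' = Legacy

-- A "new style" instance that overrides the default and avoids String.
instance IsString' T.Text where
  fromString' = T.pack
  unpackUTF8 p n =
    TE.decodeUtf8 (unsafePerformIO (B.packCStringLen (castPtr p, n)))

-- Hypothetical helper: lends a temporary buffer to an unpack function and
-- forces the result before the buffer is released.
withLiteral :: [Word8] -> (Ptr Word8 -> Int -> a) -> IO a
withLiteral bytes unpack =
  withArrayLen bytes $ \n p -> evaluate (unpack p n)

main :: IO ()
main = do
  let bytes = [0x68, 0xC3, 0xA9] :: [Word8]   -- "hé" encoded as UTF-8
  old <- withLiteral bytes unpackUTF8 :: IO Legacy
  new <- withLiteral bytes unpackUTF8 :: IO T.Text
  print (old == fromString' "h\233", new == fromString' "h\233")
```

Under these assumptions, existing instances pay the same cost as before, while a type such as Text can opt in to the faster path by overriding one method.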
|
On Tue, Mar 20, 2012 at 2:25 AM, Simon Marlow <[hidden email]> wrote:
> Is there a reason not to put all these methods in the IsString class, with appropriate default definitions? You would need a UTF-8 encoder (& decoder) of course, but it would reduce the burden on clients and improve backwards compatibility.

That sounds fine to me. I'm leaning towards only having unpackUTF8String (in addition to the existing method), as in the absence of proper byte literals we would have literals which change types, depending on which bytes they contain*. Ugh!

* Is it even possible to create non-UTF8 literals without using escape sequences?

-- Johan |
|
In reply to this post by Thomas Schilling-2
Hi,

Thomas Schilling wrote:
> I agree that the language standard should not prescribe the
> implementation of a Text datatype. It should instead require an
> abstract data type (which may just be a newtype wrapper for [Char] in
> some implementations) and a (minimal) set of operations on it.
>
> Regarding the type class for converting to and from that type, there
> is a perhaps more complicated question: The current fromString method
> uses String as the source type which causes unnecessary overhead.

Is this still a problem if String were replaced by an implementation-dependent newtype? Presumably, GHC would use a more efficient representation behind the newtype, so the following would be efficient in practice (or not?):

    newtype String
      = ...

    class IsString a where
      fromString :: String -> a

The standard could even prescribe that an instance for [Char] exists:

    explode :: String -> [Char]
    explode = ...

    instance IsString [Char] where
      fromString = explode

Tillmann |
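The outline above can be filled in with the simplest representation the thread allows, a newtype wrapper around [Char]. This is a sketch under stated assumptions: the type is called Str and the class IsStr only to avoid clashing with the Prelude and Data.String, implode is an invented counterpart to explode, and FlexibleInstances is switched on because an instance for [Char] is not accepted by plain Haskell 2010.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main (main) where

-- Hypothetical stand-in for the abstract String type. A library would
-- export Str without its constructor, so that clients cannot depend on
-- the representation; another implementation could wrap a packed buffer.
newtype Str = Str [Char]
  deriving (Eq, Ord)

implode :: [Char] -> Str
implode = Str

explode :: Str -> [Char]
explode (Str cs) = cs

-- Hypothetical conversion class defined against the abstract type.
class IsStr a where
  fromStr :: Str -> a

instance IsStr Str where
  fromStr = id

-- The instance the message suggests the standard could require.
instance IsStr [Char] where
  fromStr = explode

main :: IO ()
main = do
  let s = implode "abstract"
  putStrLn (fromStr s)     -- used at type [Char]
  print (fromStr s == s)   -- used at type Str
```

With this representation explode is free; with a packed representation it would allocate a list, which is the cost an implementation would be trading against faster string operations elsewhere.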
|
On 20/03/2012 16:29, Tillmann Rendel wrote:
> Hi,
>
> Thomas Schilling wrote:
>> I agree that the language standard should not prescribe the
>> implementation of a Text datatype. It should instead require an
>> abstract data type (which may just be a newtype wrapper for [Char] in
>> some implementations) and a (minimal) set of operations on it.
>>
>> Regarding the type class for converting to and from that type, there
>> is a perhaps more complicated question: The current fromString method
>> uses String as the source type which causes unnecessary overhead.
>
> Is this still a problem if String were replaced by an
> implementation-dependent newtype? Presumably, GHC would use a more
> efficient representation behind the newtype, so the following would be
> efficient in practice (or not?)
>
> newtype String
>   = ...
>
> class IsString a where
>   fromString :: String -> a
>
> The standard could even prescribe that an instance for [Char] exists:
>
> explode :: String -> [Char]
> explode = ...
>
> instance IsString [Char] where
>   fromString = explode
>
> Tillmann

A recent message on Haskell-Cafe made me think that if the standard mandates that any instance exists, it should mandate that an instance exists for CString and CWString (C's strings and wide strings) or, more generally, that an instance exists for any foreign string type defined in the FFIs the implementation supports. That is to say, if you implement an FFI for .NET and you expose .NET's string type, you should implement conversions between that string type and Haskell's.

Regards,
ARJANEN Loïc |
