[Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files


[Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Colin Paul Adams
I forgot to CC the list:

>>>>> "Roel" == Roel van Dijk <[hidden email]> writes:

    Roel> I propose to make UTF-8 the only allowed encoding for Haskell
    Roel> source files. Implementations must discard an initial Byte
    Roel> Order Mark (BOM) if present [3].


    Roel> * Pros - Ensures that Haskell source can be reliably exchanged
    Roel> on the byte level.  - Disallows implicit ISO-8859-* encodings
    Roel> in source code, ensuring portability.  - Little or no
    Roel> implementation burden for compiler writers.

Having thought this over a bit more, I don't think it's a good idea.

Allowed? Allowed for what?

What does it achieve? Nothing, as far as I can see. Authors will still
be able to write their Haskell code in any encoding they like. And any
compiler can have a front-end script with an option to specify the
encoding used by source files, which simply uses iconv on the fly to
translate.
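For illustration, a minimal sketch of such a front end, written in Haskell rather than as an iconv shell script. The function name `readSourceAs` and the `to-utf8` usage string are hypothetical; `mkTextEncoding` is the standard `System.IO` call, but which encoding names it accepts depends on the platform (on Unix-like systems GHC typically defers to iconv for names it does not handle itself).

```haskell
import System.Environment (getArgs)
import System.IO

-- | Read a whole file, decoding its bytes with the given encoding.
-- (Hypothetical helper, not part of any compiler.)
readSourceAs :: TextEncoding -> FilePath -> IO String
readSourceAs enc path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    src <- hGetContents h
    -- Force the lazy string before withFile closes the handle.
    length src `seq` return src

main :: IO ()
main = do
  args <- getArgs
  case args of
    [encName, path] -> do
      enc <- mkTextEncoding encName   -- accepted names are platform-dependent
      src <- readSourceAs enc path
      hSetEncoding stdout utf8        -- always emit UTF-8
      putStr src
    _ -> hPutStrLn stderr "usage: to-utf8 ENCODING FILE"
```

Note that the user still has to supply the encoding name; the script cannot know it from the file alone.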

I think the real place to mandate UTF-8 would be for Hackage. That's
where it matters (an alternative design would be to add an encoding
field in the .cabal file, but I don't think this has much merit).

--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

_______________________________________________
Haskell-prime mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-prime

Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Bas van Dijk-2
On 6 April 2011 17:34, Colin Paul Adams <[hidden email]> wrote:

> I forgot to CC the list:
>
>>>>>> "Roel" == Roel van Dijk <[hidden email]> writes:
>
>    Roel> I propose to make UTF-8 the only allowed encoding for Haskell
>    Roel> source files. Implementations must discard an initial Byte
>    Roel> Order Mark (BOM) if present [3].
>
>
>    Roel> * Pros - Ensures that Haskell source can be reliably exchanged
>    Roel> on the byte level.  - Disallows implicit ISO-8859-* encodings
>    Roel> in source code, ensuring portability.  - Little or no
>    Roel> implementation burden for compiler writers.
>
> Having thought this over a bit more, I don't think it's a good idea.
>
> Allowed? Allowed for what?

Allowed to be called a Haskell file.

If the report doesn't specify what a Haskell file is then we can't
reliably exchange Haskell source files by only looking at the files
themselves.

> What does it achieve? Nothing, as far as I can see. Authors will still
> be able to write their Haskell code in any encoding they like. And any
> compiler can have a front-end script with an option to specify the
> encoding used by source files, which simply uses iconv on the fly to
> translate.

Suppose I give you MyHaskellFile.hs. But before telling you how it's
encoded I go gliding (a hobby of mine). Unfortunately I crash my
glider and die :-(. Now what encoding option do you give to your
front-end script?
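The ambiguity can be made concrete with a small sketch: the same bytes decode to different text depending on the encoding one assumes. Both decoders below are hand-written for illustration using only `base`; `decodeLatin1` and `decodeUtf8` are hypothetical names here, not the functions of any particular library.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | ISO-8859-1: each byte is the code point with the same value.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- | Strict UTF-8: Nothing on any malformed, overlong, surrogate,
-- or out-of-range sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 0x80    (fromIntegral b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 0x800   (fromIntegral b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 0x10000 (fromIntegral b .&. 0x07)
  | otherwise          = Nothing
  where
    -- n continuation bytes, smallest legal code point, payload of lead byte
    multi :: Int -> Int -> Int -> Maybe String
    multi n lowest lead
      | length cont == n, all isCont cont, inRange = (chr cp :) <$> decodeUtf8 rest
      | otherwise = Nothing
      where
        (cont, rest) = splitAt n bs
        cp      = foldl step lead cont
        inRange = cp >= lowest && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF)
    isCont c   = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)

main :: IO ()
main = do
  let bytes = [0x63, 0x61, 0x66, 0xC3, 0xA9] :: [Word8]  -- "café" in UTF-8
  print (decodeLatin1 bytes)  -- prints "caf\195\169" (two wrong characters)
  print (decodeUtf8 bytes)    -- prints Just "caf\233" (the intended é)
```

Every byte sequence is valid ISO-8859-1, so a Latin-1 reading never fails; it just silently yields the wrong characters. A strict UTF-8 reading at least rejects most text that was written in another 8-bit encoding.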

> I think the real place to mandate UTF-8 would be for Hackage. That's
> where it matters (an alternative design would be to add an encoding
> field in the .cabal file, but I don't think this has much merit).

That would only allow users of Hackage and Cabal to reliably exchange
their Haskell files. If we specify it in the report every user can
benefit.

Regards,

Bas


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Colin Paul Adams
>>>>> "Bas" == Bas van Dijk <[hidden email]> writes:

    Bas> On 6 April 2011 17:34, Colin Paul Adams <[hidden email]> wrote:

    >> Allowed? Allowed for what?

    Bas> Allowed to be called a Haskell file.

Well, what the report says on that is irrelevant. If I see a file
containing Haskell code, I shall call it a Haskell file, irrespective. I
suspect I will be in the majority.

    Bas> If the report doesn't specify what a Haskell file is then we
    Bas> can't reliably exchange Haskell source files by only looking at
    Bas> the files themselves.

Sure we can.

    >> What does it achieve? Nothing, as far as I can see. Authors will
    >> still be able to write their Haskell code in any encoding they
    >> like. And any compiler can have a front-end script with an option
    >> to specify the encoding used by source files, which simply uses
    >> iconv on the fly to translate.

    Bas> Suppose I give you MyHaskellFile.hs. But before telling you how
    Bas> it's encoded I go gliding (a hobby of mine). Unfortunately I
    Bas> crash my glider and die :-(. Now what encoding option do you
    Bas> give to your front-end script?

Whatever the encoding happens to be. That won't be hard to find out. And
presumably Haskell programmers don't die so very frequently that it
will become a time-consuming affair.

    >> I think the real place to mandate UTF-8 would be for
    >> Hackage. That's where it matters (an alternative design would be
    >> to add an encoding field in the .cabal file, but I don't think
    >> this has much merit).

    Bas> That would only allow users of Hackage and Cabal to reliably
    Bas> exchange their Haskell files. If we specify it in the report
    Bas> every user can benefit.

There is no benefit that I see. Anyone is free to write Haskell code in
whatever encoding they fancy. Irrespective of what the report says. It's
not going to have the force of law.
--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Roel van Dijk-3
On 6 April 2011 20:42, Colin Paul Adams <[hidden email]> wrote:
>>>>>> "Bas" == Bas van Dijk <[hidden email]> writes:
>    Bas> On 6 April 2011 17:34, Colin Paul Adams <[hidden email]> wrote:
>    >> Allowed? Allowed for what?
>    Bas> Allowed to be called a Haskell file.
> Well, what the report says on that is irrelevant. If I see a file
> containing Haskell code, I shall call it a Haskell file, irrespective. I
> suspect I will be in the majority.

It seems you have a problem with the word "allowed". What do you think
of the interoperability guidelines as proposed by Duncan? They are
less stringent while having the same intention as my original
proposal.


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Christian Maeder-2
In reply to this post by Bas van Dijk-2
On 06.04.2011 20:02, Bas van Dijk wrote:
> On 6 April 2011 17:34, Colin Paul Adams<[hidden email]>  wrote:
[...]
>> I think the real place to mandate UTF-8 would be for Hackage. That's
>> where it matters (an alternative design would be to add an encoding
>> field in the .cabal file, but I don't think this has much merit).
>
> That would only allow users of Hackage and Cabal to reliably exchange
> their Haskell files. If we specify it in the report every user can
> benefit.

I agree that Haskell files should be UTF-8, but I also agree that it is
only relevant for Hackage (and Cabal) and already enforced by ghc-6.12
or higher.

The motivation for this proposal can only be that future cabal packages
will use more and more non-ASCII characters as is possible via
http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and
LANGUAGE pragma "UnicodeSyntax" (that happens to have no support for "\"
as lambda symbol - probably because lambda is a letter and no symbol!)

However, I think, these extra characters only make sense for corner
cases and should not be recommended for general purposes.

For nicer looking sources I would recommend special viewers or
post-processors (like haddock or hscolour) that translate certain ASCII
sequences to unicode points.

So my view is: Stick to ASCII and only if you must (not just for casual
reasons) use UTF-8.

Cheers Christian


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Christian Maeder-2
On 07.04.2011 11:29, Christian Maeder wrote:
> So my view is: Stick to ASCII and only if you must (not just for casual
> reasons) use UTF-8.

This means all comments in haskell sources (for hackage) should be in
English, exclusively! Supply separate documentation in your mother
tongue if required.

And I would rather write out "Euro" or "Lambda" than try to find the
corresponding unicode character (and even in .tex sources ASCII
sequences exist for those).

>
> Cheers Christian


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Colin Paul Adams
In reply to this post by Roel van Dijk-3
>>>>> "Roel" == Roel van Dijk <[hidden email]> writes:

    Roel> On 6 April 2011 20:42, Colin Paul Adams <[hidden email]> wrote:

    Roel> It seems you have a problem with the word "allowed". What do
    Roel> you think of the interoperability guidelines as proposed by
    Roel> Duncan? They are less stringent while having the same
    Roel> intention as my original proposal.

I think they are fine.
--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Roel van Dijk-3
In reply to this post by Christian Maeder-2
On 7 April 2011 11:29, Christian Maeder <[hidden email]> wrote:
> I agree that Haskell files should be UTF-8, but I also agree that it is only
> relevant for Hackage (and Cabal) and already enforced by ghc-6.12 or
> higher.

It is relevant for all tools and systems which process Haskell sources.

> The motivation for this proposal can only be that future cabal packages will
> use more and more non-ASCII characters as is possible via
> http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and
> LANGUAGE pragma "UnicodeSyntax" (that happens to have no support for "\" as
> lambda symbol - probably because lambda is a letter and no symbol!)

The motivation for this proposal is interoperability of all tools and
systems which process Haskell source files. Perhaps I could have made
that more clear.

> However, I think, these extra characters only make sense for corner cases
> and should not be recommended for general purposes.

Please take a look at the following file:
http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

I have many more like that. I do not consider Chinese a corner case.
Nor the vast number of languages that cannot be represented using
ASCII.

> So my view is: Stick to ASCII and only if you must (not just for casual
> reasons) use UTF-8.

When to use certain characters is not part of the proposal.


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Christian Maeder-2
On 07.04.2011 13:09, Roel van Dijk wrote:
> Please take a look at the following file:
> http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

Great, that file made my firefox open infinitely many tabs (so that I
had to close it).

C.


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Colin Paul Adams
>>>>> "Christian" == Christian Maeder <[hidden email]> writes:

    Christian> On 07.04.2011 13:09, Roel van Dijk wrote:
    >> Please take a look at the following file:
    >> http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

    Christian> Great, that file made my firefox open infinitely many
    Christian> tabs (so that I had to close it).

On mine, it just launched Emacs to open the file (where it looked
great).

Note that I certainly agree with Roel on Chinese not being a corner
case. (And my wife would certainly have something to say if I didn't,
she being Chinese herself!)
--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Christian Maeder-2
In reply to this post by Christian Maeder-2
On 07.04.2011 13:24, Christian Maeder wrote:
> On 07.04.2011 13:09, Roel van Dijk wrote:
>> Please take a look at the following file:
>> http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs
>
> Great, that file made my firefox open infinitely many tabs (so that I
> had to close it).

Well, my firefox had "use firefox" for "Haskell source code" (and failed
for any .hs file)

C.


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

David Virebayre
In reply to this post by Christian Maeder-2


2011/4/7 Christian Maeder <[hidden email]>
> On 07.04.2011 11:29, Christian Maeder wrote:
>
>> So my view is: Stick to ASCII and only if you must (not just for casual
>> reasons) use UTF-8.
>
> This means all comments in haskell sources (for hackage) should be in English, exclusively! Supply separate documentation in your mother tongue if required.

This thread is about the encoding of Haskell source files, not about Hackage, so I don't see the point in talking about restricting Hackage's language to English:
- it is not the topic
- it's already a de facto standard anyway.

On the other hand, not restricting the use of any language in Haskell source files is IMHO a must, and it's not well supported as it is; for example, haddock doesn't support accented letters in comments.

This proposal gives a clear signal that UTF-8 characters have to be taken into account, and hopefully tools like haddock will evolve to support them thanks to this proposal.






Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Christian Maeder-2
In reply to this post by Roel van Dijk-3
On 07.04.2011 13:09, Roel van Dijk wrote:
> Please take a look at the following file:
> http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

The code would not suffer much if it were pure ASCII. I would prefer
(ascii) haddock links to explain the various code points.

C.


Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Roel van Dijk-3
On 7 April 2011 15:03, Christian Maeder <[hidden email]> wrote:
> The code would not suffer much if it were pure ASCII. I would prefer (ascii)
> haddock links to explain the various code points.

The code in question contains Chinese characters like '三', which in a
US-ASCII encoded Haskell file must be written as '\x4e09'. I do not
consider these escape sequences an acceptable substitute.
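As a small sketch of the equivalence: the literal and the escape denote the same `Char`, and the difference is only in how the source file must be encoded. The encoder `encodeUtf8Char` below is a hypothetical hand-written helper using only `base`, included just to show which bytes the literal occupies in a UTF-8 file.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- | UTF-8 encode a single character (1 to 4 bytes).
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | cp < 0x80    = [byte cp]
  | cp < 0x800   = [byte (0xC0 .|. (cp `shiftR` 6)), cont 0]
  | cp < 0x10000 = [byte (0xE0 .|. (cp `shiftR` 12)), cont 6, cont 0]
  | otherwise    = [byte (0xF0 .|. (cp `shiftR` 18)), cont 12, cont 6, cont 0]
  where
    cp     = ord c
    byte   = fromIntegral :: Int -> Word8
    -- continuation byte carrying the 6 bits that start at bit s
    cont s = byte (0x80 .|. ((cp `shiftR` s) .&. 0x3F))

main :: IO ()
main = do
  print ('三' == '\x4e09')                                      -- True
  putStrLn (showHex (ord '三') "")                              -- 4e09
  putStrLn (unwords [showHex b "" | b <- encodeUtf8Char '三'])  -- e4 b8 89
```

This file itself only compiles if the compiler reads it as UTF-8, since it contains the literal '三'.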

But this discussion is tangential to the proposal. I am interested in
having a common set of guidelines to ensure interoperability of
Haskell sources. An important part of that is having a common method
of decoding files containing Haskell code. The easiest way to achieve
that is using only 1 encoding. UTF-8 is the best candidate for that
role.
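A minimal sketch of what "one common method of decoding" could look like for a tool written in Haskell, assuming the rule from the original proposal (UTF-8 only, with an initial BOM discarded). `readHaskellSource` and the file name `Bom.hs` are hypothetical; `utf8_bom` is the `System.IO` encoding that ignores a leading BOM on input.

```haskell
import System.IO

-- | Read a Haskell source file as UTF-8, skipping a leading BOM if present.
-- Bytes that are not valid UTF-8 are expected to raise an IOError.
readHaskellSource :: FilePath -> IO String
readHaskellSource path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8_bom
    src <- hGetContents h
    -- Force the lazy string before withFile closes the handle.
    length src `seq` return src

main :: IO ()
main = do
  -- Write a small file that starts with the UTF-8 BOM (EF BB BF).
  withBinaryFile "Bom.hs" WriteMode $ \h ->
    hPutStr h ("\xEF\xBB\xBF" ++ "main = return ()\n")
  src <- readHaskellSource "Bom.hs"
  putStr src  -- the BOM is not part of the decoded text
```

With a single fixed encoding, no option or out-of-band information is needed to read the file.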
