I/O and utf8

Andreas Kägi
Hello,
I want to read a file encoded in UTF-8 and later output portions of it
to the console. Is there an easy way to do this in Haskell? Using the standard
I/O functions I can read the file, but the output gives me \1071 ... instead of
the Unicode characters.
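On current GHC versions (6.12 and later, well after this 2006 thread), here is a minimal sketch of one way to do this. An escape such as `\1071` is what `show`, and therefore `print`, produces for a non-ASCII `Char`, whereas `putStrLn` writes the character itself. `hSetEncoding ... utf8` makes both the file handle and `stdout` use UTF-8 regardless of the locale. The helper names `readUtf8File` and `putUtf8Ln` are hypothetical.

```haskell
import System.IO

-- Hypothetical helper: read a whole file, decoding it explicitly as UTF-8
-- rather than with the locale's default encoding.
readUtf8File :: FilePath -> IO String
readUtf8File path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hGetContents h  -- lazy; the handle is closed once the string is fully read

-- Hypothetical helper: print text to the console as UTF-8.
-- Use putStrLn, not print: print = putStrLn . show, and show turns
-- 'Я' (U+042F) into the escape sequence \1071.
putUtf8Ln :: String -> IO ()
putUtf8Ln s = do
  hSetEncoding stdout utf8
  putStrLn s
```

For example, `readUtf8File "input.txt" >>= putUtf8Ln . take 200` would print the first 200 characters of a (hypothetical) `input.txt`.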



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: I/O and utf8

John Meacham
On Sun, Jan 08, 2006 at 11:26:05AM +0000, Andreas Kägi wrote:
> Hello,
> I want to read a file encoded in UTF-8 and later output portions of it
> to the console. Is there an easy way to do this in Haskell? Using the standard
> I/O functions I can read the file, but the output gives me \1071 ... instead of
> the Unicode characters.

Jhc does all of its I/O in UTF-8. CharIO is a drop-in replacement for the
standard Prelude routines that converts everything to and from UTF-8.

http://repetae.net/john/repos/jhc/CharIO.hs
http://repetae.net/john/repos/jhc/UTF8.hs
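The following is an independent minimal sketch (not John's CharIO.hs/UTF8.hs code) of a UTF-8 encoder and decoder over byte lists, illustrating what such a conversion layer has to do. `encodeUtf8` and `decodeUtf8` are hypothetical names. The decoder substitutes U+FFFD for malformed sequences and, for brevity, does not reject overlong encodings or surrogate code points.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 UTF-8 bytes.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    lead k = fromIntegral (n `shiftR` k)                        -- high bits for the lead byte
    cont k = fromIntegral (0x80 .|. ((n `shiftR` k) .&. 0x3F))  -- continuation byte 10xxxxxx

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

-- Decode UTF-8 bytes, replacing malformed sequences with U+FFFD.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xC0  = bad                -- stray continuation byte
  | b < 0xE0  = multi 1 0x1F       -- 110xxxxx: one continuation byte
  | b < 0xF0  = multi 2 0x0F       -- 1110xxxx: two continuation bytes
  | b < 0xF8  = multi 3 0x07       -- 11110xxx: three continuation bytes
  | otherwise = bad
  where
    bad = '\xFFFD' : decodeUtf8 bs
    multi k mask =
      let (cs, rest) = splitAt k bs
          v = foldl step (fromIntegral b .&. mask) cs
      in if length cs == k && all isCont cs && v <= 0x10FFFF
           then chr v : decodeUtf8 rest
           else bad
    isCont x = x .&. 0xC0 == 0x80
    step acc x = (acc `shiftL` 6) .|. (fromIntegral x .&. 0x3F)
```

On a byte-oriented handle (GHC's handles before 6.12 wrote only the low 8 bits of each `Char`), output can then be done as `putStr (map (chr . fromIntegral) (encodeUtf8 s))`.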

        John

--
John Meacham - ⑆repetae.net⑆john⑈

Re[2]: I/O and utf8

Bulat Ziganshin
Hello John,

Tuesday, January 10, 2006, 2:08:44 AM, you wrote:

>> I want to read a file encoded in UTF-8 and later output portions of it
>> to the console. Is there an easy way to do this in Haskell? Using the standard
>> I/O functions I can read the file, but the output gives me \1071 ... instead of
>> the Unicode characters.

JM> Jhc does all of its I/O in UTF-8. CharIO is a drop-in replacement for the
JM> standard Prelude routines that converts everything to and from UTF-8.

JM> http://repetae.net/john/repos/jhc/CharIO.hs
JM> http://repetae.net/john/repos/jhc/UTF8.hs

BTW, I plan to add this functionality to my Binary/Streams library,
based on your code, John. It will work something like this:

unicode_stdout <- openWithEncoding unicode stdout
vPutStrLn unicode_stdout "it's a test"
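GHC's own `System.IO` (from 6.12 on) uses a comparable design: an encoding is a first-class `TextEncoding` value attached to a `Handle`. Below is a minimal sketch of an `openWithEncoding`-style helper built on that API. The two helper names are hypothetical, while `hSetEncoding`, `mkTextEncoding`, `utf8` and `latin1` are real `System.IO` exports.

```haskell
import System.IO

-- Hypothetical helper in the spirit of openWithEncoding: open a file
-- and attach a text encoding to its handle in one step.
openFileWithEncoding :: TextEncoding -> FilePath -> IOMode -> IO Handle
openFileWithEncoding enc path mode = do
  h <- openFile path mode
  hSetEncoding h enc
  return h

-- Same, but look the encoding up by name at run time,
-- e.g. "UTF-8" or "ISO-8859-1".
openFileWithNamedEncoding :: String -> FilePath -> IOMode -> IO Handle
openFileWithNamedEncoding name path mode = do
  enc <- mkTextEncoding name
  openFileWithEncoding enc path mode
```

For console output, `hSetEncoding stdout utf8` followed by `hPutStrLn stdout "it's a test"` plays roughly the role of `unicode_stdout` above.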

I have a question about this: I also want to provide an autodetection
mechanism that relies on the first bytes of a text file to set the proper
encoding. What are the standard rules for marking, in these first bytes,
that the file's text is encoded in UTF-8 or UTF-16?



--
Best regards,
 Bulat                            mailto:[hidden email]




RE: Re[2]: I/O and utf8

Bayley, Alistair
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Bulat Ziganshin
>
> I have a question about this: I also want to provide an autodetection
> mechanism that relies on the first bytes of a text file to set the proper
> encoding. What are the standard rules for marking, in these first bytes,
> that the file's text is encoded in UTF-8 or UTF-16?


Are you asking about the byte-order mark (BOM) in UTF encodings?
  http://www.unicode.org/faq/utf_bom.html#BOM

Note that UTF-8 files typically lack the BOM, as UTF-8 is meant to be
backwards-compatible with 7-bit US-ASCII, I think. Windows Notepad is one of
the few programs that will insert it when a text file is saved as UTF-8.

Alistair.
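Checking for a BOM amounts to matching a prefix of the first few bytes. Here is a minimal sketch with a hypothetical `detectBom` that returns the detected encoding and the number of BOM bytes to skip. The byte values are the standard BOMs. The 4-byte UTF-32 marks must be tested before the 2-byte UTF-16 ones, because `FF FE` is a prefix of the UTF-32LE mark.

```haskell
import Data.Word (Word8)

data BomEncoding = Utf8 | Utf16BE | Utf16LE | Utf32BE | Utf32LE
  deriving (Show, Eq)

-- Match a byte-order mark at the start of the input; returns the
-- encoding and how many bytes to skip. Nothing means "no BOM": the
-- caller must fall back to a default or a heuristic.
detectBom :: [Word8] -> Maybe (BomEncoding, Int)
detectBom bytes = case bytes of
  (0x00 : 0x00 : 0xFE : 0xFF : _) -> Just (Utf32BE, 4)
  (0xFF : 0xFE : 0x00 : 0x00 : _) -> Just (Utf32LE, 4)  -- test before UTF-16LE
  (0xEF : 0xBB : 0xBF : _)        -> Just (Utf8, 3)
  (0xFE : 0xFF : _)               -> Just (Utf16BE, 2)
  (0xFF : 0xFE : _)               -> Just (Utf16LE, 2)
  _                               -> Nothing
```

`FF FE 00 00` could in principle also be UTF-16LE text that begins with U+0000. The usual convention, followed here, is to read it as UTF-32LE. With `Data.ByteString`, `detectBom (B.unpack (B.take 4 bs))` avoids unpacking the whole file.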

Re: I/O and utf8

Einar Karttunen
On 10.01 10:25, Bulat Ziganshin wrote:
> I have a question about this: I also want to provide an autodetection
> mechanism that relies on the first bytes of a text file to set the proper
> encoding. What are the standard rules for marking, in these first bytes,
> that the file's text is encoded in UTF-8 or UTF-16?

The BOM is used to mark the encoding
(http://en.wikipedia.org/wiki/Byte_Order_Mark), but most
UTF-8 streams lack it. I have not seen it used in UTF-8 files either.
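Most UTF-8 files carry no BOM, so a common fallback is to check whether the bytes are well-formed UTF-8 and otherwise assume a legacy single-byte encoding such as Latin-1. This is a heuristic, not part of any standard. The sketch below uses hypothetical names (`looksLikeUtf8`, `guessEncoding`), and the check is structural only: it rejects the always-invalid lead bytes C0, C1 and F5 to FF, but not every overlong or surrogate form.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Structural check: every lead byte must be followed by the right
-- number of continuation bytes of the form 10xxxxxx.
looksLikeUtf8 :: [Word8] -> Bool
looksLikeUtf8 [] = True
looksLikeUtf8 (b:bs)
  | b < 0x80              = looksLikeUtf8 bs  -- ASCII
  | b >= 0xC2 && b < 0xE0 = need 1
  | b >= 0xE0 && b < 0xF0 = need 2
  | b >= 0xF0 && b < 0xF5 = need 3
  | otherwise             = False             -- stray continuation or invalid lead byte
  where
    need k = let (cs, rest) = splitAt k bs
             in length cs == k && all isCont cs && looksLikeUtf8 rest
    isCont x = x .&. 0xC0 == 0x80

-- Fall back to Latin-1 when the bytes cannot be UTF-8.
guessEncoding :: [Word8] -> String
guessEncoding bs
  | looksLikeUtf8 bs = "UTF-8"
  | otherwise        = "Latin-1"
```

Pure ASCII input passes the check, which is harmless because ASCII is a subset of UTF-8.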

Do you plan to support things like HTTP, where the character set
is only known partway through parsing?

- Einar Karttunen

Re[2]: I/O and utf8

Bulat Ziganshin
Hello Einar,

Wednesday, January 11, 2006, 6:14:44 PM, you wrote:

EK> Do you plan to support things like HTTP, where the character set
EK> is only known partway through parsing?

Yes, it is supported; see Examples/Encoding.hs in
http://freearc.narod.ru/Binary.tar.gz:

 h <- openWithEncoding latin1 =<< openBinaryFile "test" ReadMode
 print =<< vGetLine h
 vSetEncoding h utf8
 print =<< vGetLine h
 vSetEncoding h latin1
 print =<< vGetLine h
 vClose h

It's not optimized yet. If you need more speed, give me a yell.
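The same approach of decoding the headers first and then switching charset can also be sketched without a stateful handle, by splitting the raw bytes. This is a simplified illustration, not Bulat's library. It assumes the `bytestring` and `text` packages (normally bundled with GHC), handles only UTF-8 and Latin-1 bodies, and ignores most of HTTP. `charsetOf` and `decodeMessage` are hypothetical names. Note that `TE.decodeUtf8` throws on malformed input.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (toLower)
import Data.List (isPrefixOf, tails)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Find the charset parameter in header text, e.g. from
-- "Content-Type: text/html; charset=UTF-8" extract "utf-8".
charsetOf :: String -> Maybe String
charsetOf hdr =
  case [drop (length key) t | t <- tails (map toLower hdr), key `isPrefixOf` t] of
    (v : _) -> Just (takeWhile (`notElem` "\"; \r\n") (dropWhile (== '"') v))
    []      -> Nothing
  where
    key = "charset="

-- Headers (everything before the first blank line) are read as Latin-1;
-- the body is decoded with the charset they announce, else Latin-1.
decodeMessage :: B.ByteString -> (T.Text, T.Text)
decodeMessage raw = (hdr, body)
  where
    (hdrBytes, rest) = B.breakSubstring (BC.pack "\r\n\r\n") raw
    bodyBytes = B.drop 4 rest
    hdr = TE.decodeLatin1 hdrBytes
    body = case charsetOf (T.unpack hdr) of
      Just cs | cs `elem` ["utf-8", "utf8"] -> TE.decodeUtf8 bodyBytes
      _                                     -> TE.decodeLatin1 bodyBytes
```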


--
Best regards,
 Bulat                            mailto:[hidden email]


