HXT: encoding problem

Elias Diem
Hi guys

I have the following Haskell program:

------------------------------------------------------
import Text.XML.HXT.Core

main = do
    xml <- readFile "test_data-small.xml"
    let doc = readString config xml
    res <- runX . xshow $
        doc
        >>>
        getChildren >>> isElem >>> hasName "contacts"
        >>>
        deep isText
    mapM_ putStrLn res

config =
    [ withParseHTML                 no
    , withWarnings                  yes
    , withInputEncoding             utf8
    , withOutputEncoding            utf8
    , withValidate                  yes
    ]
------------------------------------------------------

The file 'test_data-small.xml' contains the following data:

------------------------------------------------------
<?xml version='1.0' encoding='UTF-8' ?>

<contacts>

<person>
    <name>
        <firstname>Max</firstname>
        <lastname>Müller</lastname>
    </name>
</person>

</contacts>
------------------------------------------------------

Note the umlaut in the lastname!

If I run the program, I get the following error:

------------------------------------------------------
error: UTF-8 encoding error at input position 127: ValueOutOfBounds
------------------------------------------------------
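The umlaut is the only non-ASCII character in the sample. In UTF-8, ü (code point U+00FC) is stored as the two bytes 0xC3 0xBC, while every other character in the file takes a single byte. A reader that assumes the wrong encoding therefore goes astray exactly at that spot. The following is a small illustrative sketch of the UTF-8 encoding rule (RFC 3629). `utf8Bytes` and `hex` are hypothetical helpers written for this example; they are not part of HXT:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Text.Printf (printf)

-- Encode one code point as UTF-8 bytes (RFC 3629).
-- Illustrative only: surrogate code points are not rejected.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = bytes [n]                                 -- 0xxxxxxx
  | n < 0x800   = bytes [0xC0 .|. shiftR n 6, cont 0]       -- 110xxxxx 10xxxxxx
  | n < 0x10000 = bytes [0xE0 .|. shiftR n 12, cont 6, cont 0]
  | otherwise   = bytes [0xF0 .|. shiftR n 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    bytes :: [Int] -> [Word8]
    bytes = map fromIntegral
    -- continuation byte 10xxxxxx carrying six bits of n, starting at bit s
    cont s = 0x80 .|. (shiftR n s .&. 0x3F)

-- Two-digit upper-case hex, e.g. 0xBC -> "BC"
hex :: Word8 -> String
hex = printf "%02X"

main :: IO ()
main = mapM_ report "Müller"
  where
    -- 'show' keeps the output ASCII-only, so printing cannot itself
    -- trip over the locale encoding
    report c = putStrLn (show c ++ " -> " ++ unwords (map hex (utf8Bytes c)))
```

Run on "Müller", the program prints one line per character. The ü line reads `'\252' -> C3 BC`, and the six-character name takes seven bytes in total.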

Any help is appreciated. Thanks.

--
Greetings
Elias



Derek McLoughlin
I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and it worked perfectly.

Are you sure that the XML file is actually saved with UTF-8 encoding?
Can you attach it?

On 27 September 2014 16:57, Elias Diem <lists at webconect.ch> wrote:

> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners

Elias Diem
Hi Derek

On 2014-09-27,  Derek McLoughlin wrote:

> I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and
> it worked perfectly.

Good. Thanks.

> Are you sure that the XML file is actually saved with
> UTF-8 encoding?

I *think* so. Vim tells me that it's UTF-8. I will double
check.

> Can you attach it?

Here it is.

--
Greetings
Elias


-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_data-small.xml
Type: application/xml
Size: 181 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20140927/81b50920/attachment.xml>

Elias Diem
On 2014-09-27,  Elias Diem wrote:

> I *think* so. Vim tells me that it's UTF-8. I will double
> check.

I just double checked. I'm 99% sure now that it is indeed
UTF-8.
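Vim's report can be cross-checked independently of both the editor and the locale by asking GHC's own UTF-8 decoder to read the whole file. The following is a minimal sketch that assumes only `base`. `isValidUtf8File` is a hypothetical helper, and the file name is the one from the original post:

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- True if every byte of the file decodes as UTF-8. The handle encoding is
-- set explicitly, so LANG/LC_* play no part in the check.
isValidUtf8File :: FilePath -> IO Bool
isValidUtf8File path = withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    -- hGetContents is lazy; forcing the length makes any decoding error
    -- surface here, inside 'try', before withFile closes the handle
    r <- try (hGetContents h >>= evaluate . length) :: IO (Either IOException Int)
    return (either (const False) (const True) r)

main :: IO ()
main = do
    ok <- isValidUtf8File "test_data-small.xml"
    putStrLn (if ok then "valid UTF-8" else "not valid UTF-8")
```

A file saved as Latin-1 would store ü as the single byte 0xFC. That byte is not valid UTF-8, so such a file makes the check return False.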

--
Greetings
Elias



Elias Diem
On 2014-09-27,  Derek McLoughlin wrote:

> I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and
> it worked perfectly.

My version of HXT is 9.2.2.

I run Debian GNU/Linux stable.

--
Greetings
Elias



Derek McLoughlin
That file ran fine for me.

I also tested it on a Cloud9 installation with GHC 7.6.3 and HXT 9.3
and it ran fine.

Also Ubuntu 14.04, GHC 7.6.3 and HXT 9.3 worked fine.

What's your default locale in Debian?

On my Mac and test Ubuntu box, it's:
LANG="en_IE.UTF-8"
LC_COLLATE="en_IE.UTF-8"
LC_CTYPE="en_IE.UTF-8"
...
all values = "C.UTF-8"

On my Cloud9 instance:

LANG=C
LANGUAGE=
LC_CTYPE="C.UTF-8"
...
all values = "C.UTF-8"
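The locale matters because GHC's default text I/O (readFile, hGetContents, putStrLn) decodes and encodes with an encoding derived from the locale at program startup. The following is a minimal sketch, assuming only `base`, that shows which encoding a given machine actually ends up with. The exact names printed vary by system, though a UTF-8 locale should report UTF-8:

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdin, stdout)

main :: IO ()
main = do
    -- default encoding for handles opened later, e.g. by readFile
    locale <- getLocaleEncoding
    putStrLn ("locale encoding: " ++ show locale)
    -- the standard handles were set up at startup from the same locale
    inEnc  <- hGetEncoding stdin
    outEnc <- hGetEncoding stdout
    putStrLn ("stdin / stdout:  " ++ show inEnc ++ " / " ++ show outEnc)
```

Running this on both Debian boxes would show directly whether GHC sees the same encoding on each.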

On 27 September 2014 18:53, Elias Diem <lists at webconect.ch> wrote:

> On 2014-09-27,  Derek McLoughlin wrote:
>
>> I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and
>> it worked perfectly.
>
> My version of HXT is 9.2.2.
>
> I run Debian GNU/Linux stable.

Elias Diem
Hi Derek

Thanks for your help so far.

On 2014-09-27,  Derek McLoughlin wrote:

> That file ran fine for me.

Ok.

> I also tested it on a Cloud9 installation with GHC 7.6.3 and HXT 9.3
> and it ran fine.
>
> Also Ubuntu 14.04, GHC 7.6.3 and HXT 9.3 worked fine.

I will test it later this day on another computer as well.

> What's your default locale in Debian?
>
> On my Mac and test Ubuntu box, it's:
> LANG="en_IE.UTF-8"
> LC_COLLATE="en_IE.UTF-8"
> LC_CTYPE="en_IE.UTF-8"
> ...
> all values = "C.UTF-8"
>
> On my Cloud9 instance:
>
> LANG=C
> LANGUAGE=
> LC_CTYPE="C.UTF-8"
> ...
> all values = "C.UTF-8"

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_COLLATE=
LC_CTYPE=

I haven't got any environment variables starting with LC
defined.
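Whether LANG alone is enough depends on how the locale category that governs character encoding (LC_CTYPE) is resolved. Per POSIX, a non-empty LC_ALL wins; otherwise a non-empty LC_CTYPE is used; otherwise LANG. Empty variables count as unset, and LANGUAGE is a GNU extension that only affects message translations. Under that rule, the settings above resolve to en_US.UTF-8. The following is a small sketch of that precedence. `effectiveCtype` is a hypothetical helper, and the final fallback is simplified to "C" (POSIX leaves it implementation-defined; glibc uses C):

```haskell
import Data.Maybe (fromMaybe, listToMaybe)
import System.Environment (lookupEnv)

-- POSIX precedence for one locale category: LC_ALL, then LC_CTYPE, then
-- LANG. Unset and empty variables are skipped; "C" stands in for the
-- implementation-defined default.
effectiveCtype :: Maybe String -> Maybe String -> Maybe String -> String
effectiveCtype lcAll lcCtype lang =
    fromMaybe "C" (listToMaybe [v | Just v <- [lcAll, lcCtype, lang], not (null v)])

main :: IO ()
main = do
    lcAll   <- lookupEnv "LC_ALL"
    lcCtype <- lookupEnv "LC_CTYPE"
    lang    <- lookupEnv "LANG"
    putStrLn ("effective LC_CTYPE: " ++ effectiveCtype lcAll lcCtype lang)
```

With LANG=en_US.UTF-8 and the LC_* variables empty or unset, this prints `effective LC_CTYPE: en_US.UTF-8`.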

--
Greetings
Elias



Elias Diem
On 2014-09-28,  Elias Diem wrote:

> I will test it later this day on another computer as well.

I just tested it on another Linux box. And it works!! What
could be the problem?

I noticed that the other box has HXT 9.3.1.1. Maybe
that is what solves the problem.

--
Greetings
Elias



David McBride
I've had issues like this before, where the cause was the locale
settings on my machine at the time. It is subtle and annoying, but it will
cause various Haskell functions that read input, like hGetContents, to fail if
they see a character that cannot be decoded under the locale you have set. It
has to be something to do with that.
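The effect described above can be reproduced without touching the system settings by decoding the same bytes under two different encodings. The following is a minimal sketch that assumes only `base`. `readWith` is a hypothetical helper, and `umlaut.txt` is a scratch file the program creates in the current directory:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Read a whole file with an explicit encoding instead of the locale default.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    _ <- evaluate (length s)   -- force the lazy read before withFile closes h
    return s

main :: IO ()
main = do
    -- "Müller" as raw UTF-8 bytes (ü = C3 BC); binary mode writes each
    -- Char below 256 as exactly one byte
    withBinaryFile "umlaut.txt" WriteMode (\h -> hPutStr h "M\xC3\xBCller")
    readWith utf8 "umlaut.txt" >>= print    -- "M\252ller": six characters
    readWith latin1 "umlaut.txt" >>= print  -- "M\195\188ller": seven characters
```

Under a non-UTF-8 locale, plain readFile behaves roughly like the second call (for a Latin-1 locale) or fails on the first non-ASCII byte (for an ASCII locale). This would explain why the same program works on one machine and not on another.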

On Sun, Sep 28, 2014 at 8:11 AM, Elias Diem <lists at webconect.ch> wrote:

> On 2014-09-28,  Elias Diem wrote:
>
> > I will test it later this day on another computer as well.
>
> I just tested it on another Linux box. And it works!! What
> could be the problem?
>
> I noticed that on the other box I use HXT 9.3.1.1. Maybe
> that is solving the problem.
>
Chaddaï Fouché
On Sun, Sep 28, 2014 at 2:11 PM, Elias Diem <lists at webconect.ch> wrote:

> On 2014-09-28,  Elias Diem wrote:
>
> > I will test it later this day on another computer as well.
>
> I just tested it on another Linux box. And it works!! What
> could be the problem?
>
>
readString is documented as not doing any decoding, so you're dependent on
your readFile doing it right for you, but that depends on your locale!
You could set your I/O system's input encoding yourself to avoid the problem,
but it seems simpler to use HXT's readDocument instead, since that reads
the file with the input encoding you specified.
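The first option, setting the I/O encoding yourself, can be done once at the start of `main`, before `readFile` opens its handle. The following sketch assumes GHC's `GHC.IO.Encoding` module. That module is imported qualified because `Text.XML.HXT.Core` also exports a String-valued `utf8`, which the original `config` relies on:

```haskell
import qualified GHC.IO.Encoding as Enc
import qualified System.IO as IO

main :: IO ()
main = do
    -- Default encoding for every handle opened from here on, so readFile
    -- decodes UTF-8 whatever LANG/LC_* say.
    Enc.setLocaleEncoding Enc.utf8
    -- stdout was opened at startup with the old default; set it explicitly
    -- so printing non-ASCII text does not depend on the locale either.
    IO.hSetEncoding IO.stdout Enc.utf8
    xml <- readFile "test_data-small.xml"
    -- in the original program, `xml` would now go to readString as before
    putStr xml
```

Because setLocaleEncoding only affects handles created after the call, it has to run before readFile. It has no effect on stdin and stdout, which already exist when the program starts.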

--
Jedaï
Elias Diem
Hi Jedaï

On 2014-09-29, Chaddaï Fouché wrote:

> readString is documented as not doing any decoding, so you're dependent on
> your readFile doing it right for you, but that depends on your locale !
> You could set your IO system input encoding yourself to avoid the problem
> but it seems simpler to use "readDocument" provided by Hxt instead since
> that'll read the file with your specified input encoding.

I now use readDocument as suggested, and it works. Thanks for
the explanation.
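The change Chaddaï suggested amounts to letting HXT open the file itself, so that `withInputEncoding utf8` governs the decoding instead of the locale. The following is a sketch of the program from the first post with `readFile` + `readString` replaced by `readDocument`. It assumes an HXT 9.x version in which `readDocument` takes a `SysConfigList` and a file name. Everything else, including `config`, is as in the original:

```haskell
import Text.XML.HXT.Core

main :: IO ()
main = do
    -- readDocument reads and decodes the file itself, using the
    -- withInputEncoding setting from config rather than the locale
    res <- runX . xshow $
        readDocument config "test_data-small.xml"
        >>>
        getChildren >>> isElem >>> hasName "contacts"
        >>>
        deep isText
    mapM_ putStrLn res

config :: SysConfigList
config =
    [ withParseHTML                 no
    , withWarnings                  yes
    , withInputEncoding             utf8
    , withOutputEncoding            utf8
    , withValidate                  yes
    ]
```

Printing with putStrLn still goes through stdout, whose encoding comes from the locale. On a non-UTF-8 locale, the output side may therefore need `hSetEncoding` as well.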

Thanks to the others too!

--
Greetings
Elias