ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4


John Millikin
Both packages now have much-improved support for non-UTF8 paths on
POSIX systems. There are no significant changes to Windows support in
this release.

system-filepath 0.4.5:
Hackage: http://hackage.haskell.org/package/system-filepath-0.4.5
API reference: https://john-millikin.com/software/haskell-filesystem/reference/system-filepath/0.4.5/

system-fileio 0.3.4:
Hackage: http://hackage.haskell.org/package/system-fileio-0.3.4
API reference: https://john-millikin.com/software/haskell-filesystem/reference/system-fileio/0.3.4/Filesystem/

-----

In GHC 7.2 and later, file path handling in the platform libraries
was changed to treat all paths as text (encoded according to locale).
This does not work well on POSIX systems, because POSIX paths are byte
sequences. There is no guarantee that any particular path will be
valid in the user's locale encoding.

system-filepath and system-fileio were modified to partially support
this new behavior, but because the underlying libraries were unable to
represent certain paths, they were still "broken" when built with GHC
7.2+. The changes in this release mean that they are now fully
compatible (to the best of my knowledge) with GHC 7.2 and 7.4.

Important changes:

* system-filepath has been converted from GHC's escaping rules to its
own, more compatible rules. This lets it support file paths that
cannot be represented in GHC 7.2's escape format.

* The POSIX layer of system-fileio has been completely rewritten to
use the FFI, rather than System.Directory. This allows it to work with
arbitrary POSIX paths, including those that GHC itself cannot handle.
The Windows layer still uses System.Directory, since it seems to work
properly.

* The POSIX implementation of createTree will no longer recurse into
directory symlinks that it does not have permission to remove. This is
a change in behavior from the directory package's implementation. See
http://www.haskell.org/pipermail/haskell-cafe/2012-January/098911.html
for details and the reasoning behind the change. Since Windows does
not support symlinks, I have not modified the Windows implementation
(which uses removeDirectoryRecursive).

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Joey Hess
John Millikin wrote:
> In GHC  7.2 and later, file path handling in the platform libraries
> was changed to treat all paths as text (encoded according to locale).
> This does not work well on POSIX systems, because POSIX paths are byte
> sequences. There is no guarantee that any particular path will be
> valid in the user's locale encoding.

I've been dealing with this change too, but my current understanding
is that GHC's handling of encoding for FilePath is documented to allow
"arbitrary undecodable bytes to be round-tripped through it".

As long as FilePaths are read using this file system encoding, any
FilePath should be usable even if it does not match the user's encoding.

For FFI, anything that deals with a FilePath should use this
withFilePath, which GHC contains but doesn't export(?), rather than the
old withCString or withCAString:

import Foreign.C.String (CString)
import GHC.IO.Encoding (getFileSystemEncoding)
import GHC.Foreign as GHC

withFilePath :: FilePath -> (CString -> IO a) -> IO a
withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f

Code that reads or writes a FilePath to a Handle (including even to
stdout!) must take care to set the right encoding too:

import System.IO (Handle, hSetEncoding)

fileEncoding :: Handle -> IO ()
fileEncoding h = hSetEncoding h =<< getFileSystemEncoding

> * system-filepath has been converted from GHC's escaping rules to its
> own, more compatible rules. This lets it support file paths that
> cannot be represented in GHC 7.2's escape format.

I'm doubtful about adding yet another encoding to the mix. Things are
complicated enough already! And in my tests, GHC 7.4's FilePath encoding
does allow arbitrary bytes in FilePaths.

BTW, GHC now also has RawFilePath. Parts of System.Directory could be
usefully written to support that data type too. For example, the parent
directory can be determined. Other things are more difficult to do with
RawFilePath.

--
see shy jo


Re: ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Millikin
On Sun, Feb 5, 2012 at 18:49, Joey Hess <[hidden email]> wrote:

> John Millikin wrote:
>> In GHC  7.2 and later, file path handling in the platform libraries
>> was changed to treat all paths as text (encoded according to locale).
>> This does not work well on POSIX systems, because POSIX paths are byte
>> sequences. There is no guarantee that any particular path will be
>> valid in the user's locale encoding.
>
> I've been dealing with this change too, but my current understanding
> is that GHC's handling of encoding for FilePath is documented to allow
> "arbitrary undecodable bytes to be round-tripped through it".
>
> As long as FilePaths are read using this file system encoding, any
> FilePath should be usable even if it does not match the user's encoding.

That was my understanding also, then QuickCheck found a
counter-example. It turns out that there are cases where a valid path
cannot be roundtripped in the GHC 7.2 encoding.

--------------------------------------------------------------------------
$ ~/ghc-7.0.4/bin/ghci
Prelude> writeFile ".txt" "test"
Prelude> readFile ".txt"
"test"
Prelude>

$ ~/ghc-7.2.1/bin/ghci
Prelude> import System.Directory
Prelude System.Directory> getDirectoryContents "."
["\61347.txt","\61347.txt","..","."]
Prelude System.Directory> readFile "\61347.txt"
*** Exception: .txt: openFile: does not exist (No such file or directory)
Prelude System.Directory>
--------------------------------------------------------------------------

The issue is that the UTF-8 byte sequence [238,189,178] decodes to 0xEF72,
which is within the 0xEF00-0xEFFF range that GHC uses to represent
un-decodable bytes.
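
To make the collision concrete, here is a minimal sketch of the arithmetic. The names decode3 and inEscapeRange are made up for illustration and are not part of GHC or system-filepath; the decoder does no validation and handles only the three-byte UTF-8 form.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
-- into its code point. Illustration only: no validation is performed.
decode3 :: Word8 -> Word8 -> Word8 -> Int
decode3 b0 b1 b2 =
      (fromIntegral (b0 .&. 0x0F) `shiftL` 12)
  .|. (fromIntegral (b1 .&. 0x3F) `shiftL` 6)
  .|.  fromIntegral (b2 .&. 0x3F)

-- True when a code point lies in the 0xEF00-0xEFFF range that,
-- per the message above, GHC 7.2 reserves for escaped bytes.
inEscapeRange :: Int -> Bool
inEscapeRange c = c >= 0xEF00 && c <= 0xEFFF
```

In GHCi, decode3 238 189 178 evaluates to 0xEF72 (61298) and decode3 0xEE 0xBE 0xA3 to 0xEFA3 (61347), the character shown in the getDirectoryContents output. Both are perfectly valid UTF-8 that happens to land inside the escape range.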

> For FFI, anything that deals with a FilePath should use this
> withFilePath, which GHC contains but doesn't export(?), rather than the
> old withCString or withCAString:
>
> import GHC.IO.Encoding (getFileSystemEncoding)
> import GHC.Foreign as GHC
>
> withFilePath :: FilePath -> (CString -> IO a) -> IO a
> withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f

If code uses either withFilePort or withCString, then the filenames
written will depend on the user's locale. This is wrong. Filenames are
either non-encoded text strings (Windows), UTF8 (OSX), or arbitrary
bytes (non-OSX POSIX). They must not change depending on the locale.

> Code that reads or writes a FilePath to a Handle (including even to
> stdout!) must take care to set the right encoding too:
>
> fileEncoding :: Handle -> IO ()
> fileEncoding h = hSetEncoding h =<< getFileSystemEncoding

This is also wrong. A "file path" cannot be written to a handle with
any hope of correct behavior. If it's to be displayed to the user, a
path should be converted to text first, then displayed.

>> * system-filepath has been converted from GHC's escaping rules to its
>> own, more compatible rules. This lets it support file paths that
>> cannot be represented in GHC 7.2's escape format.
>
> I'm doubtful about adding yet another encoding to the mix. Things are
> complicated enough already! And in my tests, GHC 7.4's FilePath encoding
> does allow arbitrary bytes in FilePaths.

Unlike the GHC encoding, this encoding is entirely internal, and
should not change the API's behavior.

> BTW, GHC now also has RawFilePath. Parts of System.Directory could be
> usefully written to support that data type too. For example, the parent
> directory can be determined. Other things are more difficult to do with
> RawFilePath.

This is new in 7.4, and won't be backported, right? I tried compiling
the new "unix" package in 7.2 to get proper file path support, but it
failed with an error about some new language extension.


Re: ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Millikin
On Sun, Feb 5, 2012 at 19:17, John Millikin <[hidden email]> wrote:
> --------------------------------------------------------------------------
> $ ~/ghc-7.0.4/bin/ghci
> Prelude> writeFile ".txt" "test"
> Prelude> readFile ".txt"
> "test"
> Prelude>

Sorry, that got a bit mangled in the email. Corrected version:

--------------------------------------------------------------------------
$ ~/ghc-7.0.4/bin/ghci
Prelude> writeFile "\xA3.txt" "test"
Prelude> readFile "\xA3.txt"
"test"
Prelude> writeFile "\xEE\xBE\xA3.txt" "test 2"
Prelude> readFile "\xEE\xBE\xA3.txt"
"test 2"
--------------------------------------------------------------------------


Re: ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Joey Hess
In reply to this post by John Millikin
John Millikin wrote:
> That was my understanding also, then QuickCheck found a
> counter-example. It turns out that there are cases where a valid path
> cannot be roundtripped in the GHC 7.2 encoding.

> The issue is that  [238,189,178] decodes to 0xEF72, which is within
> the 0xEF00-0xEFFF range that GHC uses to represent un-decodable bytes.

How did you deal with this in system-filepath?

While no code points in the Supplementary Special-purpose Plane are currently
assigned (http://www.unicode.org/roadmaps/ssp/), it is worrying that it's used,
especially if filenames in a non-unicode encoding could be interpreted as
containing characters really within this plane. I wonder why maxBound :: Char
was not increased, and the additional space after `\1114111' used for the
un-decodable bytes?

> > For FFI, anything that deals with a FilePath should use this
> > withFilePath, which GHC contains but doesn't export(?), rather than the
> > old withCString or withCAString:
> >
> > import GHC.IO.Encoding (getFileSystemEncoding)
> > import GHC.Foreign as GHC
> >
> > withFilePath :: FilePath -> (CString -> IO a) -> IO a
> > withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f
>
> If code uses either withFilePort or withCString, then the filenames
                      withFilePath?
> written will depend on the user's locale. This is wrong. Filenames are
> either non-encoded text strings (Windows), UTF8 (OSX), or arbitrary
> bytes (non-OSX POSIX). They must not change depending on the locale.

This is exactly how GHC 7.4 handles them. For example:

openDirStream :: FilePath -> IO DirStream
openDirStream name =
  withFilePath name $ \s -> do
    dirp <- throwErrnoPathIfNullRetry "openDirStream" name $ c_opendir s
    return (DirStream dirp)

removeLink :: FilePath -> IO ()
removeLink name =
  withFilePath name $ \s ->
  throwErrnoPathIfMinus1_ "removeLink" name (c_unlink s)

I do not see any locale-dependent behavior in the filename bytes read/written.

> > Code that reads or writes a FilePath to a Handle (including even to
> > stdout!) must take care to set the right encoding too:
> >
> > fileEncoding :: Handle -> IO ()
> > fileEncoding h = hSetEncoding h =<< getFileSystemEncoding
>
> This is also wrong. A "file path" cannot be written to a handle with
> any hope of correct behavior. If it's to be displayed to the user, a
> path should be converted to text first, then displayed.

Sure it can. See find(1). Its output can be read as FilePaths once the
Handle is set up as above.

If you prefer that your program not crash with an encoding error when an
arbitrary FilePath is passed to putStr, but instead perhaps output bytes that are
not valid in the current encoding, that's also a valid choice. You might
be writing a program, like find, that again needs to output any possible
FilePath including badly encoded ones.

Filesystem.Path.CurrentOS.toText is a nice option if you want validly
encoded output though. Thanks for that!

> This is new in 7.4, and won't be backported, right? I tried compiling
> the new "unix" package in 7.2 to get proper file path support, but it
> failed with an error about some new language extension.

The RawFilePath is just a ByteString, so your existing converters for that
in system-filepath might work.

--
see shy jo


Re: ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Millikin
On Mon, Feb 6, 2012 at 10:05, Joey Hess <[hidden email]> wrote:
> John Millikin wrote:
>> That was my understanding also, then QuickCheck found a
>> counter-example. It turns out that there are cases where a valid path
>> cannot be roundtripped in the GHC 7.2 encoding.
>
>> The issue is that  [238,189,178] decodes to 0xEF72, which is within
>> the 0xEF00-0xEFFF range that GHC uses to represent un-decodable bytes.
>
> How did you deal with this in system-filepath?

I used 0xEF00 as an escape character, to mean the following char
should be interpreted as a literal byte.

A user pointed out that there is a problem with this solution also --
a path containing actual U+EF00 will be considered "invalid encoding".
I'm going to change things over to use the Python 3 solution -- they
use part of the UTF16 surrogate pair range, so it's impossible for a
valid path to contain their stand-in characters.
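
A minimal sketch of that Python 3 style of escaping follows. It assumes the stand-in range is U+DC80..U+DCFF (the low-surrogate block used by PEP 383); escapeByte and unescapeChar are hypothetical names, and this is not the code in system-filepath.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Map an undecodable byte (expected to be 0x80..0xFF) to a lone low
-- surrogate in U+DC80..U+DCFF. Valid UTF-8 can never decode to a
-- surrogate, so these stand-ins cannot collide with real path text.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Recover the original byte from a stand-in character, if it is one.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c
```

With this scheme every byte in 0x80..0xFF survives escapeByte followed by unescapeChar, while an ordinary character such as U+EF72 is left alone (unescapeChar returns Nothing for it).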

Another user says that GHC 7.4 also changed its escape range to match
Python 3, so it seems to be a pseudo-standard now. That's really good.
I'm going to add a 'posix_ghc704' rule to system-filepath, which
should mean that only users running GHC 7.2 will have to worry about
escape chars.

Unfortunately, the "text" package refuses to store codepoints in that
range (it replaces them with a placeholder), so I have to switch
things over to use [Char].

(Yak sighted! Prepare lather!)

> While no code points in the Supplementary Special-purpose Plane are currently
> assigned (http://www.unicode.org/roadmaps/ssp/), it is worrying that it's used,
> especially if filenames in a non-unicode encoding could be interpreted as
> containing characters really within this plane. I wonder why maxBound :: Char
> was not increased, and the additional space after `\1114111' used for the
> un-decodable bytes?

There's probably a lot of code out there that assumes (maxBound ::
Char) is also the maximum Unicode code point. It would be difficult to
update, particularly when dealing with bindings to foreign libraries
(like the "text-icu" package).

Both Python 3 and GHC 7.4 are using codepoints in the UTF16 surrogate
pair range for this, and that seems like a pretty clean solution.

>> > For FFI, anything that deals with a FilePath should use this
>> > withFilePath, which GHC contains but doesn't export(?), rather than the
>> > old withCString or withCAString:
>> >
>> > import GHC.IO.Encoding (getFileSystemEncoding)
>> > import GHC.Foreign as GHC
>> >
>> > withFilePath :: FilePath -> (CString -> IO a) -> IO a
>> > withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f
>>
>> If code uses either withFilePort or withCString, then the filenames
>                      withFilePath?
>> written will depend on the user's locale. This is wrong. Filenames are
>> either non-encoded text strings (Windows), UTF8 (OSX), or arbitrary
>> bytes (non-OSX POSIX). They must not change depending on the locale.
>
> This is exactly how GHC 7.4 handles them. For example:
>
> openDirStream :: FilePath -> IO DirStream
> openDirStream name =
>  withFilePath name $ \s -> do
>    dirp <- throwErrnoPathIfNullRetry "openDirStream" name $ c_opendir s
>    return (DirStream dirp)
>
> removeLink :: FilePath -> IO ()
> removeLink name =
>  withFilePath name $ \s ->
>  throwErrnoPathIfMinus1_ "removeLink" name (c_unlink s)
>
> I do not see any locale-dependent behavior in the filename bytes read/written.

Perhaps I'm misunderstanding, but the definition of 'withFilePath' you
provided is definitely locale-dependent. Unless getFileSystemEncoding
is constant?

>> > Code that reads or writes a FilePath to a Handle (including even to
>> > stdout!) must take care to set the right encoding too:
>> >
>> > fileEncoding :: Handle -> IO ()
>> > fileEncoding h = hSetEncoding h =<< getFileSystemEncoding
>>
>> This is also wrong. A "file path" cannot be written to a handle with
>> any hope of correct behavior. If it's to be displayed to the user, a
>> path should be converted to text first, then displayed.
>
> Sure it can. See find(1). Its output can be read as FilePaths once the
> Handle is set up as above.
>
> If you prefer your program not crash with an encoding error when an
> arbitrary FilePath is putStr, but instead perhaps output bytes that are
> not valid in the current encoding, that's also a valid choice. You might
> be writing a program, like find, that again needs to output any possible
> FilePath including badly encoded ones.

A program like find(1) has two use cases:

1. Display paths to the user, as text.

2. Provide paths to another program, in the operating system's file path format.

These two goals are in conflict. It is not possible to implement a
find(1) that performs both correctly in all locales.

The best solution is to choose #2, and always write in the OS format,
and hope the user's shell+terminal are capable of rendering it to a
reasonable-looking path.
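
As a rough sketch of option #2, assuming the path is already available as raw bytes (a strict ByteString), the bytes can be written to stdout without passing through any Handle text encoding. The names terminate and emitPath are made up for this example.

```haskell
import qualified Data.ByteString as B
import System.IO (stdout)

-- Append a newline byte after the raw path, as find(1) does by default.
-- A NUL byte (as with find -print0) would be the safer separator, since
-- a newline can legally occur inside a POSIX path.
terminate :: B.ByteString -> B.ByteString
terminate path = B.snoc path 0x0A

-- Write the path bytes to stdout untouched. B.hPut performs a binary
-- write, so the Handle's text encoding is not involved.
emitPath :: B.ByteString -> IO ()
emitPath = B.hPut stdout . terminate
```

Whether the result looks reasonable on screen is then up to the shell and terminal, as described above.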

> Filesystem.Path.CurrentOS.toText is a nice option if you want validly
> encoded output though. Thanks for that!

Ah, that's not what toText is for. toText provides a human-readable
representation of the path. It's used for things like file managers,
where you need to show the user a label which approximates the
underlying path. There's no guarantee that the output of toText can be
converted back to the original path, especially if it returns a Left.


Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Ian Lynagh
In reply to this post by John Millikin
On Sun, Feb 05, 2012 at 07:17:32PM -0800, John Millikin wrote:
>
> That was my understanding also, then QuickCheck found a
> counter-example. It turns out that there are cases where a valid path
> cannot be roundtripped in the GHC 7.2 encoding.

This is fixed in GHC 7.4.1.


Thanks
Ian



Re: ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Joey Hess
In reply to this post by John Millikin
John Millikin wrote:
> Perhaps I'm misunderstanding, but the definition of 'withFilePath' you
> provided is definitely locale-dependent. Unless getFileSystemEncoding
> is constant?

I think/hope it's locale dependent, but undecodable bytes are remapped,
so as long as the system's locale doesn't change, reading a FilePath
with the encoding and then writing it back out should always reproduce
the same bytes.

> > Filesystem.Path.CurrentOS.toText is a nice option if you want validly
> > encoded output though. Thanks for that!
>
> Ah, that's not what toText is for. toText provides a human-readable
> representation of the path. It's used for things like file managers,
> where you need to show the user a label which approximates the
> underlying path. There's no guarantee that the output of toText can be
> converted back to the original path, especially if it returns a Left.

Yes, that's what I meant. :)

--
see shy jo


Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Simon Marlow-7
In reply to this post by Ian Lynagh
On 06/02/2012 20:32, Ian Lynagh wrote:
> On Sun, Feb 05, 2012 at 07:17:32PM -0800, John Millikin wrote:
>>
>> That was my understanding also, then QuickCheck found a
>> counter-example. It turns out that there are cases where a valid path
>> cannot be roundtripped in the GHC 7.2 encoding.
>
> This is fixed in GHC 7.4.1.

I think we forgot to mention it in the release notes.  Roundtripping of
FilePath is now fully supported.  The commit in question is this:

commit 7e59b6d50ec4a4400e8730bfd8cfc471c1873702
Author: Max Bolingbroke <[hidden email]>
Date:   Fri Nov 18 17:45:34 2011 +0000

     Go back to using private-use characters in roundtripping


Which was the result of a long discussion on the glasgow-haskell-users
mailing list:

 
http://www.haskell.org/pipermail/glasgow-haskell-users/2011-November/021115.html


Separately the unix package added support for undecoded FilePaths
(RawFilePath), but unfortunately at the same time we started using a new
extension in GHC 7.4.1 (CApiFFI), which we decided not to document
because it was still experimental:

   http://hackage.haskell.org/trac/ghc/ticket/2979

In retrospect we should have documented this.  It's not like we don't
normally dump a load of experimental features on our users and then
change them later :-)

Cheers,
        Simon


Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Meacham
On Tue, Feb 7, 2012 at 4:24 AM, Simon Marlow <[hidden email]> wrote:
> Separately the unix package added support for undecoded FilePaths
> (RawFilePath), but unfortunately at the same time we started using a new
> extension in GHC 7.4.1 (CApiFFI), which we decided not to document because
> it was still experimental:

Hi, from my reading, it looks like 'capi' means, from a logical perspective,

"Don't assume the object is addressable, but rather that the standard C syntax
for calling this routine will expand into correct code when compiled with the
stated headers"

So it may be implemented by, say, creating a stub .c file that includes the
headers and creates a wrapper around each one, or, when compiling via C, by
actually including the given headers and the function calls in the code.
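
For readers who have not seen the extension, a minimal sketch of a 'capi' import as it appears in later GHC releases is below. The choice of abs from stdlib.h and the name c_abs are only for illustration, and the syntax in 7.4.1 (where the extension was undocumented) may have differed.

```haskell
{-# LANGUAGE CApiFFI #-}
import Foreign.C.Types (CInt(..))

-- With 'capi', the call is compiled as ordinary C source against the named
-- header, so the imported name may be a macro or inline function rather
-- than an addressable symbol.
foreign import capi "stdlib.h abs" c_abs :: CInt -> CInt
```

Here c_abs (-3) should evaluate to 3, the same as a plain ccall import would give; the difference is only in how the call is generated.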

I ask because jhc needs such a feature (very hacky method used now,
the rts knows some problematic functions and includes hacky wrappers
and #defines.) and I'll make it behave just like the ghc one when possible.

   John


Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Simon Marlow-7
On 08/02/2012 02:26, John Meacham wrote:

> On Tue, Feb 7, 2012 at 4:24 AM, Simon Marlow<[hidden email]>  wrote:
>> Separately the unix package added support for undecoded FilePaths
>> (RawFilePath), but unfortunately at the same time we started using a new
>> extension in GHC 7.4.1 (CApiFFI), which we decided not to document because
>> it was still experimental:
>
> Hi, from my reading, it looks like 'capi' means from a logical perspective,
>
> "Don't assume the object is addressable, but rather that the standard C syntax
> for calling this routine will expand into correct code when compiled with the
> stated headers"
>
> So, it may be implemented by say creating a stub .c file that includes the
>   headers and creates a wrapper around each one or when compiling via C,
> actually including the given headers and the function calls in the code.

Yes, that's exactly it.  In GHC we create a stub (even when compiling
via C, for simplicity of implementation).

Cheers,
        Simon


> I ask because jhc needs such a feature (very hacky method used now,
> the rts knows some problematic functions and includes hacky wrappers
> and #defines.) and I'll make it behave just like the ghc one when possible.
>
>     John



Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Ian Lynagh
In reply to this post by John Meacham
On Tue, Feb 07, 2012 at 06:26:48PM -0800, John Meacham wrote:

>
> Hi, from my reading, it looks like 'capi' means from a logical perspective,
>
> "Don't assume the object is addressable, but rather that the standard C syntax
> for calling this routine will expand into correct code when compiled with the
> stated headers"
>
> So, it may be implemented by say creating a stub .c file that includes the
>  headers and creates a wrapper around each one or when compiling via C,
> actually including the given headers and the function calls in the code.

That sounds right. It basically means you don't have to write the C
stubs yourself, which is nice because (a) doing so is a pain, and (b)
when the foreign import is inside 2 or 3 CPP conditionals it's even more
of a pain to replicate them correctly in the C stub.

Unfortunately, there are cases where C doesn't get all the type
information it needs, e.g.:
    http://hackage.haskell.org/trac/ghc/ticket/2979#comment:14
but I'm not sure what the best fix is.

> I ask because jhc needs such a feature (very hacky method used now,
> the rts knows some problematic functions and includes hacky wrappers
> and #defines.) and I'll make it behave just like the ghc one when possible.

Great!


Thanks
Ian



Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Meacham
On Wed, Feb 8, 2012 at 10:56 AM, Ian Lynagh <[hidden email]> wrote:
> That sounds right. It basically means you don't have to write the C
> stubs yourself, which is nice because (a) doing so is a pain, and (b)
> when the foreign import is inside 2 or 3 CPP conditionals it's even more
> of a pain to replicate them correctly in the C stub.
>
> Unfortunately, there are cases where C doesn't get all the type
> information it needs, e.g.:
>    http://hackage.haskell.org/trac/ghc/ticket/2979#comment:14
> but I'm not sure what the best fix is.

I believe jhc's algorithm works in this case. Certain type constructors have C
types associated with them; in particular, many newtypes have C types that
differ from those of their contents. So my routine that decides whether an
argument is suitable for the FFI returns both a C type and the underlying raw
type (Int# etc.) that the type maps to. The algorithm checks whether the
current type constructor has an associated C type. If it doesn't, it expands
the newtype one layer and tries again. If it does, it still recurses to get at
the underlying raw type, but then replaces the C type with whatever was
attached to the newtype. In the case of 'Ptr a', it recursively runs the
algorithm on the argument to 'Ptr', then takes that C type and appends a '*'
to it. If the argument to 'Ptr' is not an FFIable type, it just returns HsPtr
as the C type.

Since CSigSet has "sigset_t" associated with it, 'Ptr CSigSet' ends up turning
into 'sigset_t *' in the generated code. (Ptr (Ptr CChar)) turns into char**
and so forth.
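
The lookup described above can be sketched on a toy model. Everything here (the Ty type, cNames, cTypeOf, star) is hypothetical and tracks only the C type name, not the raw type; it is not jhc's actual implementation.

```haskell
-- A toy model of the types involved.
data Ty
  = Con String         -- a plain type constructor
  | Newtype String Ty  -- a named newtype and the type it wraps
  | PtrTo Ty           -- 'Ptr a'

-- The single list of associations between type constructors and C types.
cNames :: [(String, String)]
cNames = [("CSigSet", "sigset_t"), ("CChar", "char")]

-- Find the C type for a Haskell type, if it has one.
cTypeOf :: Ty -> Maybe String
cTypeOf (Con name) = lookup name cNames
cTypeOf (Newtype name inner) =
  case lookup name cNames of
    Just c  -> Just c         -- an attached C type overrides the contents
    Nothing -> cTypeOf inner  -- otherwise expand the newtype one layer
cTypeOf (PtrTo t) =
  -- Append '*' to the pointee's C type, or fall back to HsPtr.
  Just (maybe "HsPtr" star (cTypeOf t))

-- Append a '*', adding a space only before the first one.
star :: String -> String
star c
  | not (null c) && last c == '*' = c ++ "*"
  | otherwise                     = c ++ " *"
```

On this model, PtrTo (Con "CSigSet") maps to "sigset_t *", PtrTo (PtrTo (Con "CChar")) to "char **", and a pointer to a type with no association to "HsPtr".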

An interesting quirk of this scheme is that it faithfully translates the
perhaps unfortunate idiom of

newtype Foo_t = Foo_t (Ptr Foo_t)

into  foo_t************ (an infinite chain of pointers)

which is actually what the user specified. :) I added a check for recursive
newtypes that chops the recursion to catch this as people seem to utilize it.

>> I ask because jhc needs such a feature (very hacky method used now,
>> the rts knows some problematic functions and includes hacky wrappers
>> and #defines.) and I'll make it behave just like the ghc one when possible.
>
> Great!

It has now been implemented and will be in jhc 0.8.1.

    John


Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Ian Lynagh
On Thu, Feb 09, 2012 at 04:52:16AM -0800, John Meacham wrote:
>
> Since CSigSet has "sigset_t" associated with it, 'Ptr CSigSet' ends up turning
> into 'sigset_t *' in the generated code. (Ptr (Ptr CChar)) turns into char**
> and so forth.

What does the syntax for associating sigset_t with CSigSet look like?


Thanks
Ian



Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Meacham
On Thu, Feb 9, 2012 at 11:23 AM, Ian Lynagh <[hidden email]> wrote:
> On Thu, Feb 09, 2012 at 04:52:16AM -0800, John Meacham wrote:
>>
>> Since CSigSet has "sigset_t" associated with it, 'Ptr CSigSet' ends up turning
>> into 'sigset_t *' in the generated code. (Ptr (Ptr CChar)) turns into char**
>> and so forth.
>
> What does the syntax for associating sigset_t with CSigSet look like?

There currently isn't a user-accessible one, but CSigSet is included in the
FFI spec, so having the compiler know about it isn't that bad. In fact, it is
how I interpreted the standard. Otherwise, why would CFile be specified if
'Ptr CFile' didn't expand properly? I just have a single list of associations
that is easy to update at the moment, but a user-definable way is something I
want in the future. My current syntax idea is:

data CFile = foreign "stdio.h FILE"

but it doesn't extend easily to 'newtype's
or maybe a {-# CTYPE "FILE" #-} pragma...

The 'Ptr' trick is useful for more than just pointers, I use the same thing to
support native complex numbers. I have

data Complex_ :: # -> #  -- type function of unboxed types to unboxed types.

then can do things like 'Complex_ Float64_' to get hardware-supported complex
doubles. The expansion happens just like 'Ptr', except that instead of
appending '*' when it encounters Complex_, it prepends '_Complex ' (a C99
standard keyword).

You can then import primitives like normal (for jhc)

foreign import primitive "Add" complexPlus ::
        Complex_ Float64_ -> Complex_ Float64_ -> Complex_ Float64_

and lift it into a data type and add instances for the standard numeric classes
if you wish. (I have macros that automate the somewhat repetitive instance
creation in lib/jhc/Jhc/Num.m4)

        John


Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Ian Lynagh
On Thu, Feb 09, 2012 at 11:40:28AM -0800, John Meacham wrote:

> On Thu, Feb 9, 2012 at 11:23 AM, Ian Lynagh <[hidden email]> wrote:
> > On Thu, Feb 09, 2012 at 04:52:16AM -0800, John Meacham wrote:
> >>
> >> Since CSigSet has "sigset_t" associated with it, 'Ptr CSigSet' ends up turning
> >> into 'sigset_t *' in the generated code. (Ptr (Ptr CChar)) turns into char**
> >> and so forth.
> >
> > What does the syntax for associating sigset_t with CSigSet look like?
>
> There currently isn't a user-accessible one,
>
> My current syntax idea is.
>
> data CFile = foreign "stdio.h FILE"
>
> but it doesn't extend easily to 'newtype's
> or maybe a {-# CTYPE "FILE" #-} pragma...

I've now implemented this in GHC. For now, the syntax is:

type    {-# CTYPE "some C type" #-} Foo = ...
newtype {-# CTYPE "some C type" #-} Foo = ...
data    {-# CTYPE "some C type" #-} Foo = ...

The magic for (Ptr a) is built in to the compiler.
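
A small sketch of how the pragma combines with a 'capi' import, written against the syntax as documented in later GHC releases (the details at the time of this message may have differed). The newtype Offset and the import name c_absOffset are invented for the example.

```haskell
{-# LANGUAGE CApiFFI #-}
import Foreign.C.Types (CInt(..))

-- Hypothetical newtype. The pragma tells GHC which C type to use for it
-- when generating the C stub for a 'capi' import.
newtype {-# CTYPE "int" #-} Offset = Offset CInt
  deriving (Eq, Show)

-- The generated stub can now declare its argument and result as 'int'.
foreign import capi "stdlib.h abs" c_absOffset :: Offset -> Offset
```

With these declarations, c_absOffset (Offset (-5)) should give Offset 5.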


Thanks
Ian



Re: [Haskell] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

John Meacham
On Thu, Feb 16, 2012 at 1:20 PM, Ian Lynagh <[hidden email]> wrote:
> I've now implemented this in GHC. For now, the syntax is:
>
> type    {-# CTYPE "some C type" #-} Foo = ...
> newtype {-# CTYPE "some C type" #-} Foo = ...
> data    {-# CTYPE "some C type" #-} Foo = ...
>
> The magic for (Ptr a) is built in to the compiler.

Heh. I just added it for jhc too with the exact same syntax. :)

The difference is that I do not allow them for 'type' declarations, as
desugaring of types happens very early in compilation, and it feels sort
of wrong to give type synonyms meaning, like I'm breaking referential
transparency or something.

I also allow foreign header declarations just like with ccall.
data {-# CTYPE "stdio.h FILE" #-} CFile

will mean that 'stdio.h' needs to be included for FILE to be declared.

   John
