
RE: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB



Simon Marlow
This sounds like a good idea to me.

As far as possible, we should keep the platform-dependence restricted to
the implementation of one module (System.Posix.Internals will do, even
though this isn't really POSIX any more).  So System.Posix.Internals
exports the CFilePath/CFileOffset types, and the foreign functions that
operate on them.

Alternatively (and perhaps this is better), we could hide the difference
even further, and provide functions like

  rmDir :: FilePath -> IO CInt

in System.Posix.Internals.  Similarly for functions that operate on
COff, they would take/return Integer (eg. we already have
System.Posix.fdFileSize).
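
A minimal sketch of this alternative, assuming a POSIX system and GHC's base library. The wrappers rmDir and fdTell are hypothetical names: they accept Haskell types and keep CString and the C offset type out of their signatures, while still reporting failure C-style as -1. Only the POSIX side is shown; c_lseek and sEEK_CUR come from System.Posix.Internals, and c_rmdir is a local foreign import.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))
import System.Posix.Internals (c_lseek, sEEK_CUR)

-- POSIX-only sketch: the C filename type (CString here) never appears in
-- the signatures of the wrappers below.
foreign import ccall unsafe "unistd.h rmdir"
  c_rmdir :: CString -> IO CInt

-- FilePath in, C result code out: 0 on success, -1 (errno set) on failure.
rmDir :: FilePath -> IO CInt
rmDir path = withCString path c_rmdir

-- Current offset of a descriptor as an Integer, whatever the platform's
-- C offset type is; -1 on failure, as in C.
fdTell :: CInt -> IO Integer
fdTell fd = fromIntegral <$> c_lseek fd 0 sEEK_CUR
```

Because fromIntegral converts whichever offset type c_lseek returns, fdTell's signature would stay the same on a platform with 64-bit offsets.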

As regards whether to use feature tests or just #ifdef mingw32_HOST_OS,
in general feature tests are the right thing, but sometimes it doesn't
buy you very much when there is (and always will be) only one platform
that has some particular quirk.  Writing a bunch of autoconf code that
would, if we're lucky, handle properly the case when some future version
of Windows removes the quirk, is not a good use of developer time.
Furthermore, Windows hardly ever changes APIs, they just add new ones.
So I don't see occasional use of #ifdef mingw32_HOST_OS as a big deal.
It's more important to organise the codebase and make sure all the
#ifdefs are behind suitable abstractions.

Cheers,
        Simon

On 21 November 2005 12:01, Bulat Ziganshin wrote:

> Simon, what would you say about the following plan?
>
> GHC/Win32 currently doesn't support operations on files with Unicode
> filenames, nor can it tell/seek to positions beyond 4 GB in a file.
> This is because the Unix-compatible functions open/fstat/tell/... that
> MinGW32 supports work only with "char[]" for filenames and with off_t
> (which is 32 bits) for file sizes/positions.
>
> Half a year ago I discussed with Simon Marlow how support for Unicode
> names and large files could be added to GHC. Now I have implemented my
> own library for such files, and I have an idea of how this can be
> incorporated into GHC with minimal effort:
>
> GHC currently uses the CString type to represent C-land filenames and
> the COff type to represent C-land file sizes/positions. We need to
> systematically change these usages to CFilePath and CFileOffset,
> respectively, defined as follows:
>
> #ifdef mingw32_HOST_OS
> type CFilePath = LPCTSTR
> type CFileOffset = Int64
> withCFilePath = withTString
> peekCFilePath = peekTString
> #else
> type CFilePath = CString
> type CFileOffset = COff
> withCFilePath = withCString
> peekCFilePath = peekCString
> #endif
>
> and, of course, change the uses of withCString/peekCString that are
> applied to filenames to withCFilePath/peekCFilePath (this will touch the
> modules System.Posix.Internals, System.Directory and GHC.Handle).
>
> The last change needed is to conditionally define all the "c_*"
> functions in System.Posix.Internals whose types refer to filenames
> or offsets:
>
> #ifdef mingw32_HOST_OS
> foreign import ccall unsafe "HsBase.h _wrmdir"
>    c_rmdir :: CFilePath -> IO CInt
> ....
> #else
> foreign import ccall unsafe "HsBase.h rmdir"
>    c_rmdir :: CFilePath -> IO CInt
> ....
> #endif
>
> (Note that the actual C function used is _wrmdir on Windows and rmdir on
> Unix.) Of course, all such functions defined in HsBase.h also need to
> be defined conditionally, like:
>
> #ifdef mingw32_HOST_OS
> INLINE time_t __hscore_st_mtime ( struct _stati64* st ) { return st->st_mtime; }
> #else
> INLINE time_t __hscore_st_mtime ( struct stat* st ) { return st->st_mtime; }
> #endif
>
> That's all! Of course, this will break compatibility with current
> programs which use these c_* functions directly (c_open, c_lseek,
> c_stat and so on). This may be an issue for some libraries. Does anyone
> really use these functions? Alternatively, we can go another, fully
> backward-compatible way: add some "f_*" functions and change the
> high-level modules to work with those functions.
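
The CFilePath plan above can be written as one self-contained CPP module. This is a sketch under two assumptions: GHC defines mingw32_HOST_OS when compiling on Windows, and the Windows branch takes LPCTSTR, withTString and peekTString from the Win32 package's System.Win32.Types. Type signatures are added, and c_rmdir is selected the same way as in the message (_wrmdir vs rmdir).

```haskell
{-# LANGUAGE CPP #-}
import Foreign.C.Types (CInt (..))
#ifdef mingw32_HOST_OS
import Data.Int (Int64)
import System.Win32.Types (LPCTSTR, withTString, peekTString)
#else
import Foreign.C.String (CString, withCString, peekCString)
import System.Posix.Types (COff)
#endif

#ifdef mingw32_HOST_OS
-- Windows: wide-character filenames and 64-bit offsets.
type CFilePath = LPCTSTR
type CFileOffset = Int64

withCFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
withCFilePath = withTString

peekCFilePath :: CFilePath -> IO FilePath
peekCFilePath = peekTString

foreign import ccall unsafe "_wrmdir"
  c_rmdir :: CFilePath -> IO CInt
#else
-- POSIX: narrow C strings and the platform's off_t.
type CFilePath = CString
type CFileOffset = COff

withCFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
withCFilePath = withCString

peekCFilePath :: CFilePath -> IO FilePath
peekCFilePath = peekCString

foreign import ccall unsafe "rmdir"
  c_rmdir :: CFilePath -> IO CInt
#endif
```

Code outside this block mentions only CFilePath, withCFilePath and peekCFilePath (for example withCFilePath dir c_rmdir), so it compiles unchanged on both branches.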

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re[2]: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

Bulat Ziganshin
Hello Simon,

Wednesday, November 23, 2005, 2:22:02 PM, you wrote:

SM> This sounds like a good idea to me.

SM> As far as possible, we should keep the platform-dependence restricted to
SM> the implementation of one module (System.Posix.Internals will do, even
SM> though this isn't really POSIX any more).  So System.Posix.Internals
SM> exports the CFilePath/CFileOffset types, and the foreign functions that
SM> operate on them.

SM> Alternatively (and perhaps this is better), we could hide the difference
SM> even further, and provide functions like

SM>   rmDir :: FilePath -> IO CInt

SM> in System.Posix.Internals.  Similarly for functions that operate on
SM> COff, they would take/return Integer (eg. we already have
SM> System.Posix.fdFileSize).

Well... yes, but not quite :)  Let's consider the function c_open as a
more informative example. Between c_open and openFile there are
several levels of "translation":

1) convert C types to Haskell types
2) check for errno and raise exception on error
3) convert interfaces (translate IOMode to CMode in this example)
4) convert file descriptors to Handles

Your suggestion is to build a middle-level library whose functions lie
between steps 1 and 2 in this scheme:

c_open :: CFilePath -> CInt -> CMode -> IO CInt
1) convert C types to Haskell types
open :: String -> Int -> CMode -> IO Int
2) check for errno
3) convert interfaces
4) convert file descriptors to Handles

This has one obvious benefit: these functions will look very much like
their C counterparts. On the other hand, the resulting functions will
belong neither to the C world nor to the Haskell world: they will use
Haskell types but C-style error signalling (a -1 result plus errno).

Moreover, adding such middle-level functions will not make the
implementation any simpler: all the differences between platforms are
already covered by the definitions of
CFilePath/CFileOffset/withCFilePath/peekCFilePath.
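
To make the difference between the two placements concrete, here is a sketch of open at both levels, assuming a POSIX system and GHC's System.Posix.Internals. The names openRaw, openFd, openReadOnly and closeFd are hypothetical: openRaw sits between steps 1 and 2 and signals failure C-style, while openFd also performs step 2 and raises an IOError.

```haskell
import Foreign.C.Error (throwErrnoIfMinus1Retry_, throwErrnoPathIfMinus1Retry)
import Foreign.C.Types (CInt)
import System.Posix.Internals (c_close, c_open, o_RDONLY, withFilePath)
import System.Posix.Types (CMode)

-- Between steps 1 and 2: Haskell argument types, but failure is still
-- signalled C-style by a -1 result, with the reason left in errno.
openRaw :: FilePath -> CInt -> CMode -> IO CInt
openRaw path flags mode = withFilePath path $ \p -> c_open p flags mode

-- After step 2: the same call, but -1 becomes an IOError built from errno
-- and the path; the call is retried if it was interrupted (EINTR).
openFd :: FilePath -> CInt -> CMode -> IO CInt
openFd path flags mode =
  throwErrnoPathIfMinus1Retry "openFd" path (openRaw path flags mode)

openReadOnly :: FilePath -> IO CInt
openReadOnly path = openFd path o_RDONLY 0

closeFd :: CInt -> IO ()
closeFd fd = throwErrnoIfMinus1Retry_ "closeFd" (c_close fd)
```

A caller of openRaw must remember to check for -1 and consult errno; a caller of openFd can rely on ordinary Haskell exception handling.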


But I propose placing these middle-level functions after step 2, or
even step 3, in this scheme, so that they are fully in the Haskell world
and differ from the Handle API only in working with file descriptors
instead of Handles. For example:

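import Foreign.C.Error (throwErrnoIfMinus1Retry_)
import Foreign.C.Types (CInt)
import System.IO (SeekMode (..))
import System.Posix.Internals (FD, c_lseek, sEEK_CUR, sEEK_END, sEEK_SET)

-- Seek on a raw file descriptor; raises an IOError (from errno) on failure.
lseek :: Integral int => FD -> SeekMode -> int -> IO ()
lseek fd mode offset = do
  let whence :: CInt
      whence = case mode of
                 AbsoluteSeek -> sEEK_SET
                 RelativeSeek -> sEEK_CUR
                 SeekFromEnd  -> sEEK_END
  throwErrnoIfMinus1Retry_ "lseek"
    $ c_lseek fd (fromIntegral offset) whence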


profits:

1) The current GHC.Handle code is monolithic: it performs all 4 of these
translation steps in one function. This change will simplify the module
and let it concentrate on solving only one task, the most complex one:
implementing operations on Handles via operations on FDs.

2) The part of the GHC.Handle code that is not really GHC-specific will
be moved to the standard hierarchical libraries, where it will be ready
for use by other Haskell implementations.

3) Alternative Handle implementations can use these middle-level
functions instead of reinventing the wheel. Just for example, the
openFile code in http://haskell.org/~simonmar/new-io.tar.gz is mostly
copied from the existing GHC.Handle.

4) We will get a full-fledged FD library on GHC, Hugs and NHC for free.

5) If this FD library has a Handle-like interface, it can be used as a
"poor man's" drop-in replacement for the Handle library in situations
where we don't need its buffering and other advanced features.
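
Points 4 and 5 could look roughly like the following: a hypothetical unbuffered, Handle-like API over raw descriptors (fdOpenFile, fdSeek, fdTell, fdClose), built on GHC's System.Posix.Internals and covering steps 1 to 3 of the scheme. The IOMode-to-flags mapping is a plausible assumption, not code copied from GHC.Handle.

```haskell
import Data.Bits ((.|.))
import Foreign.C.Error
  (throwErrnoIfMinus1Retry, throwErrnoIfMinus1Retry_, throwErrnoPathIfMinus1Retry)
import System.IO (IOMode (..), SeekMode (..))
import System.Posix.Internals
  ( FD, c_close, c_lseek, c_open, o_APPEND, o_CREAT, o_RDONLY, o_RDWR
  , o_TRUNC, o_WRONLY, sEEK_CUR, sEEK_END, sEEK_SET, withFilePath )

-- Hypothetical FD-level counterpart of openFile (no buffering, no Handle).
fdOpenFile :: FilePath -> IOMode -> IO FD
fdOpenFile path mode =
  withFilePath path $ \p ->
    throwErrnoPathIfMinus1Retry "fdOpenFile" path (c_open p (flags mode) 0o666)
  where
    -- Step 3 of the scheme: translate IOMode into open(2) flags.
    flags ReadMode      = o_RDONLY
    flags WriteMode     = o_WRONLY .|. o_CREAT .|. o_TRUNC
    flags AppendMode    = o_WRONLY .|. o_CREAT .|. o_APPEND
    flags ReadWriteMode = o_RDWR .|. o_CREAT

-- Counterpart of hSeek: Integer offsets, errors raised as IOError.
fdSeek :: FD -> SeekMode -> Integer -> IO ()
fdSeek fd mode off =
  throwErrnoIfMinus1Retry_ "fdSeek" (c_lseek fd (fromIntegral off) whence)
  where
    whence = case mode of
      AbsoluteSeek -> sEEK_SET
      RelativeSeek -> sEEK_CUR
      SeekFromEnd  -> sEEK_END

-- Counterpart of hTell.
fdTell :: FD -> IO Integer
fdTell fd =
  fromIntegral <$> throwErrnoIfMinus1Retry "fdTell" (c_lseek fd 0 sEEK_CUR)

-- Counterpart of hClose.
fdClose :: FD -> IO ()
fdClose fd = throwErrnoIfMinus1Retry_ "fdClose" (c_close fd)
```

Since the signatures mirror openFile/hSeek/hTell/hClose with FD in place of Handle, code that needs no buffering could switch between the two with few changes.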


So, as a first step, I propose to move the middle-level code from
GHC.Handle to Posix.Internals, merge the FD type definitions, replace
CString with CFilePath where appropriate, and so on; only after that,
make the Windows-specific changes. I can do all of it. What would you say?


>> That's all! Of course, this will break compatibility with current
>> programs which use these c_* functions directly (c_open, c_lseek,
>> c_stat and so on). This may be an issue for some libraries. Does anyone
>> really use these functions? Alternatively, we can go another, fully
>> backward-compatible way: add some "f_*" functions and change the
>> high-level modules to work with those functions.

If my changes are committed only to the GHC 6.6 (HEAD) branch, the
changed types of the c_* functions will not be a big problem: you change
some interfaces between major releases anyway. But now I realize that
Posix.Internals is part of the libraries shared by several Haskell
compilers. Could such changes break them?

Moreover, I plan to move "throwErrnoIfMinus1RetryOnBlock" to
Foreign.C.Error, and sEEK_CUR/sEEK_SET/sEEK_END to Posix.Internals.
Can that be done?


SM> As regards whether to use feature tests or just #ifdef mingw32_HOST_OS,
SM> in general feature tests are the right thing, but sometimes it doesn't
SM> buy you very much when there is (and always will be) only one platform
SM> that has some particular quirk.  Writing a bunch of autoconf code that
SM> would, if we're lucky, handle properly the case when some future version
SM> of Windows removes the quirk, is not a good use of developer time.
SM> Furthermore, Windows hardly ever changes APIs, they just add new ones.
SM> So I don't see occasional use of #ifdef mingw32_HOST_OS as a big deal.
SM> It's more important to organise the codebase and make sure all the
SM> #ifdefs are behind suitable abstractions.

So I will write the following:

-- Support for Unicode filenames and files>4GB
#ifdef mingw32_HOST_OS

in ALL the places where this platform test must take place. This will
document the code and make it easy to find and edit all these places if
that is ever needed in the future.


Can I also ask a few questions about the "new I/O" library? As I see it,
this library solves 3 problems:

1) Having several streams on one file. Why is this better than just
using hDuplicate?

2) Using different Char encodings on the streams. I think this could be
done better by renaming the current hGetChar/hPutChar to
hGetByte/hPutByte and adding different encodings simply as different
"hGetByte -> hGetChar" strategies. This way memory buffers will always
hold untranslated bytes, and the Handle structure will contain the
following fields:

data Handle__ = Handle__ { ...
  haPutChar :: (Word8 -> IO ()) -> Char -> IO (),
  haGetChar :: (IO Word8) -> IO Char ... }

These fields will be modified by the hSetEncoding operation.
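
A sketch of encodings as "hGetByte -> hGetChar" strategies, using the two field types from the Handle__ fragment above: each encoding is just a pair of functions, so hSetEncoding would only need to replace them. The Encoding record and the encodeWith/decodeWith helpers (which use an IORef as an in-memory byte sink and source) are hypothetical, and the UTF-8 code does no validation of malformed input or surrogates.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.IORef (modifyIORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- The two field types from the Handle__ fragment.
type GetChar = IO Word8 -> IO Char
type PutChar = (Word8 -> IO ()) -> Char -> IO ()

-- Hypothetical: an encoding is the pair of functions hSetEncoding installs.
data Encoding = Encoding { encGet :: GetChar, encPut :: PutChar }

-- Latin-1: one byte per Char (code points above 0xFF are truncated here).
latin1 :: Encoding
latin1 = Encoding
  { encGet = \getByte -> chr . fromIntegral <$> getByte
  , encPut = \putByte c -> putByte (fromIntegral (ord c))
  }

-- UTF-8 without validation of malformed input or surrogates.
utf8 :: Encoding
utf8 = Encoding { encGet = getUtf8, encPut = putUtf8 }
  where
    putUtf8 putByte c
      | n < 0x80    = mapM_ putByte [fromIntegral n]
      | n < 0x800   = mapM_ putByte [0xC0 .|. lead 6, cont 0]
      | n < 0x10000 = mapM_ putByte [0xE0 .|. lead 12, cont 6, cont 0]
      | otherwise   = mapM_ putByte [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        lead, cont :: Int -> Word8
        lead k = fromIntegral (n `shiftR` k)
        cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)
    getUtf8 getByte = do
      b0 <- getByte
      let first = fromIntegral b0 :: Int
          (extra, initial)
            | first < 0x80 = (0, first)
            | first < 0xE0 = (1, first .&. 0x1F)
            | first < 0xF0 = (2, first .&. 0x0F)
            | otherwise    = (3, first .&. 0x07)
      rest <- mapM (const getByte) [1 .. extra :: Int]
      let step acc b = (acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F)
      pure (chr (foldl step initial rest))

-- Hypothetical helpers: run an encoding against an in-memory byte buffer.
encodeWith :: Encoding -> String -> IO [Word8]
encodeWith enc s = do
  out <- newIORef []
  mapM_ (encPut enc (\b -> modifyIORef out (b :))) s
  reverse <$> readIORef out

decodeWith :: Encoding -> Int -> [Word8] -> IO String
decodeWith enc n bytes = do
  src <- newIORef bytes
  let getByte = do
        bs <- readIORef src
        case bs of
          (b : rest) -> writeIORef src rest >> pure b
          []         -> ioError (userError "decodeWith: out of input")
  mapM (const (encGet enc getByte)) [1 .. n]
```

Switching encodings then means swapping the Encoding stored in the Handle; the byte buffers underneath are never re-translated.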

3) Using Handles to access memory/sockets/pipes and so on. Can this be
solved in the same way as the previous problem, by defining a class Stream:

class Stream s where
  sPutBuf, sGetBuf, sSeek, ....

and storing an instance of this class in the Handle instead of haFD:

data Handle__ = forall s . Stream s => Handle__ {
      haStream        :: s, ... }

?
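
The Stream question can be tried out in a few lines. This sketch assumes GHC's ExistentialQuantification extension; sPutBuf and sGetBuf are names from the message, but their signatures, the MemStream instance and the hPutBytes/hGetBytes/newMemHandle helpers are illustrative assumptions. The existential has to sit on the constructor: a field typed forall s . Stream s => s would demand a value that is every Stream at once.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical minimal Stream class (signatures are assumptions).
class Stream s where
  sPutBuf :: s -> [Word8] -> IO ()
  sGetBuf :: s -> Int -> IO [Word8]

-- One possible instance: an in-memory byte queue.
newtype MemStream = MemStream (IORef [Word8])

instance Stream MemStream where
  sPutBuf (MemStream ref) bytes = modifyIORef ref (++ bytes)
  sGetBuf (MemStream ref) n = do
    buf <- readIORef ref
    let (now, later) = splitAt n buf
    writeIORef ref later
    pure now

-- The Handle holds "some Stream" instead of a raw haFD.
data Handle__ = forall s . Stream s => Handle__ { haStream :: s }

hPutBytes :: Handle__ -> [Word8] -> IO ()
hPutBytes (Handle__ s) = sPutBuf s

hGetBytes :: Handle__ -> Int -> IO [Word8]
hGetBytes (Handle__ s) = sGetBuf s

newMemHandle :: IO Handle__
newMemHandle = Handle__ . MemStream <$> newIORef []
```

Because haStream's type is existential, GHC will not let it be used as an ordinary selector function; code reaches the stream by pattern matching, as hPutBytes does.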

--
Best regards,
 Bulat                            mailto:[hidden email]




Re[3]: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

Bulat Ziganshin
Hello Bulat,

Thursday, November 24, 2005, 4:17:24 AM, you wrote:

BZ> But I propose placing these middle-level functions after step 2, or
BZ> even step 3, in this scheme, so that they are fully in the Haskell world
BZ> and differ from the Handle API only in working with file descriptors
BZ> instead of Handles. For example:

"Better to see once than to hear a hundred times", so I have attached my
current Win32Files.hs to this letter. It's close to the FD library I
propose; only the open/seek functions need to be rewritten. Of course, I
will adapt this code to the programming style used in the hierarchical
libraries.

By implementing all the things I wrote about in my previous letter, we
will get an I/O library defined as a stack of APIs:

1) c_* functions
2) FD API (and other Streams: memory buffer, String, socket, pipe,
channel, MVar and so on)
3) Stream buffering API (Handle)
4) Char encoding API for Handles

--
Best regards,
 Bulat                            mailto:[hidden email]

Attachment: Win32Files.hs (11K)