Handling custom types in Takusen


Handling custom types in Takusen

Salvatore Insalaco
Disclaimer: I don't know Takusen very well, and I'm talking about the Sqlite backend.

I noticed that in Takusen there are just two instances to implement to
make any Haskell type db-serializable: DBBind / SqliteBind for
serialization and DBType for deserialization.

I wanted to implement blob serialization for PackedStrings, but I
noticed that both the DBBind and DBType classes are not public.
Defining custom serialization (not just Show / Read string
serialization) is very useful in application code, not just in
Takusen backend code.
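
For illustration, the shape of the contract I'd like to implement is
roughly the following (a sketch only: DbSerialize and its methods are
invented names, not Takusen's API, and ByteString stands in for
PackedString):

-- Purely illustrative: NOT Takusen's API. DbSerialize and its methods
-- are invented names; ByteString stands in for PackedString here.
import qualified Data.ByteString as B

class DbSerialize a where
  toDbBlob   :: a -> B.ByteString   -- the DBBind / SqliteBind direction
  fromDbBlob :: B.ByteString -> a   -- the DBType direction

-- Raw file contents need no conversion at all:
instance DbSerialize B.ByteString where
  toDbBlob   = id
  fromDbBlob = id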

Is there any way to do this without modifying the Takusen Sqlite backend?

Salvatore Insalaco

RE: Handling custom types in Takusen

Bayley, Alistair
> I noticed that in Takusen there are just two instances to implement to
> make any Haskell type db-serializable: DBBind / SqliteBind for
> serialization and DBType for deserialization.
>
> I wanted to implement blob serialization for PackedStrings, but I
> noticed that both the DBBind and DBType classes are not public.
> Defining custom serialization (not just Show / Read string
> serialization) is very useful in application code, not just in
> Takusen backend code.
>
> Is there any way to do this without modifying the Takusen Sqlite backend?
>
> Salvatore Insalaco

Sorry, not at present. It shouldn't be hard to modify the code, though,
and PackedString support is something we should really be thinking about
anyway (as well as Blobs, which I think should have a different type).

Although DBType is a class in InternalEnumerator (so you'd think it
might be easy to extend with new types), in practice we do not expose
the internal Query and ColumnBuffer objects that you also need in order
to create new instances of the class, so you're stuck with what we
provide. This is intentional; you usually need access to the low-level
buffers and other resources to write an instance of DBType, and we don't
want to expose these details to users. By hiding this stuff, we can
ensure that resources are properly cleaned up when queries and commands
complete.

So, please take this as an invitation to modify the Sqlite
implementation to handle PackedStrings (and Blobs, if you want to).

BTW, do you really need to marshal PackedStrings to blobs? The Sqlite
library uses CStrings, and I assume that CString-to-PackedString
marshalling is fairly efficient, so that would seem to be a better
choice. (I have no experience with PackedStrings, so there might be
good reasons to prefer blobs, and I'd like to know what they are.)
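
For comparison, the CString round-trip looks roughly like this with
Data.ByteString (a sketch; PackedString's API may differ):

-- Sketch of CString marshalling via Data.ByteString.
import qualified Data.ByteString as B
import Foreign.C.String (CString)

-- Reading a column: a single copy of the C string into a ByteString.
-- Caveat: CStrings are NUL-terminated, so binary data with embedded
-- NULs would not survive this round-trip.
fromColumn :: CString -> IO B.ByteString
fromColumn = B.packCString

-- Binding a parameter: expose the ByteString as a NUL-terminated
-- CString for the duration of the given action.
toParam :: B.ByteString -> (CString -> IO a) -> IO a
toParam = B.useAsCString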

Alistair

Re: Handling custom types in Takusen

Salvatore Insalaco
2007/7/27, Bayley, Alistair <[hidden email]>:
> BTW, do you really need to marshal PackedStrings to blobs? The Sqlite
> library uses CStrings, and I assume that CString-to-PackedString
> marshalling is fairly efficient, so that would seem to be a better
> choice. (I have no experience with PackedStrings, so there might be
> good reasons to prefer blobs, and I'd like to know what they are.)

The main reason is that I'm working on a Sqlite backend for Darcs
that will be used to store file contents. I'm thinking of choosing
Takusen as the backend library mainly because it already has Sqlite's
blob functions mapped.

In Darcs, PackedStrings are used to store file contents in memory. I
don't think it would be very efficient to store multi-megabyte files
in a text column (and there could be encoding problems too).

Also, Sqlite 3.4 introduced new functions for incremental reading /
writing of blobs. I could use them in the future.
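
For reference, here is a sketch of what FFI imports for that
incremental API could look like (the C signatures are taken from the
Sqlite documentation; the opaque Haskell types are placeholders):

{-# LANGUAGE ForeignFunctionInterface, EmptyDataDecls #-}
import Data.Int         (Int64)
import Foreign.C.String (CString)
import Foreign.C.Types  (CInt)
import Foreign.Ptr      (Ptr)

data SqliteDb    -- opaque sqlite3
data SqliteBlob  -- opaque sqlite3_blob

-- int sqlite3_blob_open(sqlite3*, const char *zDb, const char *zTable,
--     const char *zColumn, sqlite3_int64 iRow, int flags, sqlite3_blob**);
foreign import ccall "sqlite3_blob_open"
  c_blob_open :: Ptr SqliteDb -> CString -> CString -> CString
              -> Int64 -> CInt -> Ptr (Ptr SqliteBlob) -> IO CInt

-- Total size in bytes of the open blob.
foreign import ccall "sqlite3_blob_bytes"
  c_blob_bytes :: Ptr SqliteBlob -> IO CInt

-- Read / write N bytes at the given offset within the open blob.
foreign import ccall "sqlite3_blob_read"
  c_blob_read :: Ptr SqliteBlob -> Ptr a -> CInt -> CInt -> IO CInt

foreign import ccall "sqlite3_blob_write"
  c_blob_write :: Ptr SqliteBlob -> Ptr a -> CInt -> CInt -> IO CInt

foreign import ccall "sqlite3_blob_close"
  c_blob_close :: Ptr SqliteBlob -> IO CInt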

Thanks a lot for the help! I'll certainly send you the patches, even
though the PackedString support will be a little Darcs-specific (I
don't think that requiring it to compile Takusen is a good idea).

Salvatore Insalaco

RE: Handling custom types in Takusen

Bayley, Alistair
> From: Salvatore Insalaco [mailto:[hidden email]]
>
> The main reason is that I'm working on a Sqlite backend for Darcs
> that will be used to store file contents. I'm thinking of choosing
> Takusen as the backend library mainly because it already has Sqlite's
> blob functions mapped.

Umm, yes... I wrote the blob FFI imports for Sqlite, as I planned to
prototype Takusen's blob support in Sqlite first, but never got any
further with it. It is completely untested.


> In Darcs, PackedStrings are used to store file contents in memory. I
> don't think it would be very efficient to store multi-megabyte files
> in a text column (and there could be encoding problems too).
>
> Also, Sqlite 3.4 introduced new functions for incremental reading /
> writing of blobs. I could use them in the future.

Seems reasonable. I recall Oleg saying something privately a while ago
about an API for large objects. He may have some ideas for this.

Alistair

Re: Handling custom types in Takusen

Salvatore Insalaco
2007/7/27, Bayley, Alistair <[hidden email]>:
> > Also, Sqlite 3.4 introduced new functions for incremental reading /
> > writing of blobs. I could use them in the future.
>
> Seems reasonable. I recall Oleg saying something privately a while ago
> about an API for large objects. He may have some ideas for this.

One good idea could be to marshal / unmarshal blobs as CStringLen. A
CStringLen can then be used to construct PackedStrings and
ByteStrings in O(1), or to do custom computations with it. Even
though a CStringLen contains a Ptr CChar, it can easily be cast to
any pointer type.
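
To be precise about the O(1) claim: the bytestring library has a
no-copy construction in its unsafe module, with the caveat that the
buffer must outlive the ByteString, which a query-scoped column buffer
typically won't, so the copying version is the safe default. A sketch:

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Foreign.C.String (CStringLen)

-- O(1): wrap the buffer in place, no copy. Only safe while the
-- underlying buffer is still alive.
wrapBlob :: CStringLen -> IO B.ByteString
wrapBlob = unsafePackCStringLen

-- O(n): one copy into a fresh ByteString; safe regardless of the
-- buffer's lifetime.
copyBlob :: CStringLen -> IO B.ByteString
copyBlob = B.packCStringLen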

Another idea could be to use a custom bind function for blobs, but
the first solution is certainly easier.

Salvatore

RE: Handling custom types in Takusen

Bayley, Alistair
> From: Salvatore Insalaco [mailto:[hidden email]]
>
> 2007/7/27, Bayley, Alistair <[hidden email]>:
> > > Also, Sqlite 3.4 introduced new functions for incremental reading /
> > > writing of blobs. I could use them in the future.
> >
> > Seems reasonable. I recall Oleg saying something privately a while ago
> > about an API for large objects. He may have some ideas for this.
>
> One good idea could be to marshal / unmarshal blobs as CStringLen. A
> CStringLen can then be used to construct PackedStrings and
> ByteStrings in O(1), or to do custom computations with it. Even
> though a CStringLen contains a Ptr CChar, it can easily be cast to
> any pointer type.
>
> Another idea could be to use a custom bind function for blobs, but
> the first solution is certainly easier.

It was my intention to do it the other way around: marshal the blob to
a Ptr (), which you can then cast to a Ptr CChar. Obviously you'd need
to retain the size information, so a blob basically becomes a
(Ptr (), Int) pair, just like a CStringLen...

At least this way you've got a type which says explicitly "this thing is
a blob", and then if you know better, i.e. it's really a CString, you
can cast it.
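
In code, the idea is something like this sketch (the Blob wrapper is
hypothetical):

import Foreign.Ptr (Ptr, castPtr)
import Foreign.C.String (CStringLen)

-- Hypothetical: a type that says explicitly "this thing is a blob".
newtype Blob = Blob (Ptr (), Int)

-- If you know the blob really holds C-string data, cast it;
-- castPtr costs nothing at runtime.
blobToCStringLen :: Blob -> CStringLen
blobToCStringLen (Blob (p, n)) = (castPtr p, n)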

Actually, it wouldn't cost much to have both marshalling functions in
the low-level API, even if they both call the same Sqlite blob retrieval
function.

Alistair

Re: Handling custom types in Takusen

Salvatore Insalaco
2007/7/27, Bayley, Alistair <[hidden email]>:
> It was my intention to do it the other way around: marshal the blob to
> a Ptr (), which you can then cast to a Ptr CChar. Obviously you'd need
> to retain the size information, so a blob basically becomes a
> (Ptr (), Int) pair, just like a CStringLen...
>
> At least this way you've got a type which says explicitly "this thing is
> a blob", and then if you know better, i.e. it's really a CString, you
> can cast it.

You're right, I'll work in that direction. At some point I'll also add
the CStringLen marshalling, as it's a pretty common case (BS, FPS, PS)
and it could be handy.

Salvatore

Re: Handling custom types in Takusen

Ben Franksen
In reply to this post by Salvatore Insalaco
Salvatore Insalaco wrote:
> I noticed that in Takusen there are just two instances to implement to
> make any Haskell type db-serializable: DBBind / SqliteBind for
> serialization and DBType for deserialization.

FWIW, I have two patches lying around (attached) that I wanted to send
to the Takusen maintainers anyway. They (the patches) implement (only)
an instance DBType Data.ByteString for the Oracle and Sqlite backends.
They are rudimentarily tested ("hey, seems to work!"); a review might
be in order anyway, because I am not sure I understand the internals
well enough -- for all I know, I might have introduced space leaks or
whatnot.

Cheers
Ben

Attachment: ByteStringSupport.darcspatch (2K)

Takusen and large PostgreSQL blobs [was: Handling custom types in Takusen]

oleg
In reply to this post by Bayley, Alistair

I have been using Takusen with PostgreSQL to store and retrieve
hundreds of multi-megabyte binary objects. A client may request
literally hundreds of such objects in one request; the Haskell
(FastCGI) application server will send these objects in one multi-part
message. The handling of the entire request is done in *constant* and
small memory, at low latency, and at a rate that is limited by the
client's network connection. The server handles hundreds of such
requests without allocating memory: the Haskell server uses only one
16KB buffer for all of its I/O. Incidentally, with the exception of
occasional existentials and extended pattern guards, all the server
code is in Haskell98.

I have been using PostgreSQL's LO (large object) facility. That is not
the only design choice: an alternative is to create a table where each
row holds a chunk (e.g., 16K) of the data as a byte array. The row has
two more columns: the blob id and the chunk ordinal counter. This
design lets one incrementally write binary data (using the COPYIN
technique) and read it back (using COPYOUT or regular SQL queries).
The COPYIN feature lets us write to the database in user-defined
chunks. Alas, the converse, COPYOUT, can read only whole rows, which
precludes storing all of the data in one row. If we segment the data
into chunks spread across several rows, we regain incrementality. I
have a hunch this method may be preferable, although I have not tried
it. The drawback of LO objects is the need for frequent vacuuming,
which may really take a while if many large objects are being created
and deleted.
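
Concretely, the schema for the chunked design could look like the
following sketch (the names are illustrative):

-- The chunked-blob table, carried here as a plain SQL string.
chunkTableDDL :: String
chunkTableDDL = unlines
  [ "CREATE TABLE blob_chunk ("
  , "  blob_id  integer NOT NULL,"   -- which blob the chunk belongs to
  , "  chunk_no integer NOT NULL,"   -- the chunk ordinal counter
  , "  chunk    bytea   NOT NULL,"   -- e.g. a 16K piece of the data
  , "  PRIMARY KEY (blob_id, chunk_no))"
  ]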

The blob interface is designed to permit incremental reading and
writing of blobs. In fact, the server never stores a whole blob in
memory.

Enclosed are the implementations of the store_lo and consume_lo
functions. They rely on the notions of `generalized' input and output
ports and the generalized copier. I've been meaning to describe them
properly but don't seem to get around to it. I can refer you to the
comments in the file
        http://okmij.org/ftp/Haskell/NewerCGI.hs
The frequently mentioned EMonadIO is the class of monads that permit
both I/O operations and throwing _and_ catching of arbitrary errors.
Most transformations of IO are in that class. EMonadIO lets me write
gthrow, gcatch, ghandle, gbracket, etc. without even thinking about
which monad I am currently in.
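
A minimal sketch of the shape of that class (the method names are as
above; the concrete signatures below are assumptions, using
SomeException to stand for "arbitrary errors", not the real
definition):

import Control.Monad.Trans (MonadIO)
import qualified Control.Exception as E

-- Sketch: monads that can do I/O and both throw and catch errors.
class MonadIO m => EMonadIO m where
  gthrow :: E.SomeException -> m a
  gcatch :: m a -> (E.SomeException -> m a) -> m a

-- The other generalized operations derive from these two:
ghandle :: EMonadIO m => (E.SomeException -> m a) -> m a -> m a
ghandle = flip gcatch

gbracket :: EMonadIO m => m a -> (a -> m b) -> (a -> m c) -> m c
gbracket acquire release use = do
  a <- acquire
  r <- use a `gcatch` \e -> release a >> gthrow e
  release a
  return r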



-- Read data from a LO (a kind of PostgreSQL blob). A blob is identified
-- by its Oid. We determine the size of the blob, create a generalized
-- input port for reading from the blob, and pass the size and the
-- port to the user function. The function will probably use BCopy
-- to copy data from the blob to somewhere else. The function should
-- not store the generalized input port anywhere as the port can't be
-- used after the function returns. We could have enforced that with a
-- marker and Typeable, as we do in Takusen.
-- This function must be invoked in a transaction (it would cause a
-- database error otherwise).
-- We don't bracket the call to the user function as any exceptions
-- are fatal anyway.
consume_lo :: EMonadIO m => Connection -> Oid ->
              (Int -> Input -> m a) -> m a
consume_lo (Connection db) oid f =
   do
    lofd <- liftIO $ check_pos "lo_open" $ flo_open db oid eINV_READ
          -- get the size of LO by seeking to the end, and coming back
    size <- liftIO $ check_pos "lo_lseek" $ flo_lseek db lofd 0 eSEEK_END
    liftIO $ check_pos "lo_lseek" $ flo_lseek db lofd 0 eSEEK_SET
    let inp = Input (\ptr len ->
         liftIO . liftM fromIntegral . check_pos "lo_read" $
                     flo_read db lofd ptr (fromIntegral len))
    r <- f (fromIntegral size) inp
    liftIO $ check_pos "lo_close" $ flo_close db lofd
    return r
  `gcatch` \e -> print_exc e >> liftIO (closeDb db) >> gthrow e
 where
 check_pos str a = a >>= \r -> if r >= 0 then return r else throwPG r str


-- Write data to a LO (a kind of PostgreSQL blob). A blob is identified
-- by its Oid. We create a generalized output port for writing to the blob,
-- and pass it to the user function. The function will probably use BCopy
-- to copy data to the blob from somewhere else. The function should
-- not store the generalized output port anywhere as the port can't be
-- used after the function returns. We could have enforced that with a
-- marker and Typeable, as we do in Takusen.
-- This function must be invoked in a transaction (it would cause a
-- database error otherwise).
-- We don't bracket the call to the user function as any exceptions
-- are fatal anyway.
store_lo :: EMonadIO m => Connection -> Oid -> (Output -> m a) -> m a
store_lo (Connection db) oid f =
   do
    lofd <- liftIO $ check_pos "lo_open" $ flo_open db oid eINV_WRITE
    let outp = Output (\ptr len ->
         liftIO (check_size len =<< check_pos "lo_write"
                     (flo_write db lofd ptr (fromIntegral len))))
    r <- f outp
    liftIO $ check_pos "lo_close" $ flo_close db lofd
    return r
  `gcatch` \e -> print_exc e >> liftIO (closeDb db) >> gthrow e
 where
 check_pos str a = a >>= \r -> if r >= 0 then return r else throwPG r str
 check_size len lwritten =
     if len == fromIntegral lwritten then return ()
        else throwPG (-1) "lo_write wrote less than expected"
