Chunked upload/download from AWS S3

Chunked upload/download from AWS S3

Sal
Hello,

I am planning to use pipes-http for AWS S3 put/get operations (involving big binary objects). I noticed that the documentation for the pipes-http `stream` API says the server must support chunked encoding. So I looked up the AWS documentation, which describes a different way of doing chunking (basically, adding a signature to every chunk).

I also checked the `aws` and `amazonka-s3` packages; it seems to me that they are not compatible with pipes-http because they use conduit. Please correct me if I got this wrong. So it seems to me I must write my own HTTP request/response handling using `pipes` for AWS S3 operations, and must write custom chunking.

If anyone has already done this before and could share tips, that would be very helpful.
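
For readers unfamiliar with the per-chunk signing mentioned above, here is a minimal sketch of how AWS's streaming (SigV4) upload body is framed: each chunk is sent as its size in hex, a `chunk-signature` field, the data, and a trailing CRLF, and a zero-length chunk ends the body. The names `Signer`, `chunksOf`, `frameChunk` and `frameBody` are made up for illustration and do not come from any package in this thread. The signature computation itself (an HMAC-SHA256 that chains on the previous chunk's signature) is deliberately left as a function argument, so this shows only the framing, not a complete signer.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString       as BS
import qualified Data.ByteString.Char8 as BC
import           Numeric               (showHex)

-- | Hypothetical signer: given the previous signature and the chunk data,
-- return this chunk's signature. The real HMAC-SHA256 logic is not shown.
type Signer = BS.ByteString -> BS.ByteString -> BS.ByteString

-- | Split a payload into pieces of at most n bytes (n must be positive).
chunksOf :: Int -> BS.ByteString -> [BS.ByteString]
chunksOf n bs
  | BS.null bs = []
  | otherwise  = let (h, t) = BS.splitAt n bs in h : chunksOf n t

-- | Frame a single chunk as: hex size, ";chunk-signature=", signature,
-- CRLF, the data, CRLF.
frameChunk :: BS.ByteString -> BS.ByteString -> BS.ByteString
frameChunk sig payload = BS.concat
  [ BC.pack (showHex (BS.length payload) "")
  , ";chunk-signature=", sig, "\r\n"
  , payload, "\r\n"
  ]

-- | Frame all chunks, threading each signature into the next one and
-- appending the zero-length chunk that terminates the body.
frameBody :: Signer -> BS.ByteString -> [BS.ByteString] -> [BS.ByteString]
frameBody sign seed chunks = go seed (chunks ++ [BS.empty])
  where
    go _    []       = []
    go prev (c : cs) = let sig = sign prev c
                       in frameChunk sig c : go sig cs
```

The seed signature passed to `frameBody` stands in for the signature of the request headers, which the first chunk's signature is chained from.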

Thanks.

--

Re: Chunked upload/download from AWS S3

Sal
Not sure why Ben's post isn't visible in this group yet though it was sent to the mailing list - here is what he wrote:

-----------
Have a look at my recently-uploaded pipes-s3 package [1].

Cheers,

- Ben

[1] https://hackage.haskell.org/package/pipes-s3
-----------

This looks very useful. One question though: shouldn't the HTTP manager be created only once, instead of being recreated for every request in the `fromS3'` request wrapper? Here is my code using `Aws.S3` with conduit. Should we take a similar approach with the pipes-s3 APIs?

{-# LANGUAGE OverloadedStrings #-}

import qualified Aws
import qualified Aws.Core as Aws
import qualified Aws.S3 as S3
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sourceFile)
import qualified Data.Conduit.List as CL (mapM_)
import           Network.HTTP.Conduit (responseBody,requestBodySource,newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS
import Control.Monad.IO.Class
import System.IO
import Control.Monad.Trans.Resource (runResourceT)
import Control.Concurrent.Async (async,waitCatch)
import Control.Exception (displayException)
import Data.Text as T (pack)
import Data.List (lookup)

main :: IO ()
main = do
  {- Set up AWS credentials and S3 configuration using the IA endpoint. -}
  Just creds <- Aws.loadCredentialsFromEnv
  let cfg = Aws.Configuration Aws.Timestamp creds (Aws.defaultLog Aws.Error)
  let s3cfg = S3.s3 Aws.HTTP S3.s3EndpointUsClassic False

  {- Set up a ResourceT region with an available HTTP manager. -}
  httpmgr <- newManager tlsManagerSettings
  let file = "out" -- can create a 100MB test file like this on linux: dd if=/dev/urandom of=out bs=100M count=1 iflag=fullblock
  let inbytes = sourceFile file
  lenb <- System.IO.withFile file ReadMode hFileSize
  req <- async $ runResourceT $ do
    Aws.pureAws cfg s3cfg httpmgr $
      (S3.putObject "put-your-test-bucket-here" ("testbucket/test") (requestBodySource (fromIntegral lenb) inbytes))
        { S3.poMetadata = [("content-type","text;charset=UTF-8"),("content-length",T.pack $ show lenb)]
          -- Automatically creates bucket on IA if it does not exist,
          -- and uses the above metadata as the bucket's metadata.
        , S3.poAutoMakeBucket = True
        }
  reqRes <- waitCatch req
  case reqRes of
    Left e -> print $ displayException $ e
    Right r -> print $ S3.porVersionId r




On Monday, May 30, 2016 at 10:49:21 AM UTC-4, Sal wrote:
Hello,

I am planning to use pipes-http for AWS S3 put/get operations (involving big binary objects). I noticed that the documentation for the pipes-http `stream` API says the server must support chunked encoding. So I looked up the AWS documentation (http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html), which describes a different way of doing chunking (basically, adding a signature to every chunk).

I also checked the `aws` and `amazonka-s3` packages; it seems to me that they are not compatible with pipes-http because they use conduit. Please correct me if I got this wrong. So it seems to me I must write my own HTTP request/response handling using `pipes` for AWS S3 operations, and must write custom chunking.

If anyone has already done this before and could share tips, that would be very helpful.

Thanks.

--

Re: Chunked upload/download from AWS S3

Ben Gamari-4
Sal <[hidden email]> writes:

> Not sure why Ben's post isn't visible in this group yet though it was sent
> to the mailing list - here is what he wrote:
>
Ahh, indeed it looks like I sent it from the wrong email address.

> -----------
> Have a look at my recently-uploaded pipes-s3 package [1].
>
> Cheers,
>
> - Ben
>
> [1] https://hackage.haskell.org/package/pipes-s3
> -----------
>
> This looks very useful. One question though - shouldn't HTTP manager be
> created only once, instead of being recreated for every request in
> `fromS3'` request wrapper?
>
Hmmm, perhaps, although in my previous use-cases the objects being
read/written were rather large so the cost of bringing up a new HTTP
manager is relatively quite small.

If it would help I could expose another variant of the interface
allowing one to provide a Manager to use. For instance,

    fromS3' :: MonadSafe m
            => Manager -> Aws.Configuration -> Bucket -> Object
            -> (Response (Producer BS.ByteString m ()) -> Producer BS.ByteString m a)
            -> Producer BS.ByteString m a

My only hesitation in doing so is that this is a rather type-unsafe
interface since AWS requires TLS yet there is nothing in the type to
suggest this.
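
One possible way to make the TLS requirement visible in the type, sketched here purely as an illustration and not as anything pipes-s3 provides, is a newtype around `Manager` whose only constructor function uses `tlsManagerSettings`. The names `TlsManager`, `newTlsManager` and `getManager` are hypothetical. In a real library the `TlsManager` data constructor would be left out of the module's export list, so that callers could only obtain one through `newTlsManager`.

```haskell
import Network.HTTP.Client     (Manager, newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)

-- | A Manager known to have been built from TLS-capable settings.
-- (Hypothetical wrapper; the constructor should not be exported.)
newtype TlsManager = TlsManager Manager

-- | The only intended way to build a TlsManager.
newTlsManager :: IO TlsManager
newTlsManager = TlsManager <$> newManager tlsManagerSettings

-- | Unwrap for use with functions that expect a plain Manager.
getManager :: TlsManager -> Manager
getManager (TlsManager m) = m
```

A function taking `TlsManager` instead of `Manager` would then reject, at compile time, a manager created from `defaultManagerSettings`. The cost is one more type for users to learn, which may or may not be worth it for a small package.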

Cheers,

- Ben

--




Re: Chunked upload/download from AWS S3

Sal


> If it would help I could expose another variant of the interface
> allowing one to provide a Manager to use. For instance,
>
>     fromS3' :: MonadSafe m
>             => Manager -> Aws.Configuration -> Bucket -> Object
>             -> (Response (Producer BS.ByteString m ()) -> Producer BS.ByteString m a)
>             -> Producer BS.ByteString m a
>
> My only hesitation in doing so is that this is a rather type-unsafe
> interface since AWS requires TLS yet there is nothing in the type to
> suggest this.

Ben, understood. Perhaps a TLS requirement warning can be added to the documentation for the API. That way, we can have a long-lived HTTP manager instead of creating a new one every time, which matters especially for short requests.
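
To illustrate the long-lived manager pattern discussed above, here is a minimal sketch using plain http-client rather than pipes-s3. The manager is created once and passed to every request, so its connection pool can be reused. The function names `fetchStatuses` and `runAll` are made up for this example, and the requests are ordinary HTTP requests with no AWS signing.

```haskell
import Network.HTTP.Client       (Manager, httpNoBody, newManager,
                                  parseRequest, responseStatus)
import Network.HTTP.Client.TLS   (tlsManagerSettings)
import Network.HTTP.Types.Status (statusCode)

-- | Issue one request per URL through the same Manager and collect the
-- HTTP status codes. Sharing the Manager lets connections be pooled.
fetchStatuses :: Manager -> [String] -> IO [Int]
fetchStatuses mgr = mapM $ \url -> do
  req  <- parseRequest url
  resp <- httpNoBody req mgr
  pure (statusCode (responseStatus resp))

-- | Create the TLS-capable Manager once, then run all requests with it.
runAll :: [String] -> IO [Int]
runAll urls = do
  mgr <- newManager tlsManagerSettings
  fetchStatuses mgr urls
```

For a handful of very large objects the saving is probably negligible, as Ben notes; it should matter more when many small requests go to the same host.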

--

Re: Chunked upload/download from AWS S3

Ben Gamari-4
Sal <[hidden email]> writes:

>>
>> If it would help I could expose another variant of the interface
>> allowing one to provide a Manager to use. For instance,
>>
>>     fromS3' :: MonadSafe m
>>             => Manager -> Aws.Configuration -> Bucket -> Object
>>             -> (Response (Producer BS.ByteString m ()) -> Producer BS.ByteString m a)
>>             -> Producer BS.ByteString m a
>>
>> My only hesitation in doing so is that this is a rather type-unsafe
>> interface since AWS requires TLS yet there is nothing in the type to
>> suggest this.
>>
>
> Ben, understood. Perhaps TLS requirement warning can be added to
> documentation for the API. That way, we have a long-lived HTTP manager,
> instead of creating a new one every time, especially for short requests.
>
Indeed, this sounds reasonable. How does this [1] look?

Cheers,

- Ben


[1] https://github.com/bgamari/pipes-s3/commit/598cb0ea1c43b8a11f423e849af047756296c723

--




Re: Chunked upload/download from AWS S3

Sal
Looks good from what I eyeballed. I am going to try it out. I might also try to adapt it to the `streaming` package, reusing your code for AWS request signing. `streaming` looks like a very clean API, so I am checking it out as well.

On Monday, June 6, 2016 at 5:01:31 AM UTC-4, Ben Gamari wrote:
Sal <sanket....@...> writes:

>>
>> If it would help I could expose another variant of the interface
>> allowing one to provide a Manager to use. For instance,
>>
>>     fromS3' :: MonadSafe m
>>             => Manager -> Aws.Configuration -> Bucket -> Object
>>             -> (Response (Producer BS.ByteString m ()) -> Producer BS.ByteString m a)
>>             -> Producer BS.ByteString m a
>>
>> My only hesitation in doing so is that this is a rather type-unsafe
>> interface since AWS requires TLS yet there is nothing in the type to
>> suggest this.
>>
>
> Ben, understood. Perhaps TLS requirement warning can be added to
> documentation for the API. That way, we have a long-lived HTTP manager,
> instead of creating a new one every time, especially for short requests.
>
Indeed, this sounds reasonable. How does this [1] look?

Cheers,

- Ben


[1] https://github.com/bgamari/pipes-s3/commit/598cb0ea1c43b8a11f423e849af047756296c723

--