network package and SIGVTALRM

Ruben Astudillo
Hi all

I am writing a DCC subsystem for an IRC client. After all the handshakes are
done I just connect to the server and start calling `recv`. The code I use for
this is:

     getPackets :: MVar Int
                -> FilePath -- ^ Name media
                -> Int      -- ^ File size
                -> AddrInfo -> ExceptT DCCError IO ()
     getPackets mvar name totalSize addr =
       do receivedSize <- lift $ bracket acquire release receive
          let delta = (totalSize - receivedSize)
          if delta > 0 then throwE (NotFullRecv delta) else return ()
       where
         bufferSize = 16384

         acquire :: IO (IO.Handle,Socket)
         acquire = (,) <$> (IO.openFile name IO.WriteMode)
                       <*> newSocket addr

         release :: (IO.Handle,Socket) -> IO ()
         release (hdl, sock) = IO.hClose hdl >> close sock

         receive :: (IO.Handle,Socket) -> IO Int
         receive (hdl, sock) =
             flip execStateT 0 . fix $ \loop -> do
                 mediaData <- lift (B.recv sock bufferSize)
                 unless (B.null mediaData) $ do
                     S.modify' (+ (B.length mediaData))
                     currentSize <- S.get
                     lift $ B.hPut hdl mediaData
                            >> B.send sock (int2BS currentSize)
                            >> swapMVar mvar currentSize
                     loop

Part of the protocol is that after each `recv` I send back the total size
received so far, in network byte order. That is what the B.send line in
`receive` does, using this function:

     -- | Encode a number as a 4-byte ByteString (one Word8 per byte of the
     -- Word32) in network byte order
     int2BS :: Int -> B.ByteString
     int2BS i | w <- (fromIntegral i :: Word32) =
         B.pack [ (fromIntegral (shiftR w 24) :: Word8)
               , (fromIntegral (shiftR w 16) :: Word8)
               , (fromIntegral (shiftR w  8) :: Word8)
               , (fromIntegral w             :: Word8)]

Everything works correctly until around half of a test transfer (i.e. for a
file of 340M it gets 170M). That first half arrives in the right order
(I tested with a video and it was playable up to the middle). On smaller
files the bug doesn't happen; the file is received completely. I did a
little bit of `strace` and `tcpdump` and I got this:

-- strace -e trace=network -p $client
     (..)
     30439 recvfrom(13, "\312\255\201\337\376\355\253\r\177\276\204X]8\6\221\301#\361<>\273+\355\5\343B \333\366\351W"..., 16384, 0, NULL, NULL) = 1380
     30439 sendto(13, "\n\273\31l", 4, 0, NULL, 0) = 4
     30439 recvfrom(13, 0x20023f010, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
     30439 recvfrom(13, "\222llq_H\23\17\275\f}\367\"P4\23\207\312$w\371J\354aW2\243R\32\v\n\251"..., 16384, 0, NULL, NULL) = 1380
     30439 sendto(13, "\n\273\36\320", 4, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
     30438 --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
     30438 --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
     (..)
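
The two `sendto` payloads in the trace are the 4-byte counters produced by `int2BS`. As a sanity check, here is a minimal sketch of the inverse function applied to those bytes. The name `bs2Word32` is made up for this illustration and is not part of the client; the bytes are written with hex escapes instead of strace's octal ones.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Hypothetical inverse of int2BS: read exactly four bytes as a
-- big-endian (network byte order) Word32. Any other length gives Nothing.
bs2Word32 :: B.ByteString -> Maybe Word32
bs2Word32 bs
  | B.length bs == 4 = Just (B.foldl' step 0 bs)
  | otherwise        = Nothing
  where
    -- Shift what we have so far one byte to the left and append the next byte.
    step acc byte = (acc `shiftL` 8) .|. fromIntegral byte

-- The payloads from the trace: "\n\273\31l" and "\n\273\36\320".
firstAck, secondAck :: B.ByteString
firstAck  = B.pack [0x0A, 0xBB, 0x19, 0x6C]
secondAck = B.pack [0x0A, 0xBB, 0x1E, 0xD0]
```

`bs2Word32 firstAck` is `Just 180033900` and `bs2Word32 secondAck` is `Just 180035280`. The difference, 1380, matches the size of the `recvfrom` in between, and 180033900 bytes is about 171.7 MiB, which is consistent with the transfer stalling around 170M.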

-- tcpdump
     05:20:22.788273 IP tapioca.36346 > 198.255.92.74.36103: Flags [.], ack 48004680, win 489, options [nop,nop,TS val 627805332 ecr 675358947,nop,nop,sack 1 {48006060:48066780}], length 0
     05:20:22.975627 IP 198.255.92.74.36103 > tapioca.36346: Flags [.], seq 48004680:48006060, ack 82629, win 0, options [nop,nop,TS val 675359033 ecr 627805248], length 1380
     05:20:23.014991 IP tapioca.36346 > 198.255.92.74.36103: Flags [.], ack 48066780, win 4, options [nop,nop,TS val 627805559 ecr 675359033], length 0
     05:20:23.768012 IP 198.255.92.74.36103 > tapioca.36346: Flags [P.], seq 48066780:48067292, ack 82629, win 0, options [nop,nop,TS val 675359232 ecr 627805559], length 512
     05:20:23.768143 IP tapioca.36346 > 198.255.92.74.36103: Flags [.], ack 48067292, win 0, options [nop,nop,TS val 627806312 ecr 675359232], length 0
     05:20:24.523397 IP 198.255.92.74.36103 > tapioca.36346: Flags [.], ack 82629, win 0, options [nop,nop,TS val 675359421 ecr 627806312], length 0

What bothers me is the SIGVTALRM in the strace output. I am not the
greatest unix hacker, but that signal is related to setitimer and I haven't
explicitly set that up. So I am scratching my head a little. Maybe
somebody has experienced something similar with the network package? Do
you notice anything in the logs? Thanks in advance.

--
-- Ruben Astudillo
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: network package and SIGVTALRM

Andrey Sverdlichenko
Hi Ruben,

This signal is used internally by the Haskell runtime (the RTS timer), so nothing is wrong with it.

What I see in the dump is that 198.255.92.74 is announcing a window of size 0 from the beginning, and tapioca ends up sending a 0 window too. Are you sure both sides are reading what the other one sends? Given that your code only fails on large files, my guess is that the sender never reads the 4-byte status messages the receiver sends, and the processes deadlock once the socket buffers fill up.

As a side note, consider using putWord32be from the binary or cereal packages instead of serializing the data yourself.
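
For reference, a minimal sketch of what that could look like with the binary package. It assumes `binary` is an acceptable dependency and keeps the name and type of the original `int2BS`; cereal has a `putWord32be` as well.

```haskell
import Data.Binary.Put (putWord32be, runPut)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- | Same contract as the hand-written int2BS: truncate the Int to 32 bits
-- and emit four bytes, most significant first (network byte order).
int2BS :: Int -> B.ByteString
int2BS = BL.toStrict . runPut . putWord32be . fromIntegral
```

`runPut` produces a lazy ByteString, hence the `BL.toStrict` before handing the result to a strict `send`.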
On Tue, 12 Jul 2016 at 03:35, Ruben Astudillo <[hidden email]> wrote:
> [snip]

Re: network package and SIGVTALRM

Ruben Astudillo
On 12/07/16 11:40, Andrey Sverdlichenko wrote:
> they both reading what other one sends to them? Given that your code only
> fails on large files, I guess sender never reads this 4-byte status
> messages receiver sends to it, and processes deadlock after socket buffer
> is filled up.

Right on the nail. I played a bit with tcpdump/wireshark and saw I was
just sending 0-length ACKs and multiple 4-byte messages joined into a
single packet. It seems TCP buffers tiny packets using a strategy called
Nagle's algorithm, and so it collected all my replies in a buffer until a
threshold was reached. Then it sent them all at once, driving the other end crazy.
Adding this line to the function that sets up the socket

     newSocket :: AddrInfo -> IO Socket
     newSocket addr = do
         sock <- socket AF_INET Stream defaultProtocol
         setSocketOption sock NoDelay 1   -- this was added
         connect sock (addrAddress addr)
         return sock

makes the downloads go until the end. :-)


> As a side note, consider using putWord32be from binary or cereal packages
> instead of serializing data yourself.

Note taken. I just don't want to impose a new dependency in my first
patch. I did copy putWord32be (at least in style) for my int2BS
function though.

Thanks a lot!
-- Ruben Astudillo

Re: network package and SIGVTALRM

Andrey Sverdlichenko
On Wed, Jul 13, 2016 at 3:14 AM Ruben Astudillo <[hidden email]> wrote:

> Right on the nail. I played a bit with tcpdump/wireshark to see I was
> just sending 0 length ACKs and multiple 4 byte messages joined on a
> single packet. Seems TCP buffers tiny packets using a strategy called
> Nagle's algorithm and thus joined all my packets on a buffer zone until a
> threshold. Then it sent them all at once, making the other end crazy.

This looks a bit scary. It should not matter whether replies are merged or not.
By any chance, does your code call recv with some large maximum buffer size but expect to get only 4 bytes, because that is how much the receiver sends each time? If so, change it to read only 4 bytes, and handle the case where fewer than 4 bytes come back. TCP does not preserve message boundaries, and you can expect reads to return data in arbitrarily sized chunks.
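
A minimal sketch of that advice: a loop that keeps reading until exactly the requested number of bytes has arrived. The name `recvExactly` is hypothetical, and the receive action is passed in as a parameter so the snippet does not depend on any particular socket library. It is assumed to behave like `recv`, returning between 1 and k bytes, or the empty string once the peer has closed the connection.

```haskell
import qualified Data.ByteString as B

-- | Keep calling the supplied receive action until exactly n bytes have
-- arrived. Returns Nothing if the stream ends before that.
recvExactly :: Monad m => (Int -> m B.ByteString) -> Int -> m (Maybe B.ByteString)
recvExactly recvSome = go []
  where
    go acc remaining
      | remaining <= 0 = return (Just (B.concat (reverse acc)))
      | otherwise = do
          -- Never ask for more than is still missing, so no bytes that
          -- belong to the next message are consumed.
          chunk <- recvSome remaining
          if B.null chunk
            then return Nothing
            else go (chunk : acc) (remaining - B.length chunk)
```

With the network package this would be called as `recvExactly (recv sock) 4`, with `recv` taken from `Network.Socket.ByteString`.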

Regards,
  Andrey

Re: network package and SIGVTALRM

Baojun Wang
I suppose the TCP_NODELAY socket option can prevent Nagle's algorithm from joining small packets?

On Wed, Jul 13, 2016 at 11:10 AM Andrey Sverdlichenko <[hidden email]> wrote:
> [snip]

Re: network package and SIGVTALRM

Andrey Sverdlichenko
If you are lucky. They may still be merged by the sender if a retransmission occurs, or on the receiving side if the receiver waits too long between reads, and that is up to the OS scheduler.
The NODELAY option is used to improve interactive latency; it will not make TCP obey message boundaries.

On Wed, Jul 13, 2016 at 11:43 AM Baojun Wang <[hidden email]> wrote:
> Suppose TCP_NODELAY sock opt can prevent Nagle join small packets?

Re: network package and SIGVTALRM

Ruben Astudillo
On 13/07/16 14:54, Andrey Sverdlichenko wrote:
> If you are lucky. They still may be merged by sender if retransmission
> occurs, or on receiving side, if receiver waits too long before reads, and
> this is up to OS scheduler to control.
> NDELAY option is used to improve interactive latency, it will not make TCP
> obey message boundaries.

You're right. But understanding the problem a little better may
clarify why NODELAY is a valid option. DCC is in part a redundant
protocol. When you connect, the sender gives you the file you want, but
for coherency reasons every once in a while you have to reply with the current
transferred size through the same socket. This was meant as a way of preserving
consistency, and it duplicates mechanisms already implemented in TCP.
From the page [1] I am using as a reference for the implementation:

   ``client A sends blocks of data (usually 1-2 KB) and at every block awaits
   confirmation from the client B, that when receiving a block should reply
   4 bytes containing an positive number specifying the total size of the
   file received up to that moment.

   The transmission closes when the last acknowledge is received by client A.

   The acknowledges were meant to include some sort of coherency check in
   the transmission, but in fact no client can recover from an acknowledge
   error/desync, all of them just close the connection declaring the
   transfer as failed (the situation is even worse in fact, often
   acknowledge errors aren't even detected!).

   Since the packet-acknowledge round trip eats a lot of time, many clients
   included the send-ahead feature; the client A does not wait for the
   acknowledge of the first packet before sending the second one.''

The last part explains why my download still succeeded up to half the size
of the file. But not sending any reply (because the message is too small
to send) is a failure of interactivity in the protocol, not of message
boundaries (the protocol specifies that the reply is 4 bytes long).
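
To make the quoted exchange concrete, here is a small pure model of the receiver's side. It is only a sketch of the description quoted above, not code taken from any client, and `ackValues` is a made-up name.

```haskell
import Data.List (scanl')
import Data.Word (Word32)

-- | Given the sizes of the blocks as they arrive, list the acknowledgement
-- values the receiver sends back: the running total after each block,
-- truncated to 32 bits because the reply is only 4 bytes wide.
ackValues :: [Int] -> [Word32]
ackValues = map fromIntegral . drop 1 . scanl' (+) 0
```

For blocks of 1380, 1380 and 512 bytes the replies are 1380, 2760 and 3272. Because the field is 4 bytes, the value wraps around after 4 GiB in this model; how real clients treat larger files is not covered by the quoted text.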

[1]: http://www.kvirc.net/doc/doc_dcc_connection.html
--
-- Ruben Astudillo

Re: network package and SIGVTALRM

David Turner-2
Hi Ruben,

I think you're falling into a common trap re. TCP. On a lightly-loaded network, if you send a block of data on one host it typically arrives at the other end of the connection as one thing. In other words, calls to send() and recv() are one-to-one. In that situation adding NODELAY will (seem to) solve problems like the ones that you were seeing. However, it will all fall to pieces when you're running under load or there's congestion or some other kind of problem, as it's perfectly legitimate for packets to be combined and/or fragmented which breaks this one-to-one relationship on which the correctness of your program rests.

You _must_ treat data received over TCP as a continuous stream of bytes and not a sequence of discrete packets, and do things such as accounting for the case where your 4-byte length indicator is split across two packets so does not all arrive at once. If you don't, it will bite you at the very worst time, and will do so nondeterministically. This kind of thing is very hard to reproduce in a test environment.
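
One way to do that accounting, sketched under the assumption that the stream is nothing but 4-byte big-endian counters: carry the incomplete tail of each read over to the next one. `feedAcks` is a hypothetical helper, not something from the network package.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Split a byte stream into 4-byte big-endian values, however the bytes
-- were chunked on the wire. Takes the leftover bytes from the previous
-- call plus a newly received chunk; returns the complete values and the
-- 0 to 3 bytes that must be kept for the next call.
feedAcks :: B.ByteString -> B.ByteString -> ([Word32], B.ByteString)
feedAcks leftover chunk = go (B.append leftover chunk)
  where
    go buf
      | B.length buf < 4 = ([], buf)
      | otherwise =
          let (frame, rest)       = B.splitAt 4 buf
              (values, leftover') = go rest
          in (B.foldl' step 0 frame : values, leftover')
    -- Accumulate one byte at a time, most significant byte first.
    step acc byte = (acc `shiftL` 8) .|. fromIntegral byte
```

If a read delivers six bytes, the first four become one value and the last two are returned as leftover; feeding them back in together with the next chunk completes the second value, regardless of where the packet boundaries fell.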

There is nothing special about the DCC protocol that makes it immune from this effect.

Best wishes,

David

On 14 July 2016 at 10:48, Ruben Astudillo <[hidden email]> wrote:
> [snip]

Re: network package and SIGVTALRM

Andrey Sverdlichenko
Ruben Astudillo wrote:
> The last part explains why my download still succeeded until half the size
> of the file. But not sending any reply (because the message is too little
> to send) is a failure of interactivity on the protocol, not of message
> boundaries (which specify that the reply is 4 byte in length).

TCP with Nagle's algorithm enabled will not hold data indefinitely. In fact, it only waits for a few tenths of a second, hoping there will be more data to send. If not, whatever it has is sent.
What your dumps show is a window announcement of size 0, which means a lot of data was successfully received by the TCP stack but never read from the socket, and this happens in both directions. You may want to check why your processes stop issuing read/recv calls.
