[Live-devel] RTSPOverTCP race condition (temporary lockout)?

Mon Mar 22 17:11:40 PDT 2010

Hi Ross,

I thought I should open a new thread for the RTSPOverTCP lockout issue
that I came across, which I first mentioned in my response to Jeremy's
message (subject: "Detecting network failure").

I apologize for the detail here...if you have any
suggestions/preferences for further communications, please advise.

In case it helps, I've reproduced the CPU spike with RTPoverRTSP (as
well as RTSPOverHTTP)...so I'm fairly convinced it wasn't caused by
anything I added for RTSPOverHTTP.

My test case is to simply open a VLC RTPoverRTSP client session, and
immediately close it...

The only reason I added the return of the send() status was to allow
recovery from (or avoidance of) whatever CPU-bound processing loop may
have been causing the CPU utilization spike (lockout).

Anyway, I was pursuing the thought that it might have had to do with the
special handling for RTPOverTCP streams in sendRTPOverTCP and/or the
tcpReadHandler() upon connection reset by peer.

A side question/concern is that readSocket() (and readSocketExact())
always use recvfrom(), even though, in this case, we are reading from
stream sockets...therefore, the CONNECTION state will never be
considered by the socket layer. If we used recv(), we might get a
"connection reset by peer" indication...no?  The problem I had when I
tried that theory out is that select() always returns a timeout, so we
never get to the recv() or recvfrom() calls...

The lockup doesn't occur every time...

When it doesn't lock up, readSocket() returns with result = -1 and errno
is 104 (ECONNRESET, "Connection reset by peer")...and therefore the
background read handler gets turned off.

When it does lock up, I do NOT get that same indication...

So I suspect that the temporary lockout is the result of a race
condition between the timing of the (VLC) peer's TCP disconnect and
where we are in the processing of the "interleaved message(s)"...

VLC sends the TEARDOWN (which we don't "see" because, once we start the
tcpReadHandler, we are then *only* looking for RTCP messages, NOT any
further RTSP messages like TEARDOWN or SET_PARAMETER (e.g. for
keepalive)...true??), sometimes (why only sometimes?) followed by the
RTCP BYE, then it immediately (~100us later) closes the TCP connection.

I think there may be a race there that causes some grief in the
tcpReadHandler() processing...where, depending on where we're at in our
interleaved packet handling when the TCP connection gets reset by the
peer, we end up in a non-blocking, and non-exiting near infinite loop...

Regarding the TEARDOWN, it would seem we might need a more stateful
implementation of the interleaved stream parsing, no? Or, is it by
design that we no longer process the TEARDOWN, because we SHOULD see the
RTCP BYE, or will otherwise timeout the connection??
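
To show what I mean by "more stateful", here is a minimal sketch of an
incremental parser for the interleaved framing ('$', a one-byte channel
id, a two-byte big-endian length, then the payload). It is not how
live555 does it; the class InterleavedParser and its members are
hypothetical names. Bytes that arrive while waiting for '$' are kept in
`other`, so a caller could pass them to an RTSP request parser (e.g.
for TEARDOWN). It makes the simplifying assumption that '$' never
appears inside that RTSP text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical incremental parser for RTSP interleaved frames.
// It keeps its position between calls, so input may arrive in any chunks.
class InterleavedParser {
public:
  struct Frame {
    uint8_t channel;
    std::vector<uint8_t> payload;
  };

  // Feed any number of bytes; completed frames are appended to `frames`.
  void feed(const uint8_t* data, size_t len) {
    for (size_t i = 0; i < len; ++i) step(data[i]);
  }

  std::vector<Frame> frames;   // completed interleaved frames
  std::vector<uint8_t> other;  // non-interleaved bytes (e.g. RTSP text)

private:
  enum State { AwaitDollar, AwaitChannel, AwaitSizeHi, AwaitSizeLo, AwaitPayload };
  State state_ = AwaitDollar;
  Frame cur_{};
  size_t remaining_ = 0;

  void step(uint8_t b) {
    switch (state_) {
      case AwaitDollar:
        if (b == '$') state_ = AwaitChannel;
        else other.push_back(b);  // not framing: keep for the RTSP parser
        break;
      case AwaitChannel:
        cur_.channel = b;
        cur_.payload.clear();
        state_ = AwaitSizeHi;
        break;
      case AwaitSizeHi:
        remaining_ = static_cast<size_t>(b) << 8;
        state_ = AwaitSizeLo;
        break;
      case AwaitSizeLo:
        remaining_ |= b;
        if (remaining_ == 0) finish();  // zero-length frame
        else state_ = AwaitPayload;
        break;
      case AwaitPayload:
        cur_.payload.push_back(b);
        if (--remaining_ == 0) finish();
        break;
    }
  }

  void finish() {
    frames.push_back(cur_);
    state_ = AwaitDollar;
  }
};
```

Because the state lives in the object rather than in a sequence of
blocking reads, a disconnect in the middle of a frame just leaves the
parser idle instead of stuck waiting for the rest.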

I think the problem case is where we are waiting for the next
'$'...where we call readSocket() with a non-NULL timeout (all other
calls to readSocket(), for the remaining portions of the exchange, use a
NULL timeout, so the select() call will block)...

FWIW, in my testing, disconnecting the network doesn't cause the CPU
spike...unlike the quick STOP of VLC, which did when I was testing
RTSPOverHTTP...

As I mentioned in my response to Jeremy, the client session will
eventually be brought down via the liveness check "due to
inactivity"...however, my Linux system gets pretty locked up until that
expiry triggers...

I appreciate any light you may be able to shine on this one.

Randy
