[Live-devel] RTCP socket blocking

Fri Jun 1 15:00:48 PDT 2007

I am seeing an extremely rare hang of a live555-based Linux RTSP 
server under heavy load. I induced a core dump to see where the 
hang occurs. It seems that we hang waiting for a packet on the RTCP 
socket. The RTCP socket does not appear to be set to non-blocking. At 
first glance, it would appear that, even though it is a blocking 
socket, this should never happen, since select() (or in my 
case, epoll()) has reported data available. However, it turns out that the 
Linux kernel feels free to drop UDP packets after notifying a socket 
that it is readable. From the select() man page:

Under Linux, select() may report a socket file descriptor as "ready 
for reading", while nevertheless a subsequent read blocks. This could 
for example happen when data has arrived but upon examination has wrong 
checksum and is discarded. There may be other circumstances in which 
a file descriptor is spuriously reported as ready. Thus it may be 
safer to use O_NONBLOCK on sockets that should not block.
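
For illustration, here is a minimal sketch of what the man page suggests, 
written against plain POSIX calls rather than live555's own socket 
helpers. The function names make_nonblocking and read_ready_datagram are 
hypothetical, not part of live555. The idea is that a spurious "ready" 
notification then shows up as EAGAIN/EWOULDBLOCK instead of a blocked read:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: switch an already-open socket to non-blocking mode.
   Returns 0 on success, -1 on error (errno is set by fcntl). */
int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read one datagram after select()/epoll reported
   the socket readable.
   Returns  1 if a datagram was read (*out_len holds its size),
            0 if the readiness was spurious and nothing is available,
           -1 on a real error (errno is set by recvfrom). */
int read_ready_datagram(int fd, void *buf, size_t len, size_t *out_len) {
    for (;;) {
        ssize_t n = recvfrom(fd, buf, len, 0, NULL, NULL);
        if (n >= 0) {
            *out_len = (size_t)n;
            return 1;
        }
        if (errno == EINTR) continue;          /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                          /* packet was dropped: go back to the event loop */
        return -1;
    }
}
```

With something like this, a dropped packet costs one wasted wakeup, and 
the caller simply returns to select()/epoll() to wait for the next one.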

As I thought about the complete hang I was seeing, I became suspicious 
of my theory, since presumably, the read would return on the next RTCP 
packet. So I instrumented my scheduler to time the callbacks registered 
through turnOnBackgroundReading, counting those that take >10ms, >100ms, 
and >1000ms. I do see quite a few instances of a background 
read task taking over 1 second. This leads me to believe that the read 
of an RTCP packet has to wait for the _next_ RTCP packet from time to 
time. My server hangs on the exceptionally rare instance that this lost 
RTCP packet is the _last_ RTCP packet coming from the client.
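
The instrumentation described above could look roughly like the sketch 
below. This is not my actual scheduler code. The names latency_buckets, 
record_duration and timed_call are made up for illustration, the handler 
signature is an assumption, and the buckets are assumed to be cumulative 
(a 1500ms callback counts in all three):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std modes */
#endif
#include <stdint.h>
#include <time.h>

/* Assumed handler shape: a callback taking one opaque pointer. */
typedef void (*handler_fn)(void *client_data);

/* Cumulative counters: how many callbacks ran longer than each threshold. */
struct latency_buckets {
    unsigned long over_10ms;
    unsigned long over_100ms;
    unsigned long over_1000ms;
};

/* Monotonic clock in milliseconds, unaffected by wall-clock changes. */
int64_t now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Bump every bucket whose threshold the elapsed time exceeds. */
void record_duration(struct latency_buckets *b, int64_t elapsed_ms) {
    if (elapsed_ms > 10)   b->over_10ms++;
    if (elapsed_ms > 100)  b->over_100ms++;
    if (elapsed_ms > 1000) b->over_1000ms++;
}

/* Run one handler and record how long it took. */
void timed_call(struct latency_buckets *b, handler_fn fn, void *client_data) {
    int64_t start = now_ms();
    fn(client_data);
    record_duration(b, now_ms() - start);
}
```

The scheduler would call timed_call() in place of invoking the read 
handler directly, and dump the three counters periodically.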

Is there a built-in assumption that the RTCP socket is blocking? If I 
just change the code to make it non-blocking, will there be any ill 
effect on the session when such an RTCP packet is lost?

Marc Neuberger