[Live-devel] RTCP socket blocking
Marc Neuberger
mjn at oxysys.com
Fri Jun 1 15:00:48 PDT 2007
I am seeing an extremely occasional hang of a live555-based Linux RTSP
server under heavy load. I induced a core dump to see where the
hang occurs. It seems that we hang waiting for a packet on the RTCP
socket. The RTCP socket does not appear to be set to non-blocking. Now,
at first glance, it would appear that, even though it is a blocking
socket, this should never happen, since select() (or in my
case, epoll) has already reported data available. However, it turns out
that the Linux kernel feels free to drop UDP packets after notifying a
socket that it is readable. From the select() man page:
    Under Linux, select() may report a socket file descriptor as "ready
    for reading", while nevertheless a subsequent read blocks. This could
    for example happen when data has arrived but upon examination has wrong
    checksum and is discarded. There may be other circumstances in which
    a file descriptor is spuriously reported as ready. Thus it may be
    safer to use O_NONBLOCK on sockets that should not block.
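To make the man page's suggestion concrete, here is a minimal sketch of a non-blocking datagram read that treats a spurious "readable" notification as "nothing to do" instead of blocking. The names makeNonBlocking, readDatagram, and ReadResult are hypothetical and are not part of live555; only fcntl() and recv() are standard POSIX calls.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: set O_NONBLOCK on an existing socket.
// Returns true on success.
bool makeNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Outcome of one read attempt after select()/epoll reported "readable".
enum class ReadResult { Data, NothingReady, Error };

// Hypothetical helper: try to read one datagram without blocking.
// On ReadResult::Data, len holds the datagram length (0 is a valid
// length for UDP). NothingReady covers the spurious-wakeup case, e.g.
// the kernel discarded the packet after reporting the socket readable.
ReadResult readDatagram(int fd, void* buf, std::size_t cap, ssize_t& len) {
    len = recv(fd, buf, cap, 0);
    if (len >= 0) return ReadResult::Data;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
        return ReadResult::NothingReady;  // go back to the event loop
    }
    return ReadResult::Error;
}
```

With this shape, the caller goes back to its event loop on NothingReady, so a discarded packet costs one wasted wakeup rather than a stalled read.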
As I thought about the complete hang I was seeing, I became suspicious
of my theory, since presumably the read would return on the next RTCP
packet. So I instrumented my scheduler with timings of the callbacks for
turnOnBackgroundReading. I counted callbacks that took >10ms, >100ms, and
>1000ms. I find that I do see quite a few instances of a background
read task taking over 1 second. This leads me to believe that, from time
to time, the read of an RTCP packet has to wait for the _next_ RTCP
packet. My server hangs in the exceptionally rare instance where the lost
RTCP packet is the _last_ RTCP packet coming from the client.
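For reference, the kind of instrumentation described above could look roughly like the following. This is only a sketch and not the actual scheduler change: SlowCallbackStats and timedCall are hypothetical names, and the counters are assumed to be cumulative (a 1.5 second callback counts toward all three thresholds).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical counters for slow read-handler invocations.
// Assumption: thresholds are cumulative, not exclusive buckets.
struct SlowCallbackStats {
    std::uint64_t over10ms = 0;
    std::uint64_t over100ms = 0;
    std::uint64_t over1000ms = 0;

    void record(std::chrono::milliseconds elapsed) {
        if (elapsed.count() > 10) ++over10ms;
        if (elapsed.count() > 100) ++over100ms;
        if (elapsed.count() > 1000) ++over1000ms;
    }
};

// Run one handler and record how long it took, using a monotonic clock
// so that wall-clock adjustments do not distort the measurement.
template <typename Handler>
void timedCall(SlowCallbackStats& stats, Handler&& handler) {
    auto start = std::chrono::steady_clock::now();
    handler();
    auto elapsed = std::chrono::steady_clock::now() - start;
    stats.record(
        std::chrono::duration_cast<std::chrono::milliseconds>(elapsed));
}
```

In a single-threaded event loop, a handler that shows up in the over-1000ms counter has stalled every other session for that long, which is consistent with a blocking read waiting for the next packet.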
Is there a built-in assumption that the RTCP socket is blocking? If I
just change the code to make it non-blocking, will there be any ill
effect on the session when such an RTCP packet is lost?
Marc Neuberger