<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}
</style></head><body><div>Hi all,<br></div><div><br></div><div>I believe I have found a server bug. If a client connects over an unreliable network, the server may lock up. In that state it does not accept new connections, cannot be stopped, and does not become responsive again even if the connection improves. So far the problem is confirmed on Raspbian and macOS.<br></div><div><br></div><div>I debugged it and found the root cause. The code below temporarily makes a socket blocking while writing to it, but reverts it to non-blocking only if the write succeeds; if the write fails, the socket is left blocking:<br></div><div><br></div><div>liveMedia/RTPInterface.cpp:398:<br></div><div>```<br></div><div> makeSocketBlocking(socketNum, RTPINTERFACE_BLOCKING_WRITE_TIMEOUT_MS);<br></div><div> ....<br></div><div>? tlsState->write((char const*)(&data[numBytesSentSoFar]), numBytesRemainingToSend)<br></div><div>: send(socketNum, (char const*)(&data[numBytesSentSoFar]), numBytesRemainingToSend, 0/*flags*/);<br></div><div> if ((unsigned)sendResult != numBytesRemainingToSend) {<br></div><div> ....<br></div><div>removeStreamSocket(socketNum, 0xFF);<br></div><div>return False;<br></div><div> }<br></div><div> makeSocketNonBlocking(socketNum);<br></div><div> return True;<br></div><div> }<br></div><div>```<br></div><div><br></div><div>Next, the read handler below attempts to read from the still-blocking socket in a loop of up to 2000 iterations, and the server effectively locks up forever (2000 times the timeout is a long time):<br></div><div><br></div><div>liveMedia/RTPInterface.cpp:510:<br></div><div>```<br></div><div>void SocketDescriptor::tcpReadHandler(SocketDescriptor* socketDescriptor, int mask) {<br></div><div> // Call the read handler until it returns false, with a limit to avoid starving other sockets<br></div><div> unsigned count = 2000;<br></div><div> socketDescriptor->fAreInReadHandlerLoop = True;<br></div><div> while (!socketDescriptor->fDeleteMyselfNext && 
socketDescriptor->tcpReadHandler1(mask) && --count > 0) {}<br></div><div> socketDescriptor->fAreInReadHandlerLoop = False;<br></div><div> if (socketDescriptor->fDeleteMyselfNext) delete socketDescriptor;<br></div><div>}<br></div><div>```<br></div><div><br></div><div>I was able to mitigate it with the following patch, which reverts the socket to non-blocking immediately after the write, whether or not it succeeded:<br></div><div><br></div><div>```<br></div><div>--- live/liveMedia/RTPInterface.cpp 2021-12-07 21:33:13.000000000 +0000<br></div><div>+++ live555/liveMedia/RTPInterface.cpp 2021-12-14 07:07:19.257748176 +0000<br></div><div>@@ -399,6 +399,7 @@<br></div><div> sendResult = (tlsState != NULL && tlsState->isNeeded)<br></div><div>? tlsState->write((char const*)(&data[numBytesSentSoFar]), numBytesRemainingToSend)<br></div><div>: send(socketNum, (char const*)(&data[numBytesSentSoFar]), numBytesRemainingToSend, 0/*flags*/);<br></div><div>+ makeSocketNonBlocking(socketNum);<br></div><div> if ((unsigned)sendResult != numBytesRemainingToSend) {<br></div><div>// The blocking "send()" failed, or timed out. In either case, we assume that the<br></div><div>// TCP connection has failed (or is 'hanging' indefinitely), and we stop using it<br></div><div>@@ -411,7 +412,6 @@<br></div><div>removeStreamSocket(socketNum, 0xFF);<br></div><div>return False;<br></div><div> }<br></div><div>- makeSocketNonBlocking(socketNum);<br></div><div><br></div><div> return True;<br></div><div> } else if (sendResult < 0 && envir().getErrno() != EAGAIN) {<br></div><div>```<br></div><div><br></div><div>Is there any chance we can get this into the official build, to avoid patching hell? Thanks.<br></div><div><br></div><div>Andrei<br></div><div><br></div></body></html>