[Live-devel] Presentation timestamp - why wall clock? SR handling question too.

Roland roland at wingmanteam.com
Thu Jul 24 18:35:00 PDT 2008


Hi there

I'm working on an RTSP project, using live.2008.07.24.tar.gz.
I'm receiving MPEG4ESVideo and MPEG4LATMAudio data streams. I've implemented
two sinks directly behind these media sources; they store the incoming frames
for playback, based on the 'presentationTime' given in _afterFrame().

The very first frames on each subsession always come back with a 'wall clock'
presentationTime (as filled in by code in RTPSource.cpp, lines 309-318).
Then, once an SR packet has arrived, the presentationTime jumps to the
NTP time advertised by the source in the SR packet.

I don't think this is correct behavior. My interpretation of the SR NTP
information is that it can be used to synchronize *multiple* streams with
respect to *each other*. It does *not* mean that the given NTP timestamp
*is* the presentationTime for the given rtpTimestamp. The NTP timestamp is
defined as the sender's wall clock when the SR packet was 'sent', not when
the data at those RTP timestamps should be 'presented'.

So, instead of overwriting the running presentation time (stored in
RTPReceptionStats::fSyncTime), the SR NTP time should be stored somewhere
else (along with the SR RTP timestamp) and used to provide a delay
measurement between subsessions.

The main reason for this is the kind of SR reports I'm getting from
YouTube:

At time NTP+0.0 seconds, I get an RTP timestamp of 42.0 seconds.
At time NTP+8.0 seconds, I get an RTP timestamp of 54.0 seconds.
At time NTP+16.0 seconds, I get an RTP timestamp of 62.0 seconds.

This shows that in RTP time, 12 seconds' worth of data have been transmitted,
whereas in real time (NTP time), only 8 seconds have elapsed. The NTP time
cannot be used to calculate the presentation time of that data without
severely backtracking by -4 seconds once the second SR packet arrives.

Instead, the SR packets of the Audio and Video subsessions should be compared
against each other, and a delay/drift can be calculated once enough SR
packets have arrived for the various subsessions:

SR for Audio: NTP+0.0: Audio RTP at 42.0
SR for Video: NTP+0.1: Video RTP at 85.0
-> not enough data
SR for Audio: NTP+8.0: Audio RTP at 54.0 (12.0 seconds elapsed)
SR for Video: NTP+8.1: Video RTP at 97.1 (12.1 seconds elapsed)
-> Video is drifting ahead by 0.1 seconds; slow down playback
SR for Audio: NTP+16.0: Audio RTP at 62.0 (8.0 seconds elapsed)
SR for Video: NTP+16.1: Video RTP at 105.0 (7.9 seconds elapsed)
-> Video is drifting behind by 0.1 seconds; accelerate playback

Does that make sense?

Thanks for reading this far
Roland



