[Live-devel] Presentation timestamp - why wall clock? SR handling question too.

Thu Jul 24 19:10:07 PDT 2008

>The very first frames on each subsession always come back with 'wall clock'
>presentationTime (as filled in by code in RTPSource.cpp, lines 309-318).
>Then, once a SR packet has arrived, the presentationTime jumps to the
>NTP time advertised by the source in the SR packet.
>
>I don't think this is correct behavior.

Yes it is.  The key thing to realize is that the first few 
presentation times - before RTCP synchronization occurs - are just 
'guesses' made by the receiving code (based on the receiver's 'wall 
clock' and the RTP timestamp).  However, once RTCP synchronization 
occurs, all subsequent presentation times will be accurate, and will 
be THE SAME PRESENTATION TIMES that the server generated (i.e., they 
will be times that were computed from the server's clock).

All this means is that a receiver should be prepared for the fact 
that the first few presentation times (until RTCP synchronization 
starts) will not be accurate.  The code, however, can check this by 
calling "RTPSource::hasBeenSynchronizedUsingRTCP()".  If this 
returns False, then the presentation times are not accurate, and 
should be taken 'with a grain of salt'.  However, once the call 
returns True, then the presentation times (from then on) will be 
accurate.
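
As a rough illustration of how a receiver might cope with that one-time 
jump, here is a minimal sketch.  "MediaTimeline" and everything inside it 
are hypothetical names, not part of the library; the only thing assumed 
from the library is that the caller passes in the result of 
"hasBeenSynchronizedUsingRTCP()" along with each frame's presentation 
time.  Re-anchoring at the first synchronized frame is just one possible 
policy, not the required one.

```cpp
#include <cstdint>

// Hypothetical helper (NOT part of the LIVE555 API).
// Maps presentation times onto a continuous local timeline, re-anchoring
// once when the stream first becomes RTCP-synchronized.
class MediaTimeline {
public:
  // presentationUs: presentation time delivered with the frame (microseconds).
  // synced: value of RTPSource::hasBeenSynchronizedUsingRTCP() for this frame.
  // Returns the frame's position on the local timeline (microseconds).
  int64_t position(int64_t presentationUs, bool synced) {
    if (!haveAnchor_ || (synced && !anchorIsSynced_)) {
      // First frame, or first synchronized frame: the time base has changed
      // (receiver 'wall clock' guess -> sender's clock), so start measuring
      // from here while keeping the local timeline continuous.
      anchorPresentationUs_ = presentationUs;
      anchorPositionUs_ = lastPositionUs_;
      haveAnchor_ = true;
      anchorIsSynced_ = synced;
    }
    lastPositionUs_ = anchorPositionUs_ + (presentationUs - anchorPresentationUs_);
    return lastPositionUs_;
  }

private:
  bool haveAnchor_ = false;
  bool anchorIsSynced_ = false;
  int64_t anchorPresentationUs_ = 0;
  int64_t anchorPositionUs_ = 0;
  int64_t lastPositionUs_ = 0;
};
```

Note that this sketch places the first synchronized frame at the same 
position as the frame before it, so it is off by about one frame duration 
at that point; positions computed before synchronization remain only 
estimates.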

>The main reason for this is what kind of SR reports I'm getting from
>YouTube:
>
>At time NTP+0.0 seconds, I get an RTP timestamp of 42.0 seconds.
>At time NTP+8.0 seconds, I get an RTP timestamp of 54.0 seconds.
>At time NTP+16.0 seconds, I get an RTP timestamp of 62.0 seconds.
>
>This shows that in RTP time, 12 seconds worth of data have been transmitted,
>however in real time (NTP time), only 8 seconds have elapsed.

All this means is that the server is (apparently) streaming 20 seconds 
worth of data in 16 seconds, presumably to allow the client to 
pre-buffer the excess data (so it can ensure smooth playout).  This 
means, therefore, that your receiving client needs to buffer this 
extra data, and play out each frame based on the *presentation time*, 
*not* at the time at which the frame actually arrives.

Therefore, to use your example, you would:
- play the frame whose presentation time is 42.0 at time 0
- play the frame whose presentation time is 54.0 at time 12
- play the frame whose presentation time is 62.0 at time 20
*regardless* of the times at which these frames actually arrived.
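
The schedule above is just each presentation time minus the first one.  A 
minimal sketch of that arithmetic follows; "playoutOffsets" is a 
hypothetical helper (not part of the library), and it assumes times in 
seconds and that the first frame is played at time 0.

```cpp
#include <vector>

// Hypothetical helper (NOT part of the LIVE555 API).
// Given frame presentation times in seconds, returns the time (in seconds,
// relative to the start of playout) at which each frame should be played.
// Arrival times are deliberately not an input: they play no role here.
std::vector<double> playoutOffsets(const std::vector<double>& presentationTimes) {
  std::vector<double> offsets;
  if (presentationTimes.empty()) return offsets;
  const double first = presentationTimes.front();  // played at time 0
  for (double pt : presentationTimes) {
    offsets.push_back(pt - first);
  }
  return offsets;
}
```

For the presentation times 42.0, 54.0 and 62.0, this yields 0, 12 and 20.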

I really wish people would stop thinking that they need to do their 
own implementation of the RTP/RTCP protocol (e.g., look at RTP 
timestamps or sequence numbers, and/or RTCP reports).  You don't - we 
already implement all of this!  All you need to do is use the 
presentation times that are delivered to you (but be aware that the 
first few presentation times may not be accurate, as noted above).
-- 

Ross Finlayson
Live Networks, Inc.
http://www.live555.com/