[Live-devel] Presentation timestamp - why wall clock? SR handling question too.

Roland roland at wingmanteam.com
Fri Jul 25 12:32:50 PDT 2008


Hi Ross

>I really wish people would stop thinking that they need to do their 
>own implementation of the RTP/RTCP protocol (e.g., look at RTP 

I'm thinking exactly the opposite: I'd love to use and contribute to
an open source library that takes care of all the intricacies of
RTP/RTCP. I'm just trying to make sense of the data that is being
transmitted. I came to my conclusion after reading the paragraphs
describing the SR reports in "RTP", Colin Perkins, 2003. Both he
and RFC 3550 (Section 6.4.1) refrain from associating the NTP
timestamps with the presentation time of the payload... Initially,
I was associating the two as well, but given the data I'm seeing,
I'm trying to understand exactly what is going on under the hood
and find a possible explanation...

>Yes it is.  The key thing to realize is that the first few 
>presentation times - before RTCP synchronization occurs - are just 
>'guesses' made by the receiving code (based on the receiver's 'wall 
>clock' and the RTP timestamp).  However, once RTCP synchronization 
>occurs, all subsequent presentation times will be accurate, and will 
>be THE SAME PRESENTATION TIMES that the server generated (i.e., they 
>will be times that were computed from the server's clock).

>All this means is that a receiver should be prepared for the fact 
>that the first few presentation times (until RTCP synchronization 
>starts) will not be accurate.  The code, however, can check this by 
>calling "RTPSource:: hasBeenSynchronizedUsingRTCP()".  If this 
>returns False, then the presentation times are not accurate, and 
>should be treated with 'a grain of salt'.  However, once the call to 
>returns True, then the presentation times (from then on) will be 
>accurate.

My mistake here, apologies - I missed the function
'hasBeenSynchronizedUsingRTCP()'; that takes care of trusting the
presentationTime parameter given in _afterFrame(). I still have an
issue, though...
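
For reference, here is roughly how I gate on that call now. A minimal
sketch only; 'handleFrame' and 'scheduleForPlayout' are stand-ins for
my own player hooks, not live555 APIs:

#include "liveMedia.hh"

void scheduleForPlayout(struct timeval presentationTime); // my hook (hypothetical)

// Only trust presentationTime once the subsession's RTPSource
// reports RTCP synchronization.
void handleFrame(RTPSource* rtpSource, struct timeval presentationTime) {
  if (!rtpSource->hasBeenSynchronizedUsingRTCP()) {
    // Pre-sync: presentationTime is just a guess based on my wall
    // clock and the RTP timestamp; don't feed it to the playout clock.
    return;
  }
  // Post-sync: presentationTime was computed from the server's clock.
  scheduleForPlayout(presentationTime);
}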


>All this means is the server is (apparently) streaming 20 seconds 
>worth of data in 16 seconds, apparently to allow the client to 
>pre-buffer the excess data (so it can ensure smooth playout).  This 
>means, therefore, that your receiving client needs to buffer this 
>extra data, and play out each frame based on the *presentation time*, 
>*not* at the time at which the frame actually arrives.

>Therefore, to use your example, you would:
>- play the frame whose presentation time is 42.0 at time 0
>- play the frame whose presentation time is 54.0 at time 12
>- play the frame whose presentation time is 62 at time 20
>*regardless* of the times at which these frames actually arrived.
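
Just to confirm we mean the same thing, that is what my player already
does. A sketch, where 'now()' is my wall clock and 'Frame'/'queueFrame'
are stand-ins for my own player types:

#include <chrono>

// Anchor an offset between my wall clock and the first trusted
// presentation time, then play every frame at presTS + offset,
// regardless of when it actually arrived.
static double now() {
  using namespace std::chrono;
  return duration<double>(steady_clock::now().time_since_epoch()).count();
}

struct Frame { /* payload */ };
void queueFrame(const Frame&, double /*playAt*/) { /* hand to renderer */ }

static double playoutOffset = 0.0; // localTime - presTS, fixed once
static bool offsetSet = false;

void onFrame(const Frame& f, double presTS) {
  if (!offsetSet) {
    playoutOffset = now() - presTS;
    offsetSet = true;
  }
  queueFrame(f, presTS + playoutOffset);
}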

This is exactly where my trouble is. I'm *not* measuring the time when
I *receive* packets, I fully trust the 'presentationTime' parameter
given when _afterFrame() is being called. Let me expand my example
to illustrate the problem:

* Type is the type of packet received (PL = RTP payload, SR = Sender
  Report)
* ClientTime is my wall clock
* RTPTS is the RTP timestamp value in a given packet (for simplicity,
  let's assume a media clock of 10 samples per second and that a
  frame contains 2 samples' worth of payload)
* PresTS is the presentationTime value I receive as a parameter
  in _afterFrame
* Sync is whether the stream has been synchronized (using the newly
  discovered function)

From a client/player point of view (that's me), the only datapoints 
visible are 'PresTS' and 'Sync'. I gathered the other pieces of 
data using Wireshark.

Type ClientTime RTPTS  PresTS Sync
PL   100.0      5000   100.0  No
SR   100.1                         NTP=200.1, RTP=5001   (1)
PL   100.4      5004   200.4  Yes
PL   100.6      5006   200.6  Yes
...
PL   111.8      5118   211.8  Yes
PL   112.0      5120   212.0  Yes
SR   112.1                         NTP=208.1!, RTP=5121  (2)
PL   112.2      5122  >208.2< Yes                        (3)
PL   112.4      5124   208.4  Yes
...
PL   120.0      5200   216.0  Yes
SR   120.1                         NTP=216.1, RTP=5201   (4)
PL   120.2      5202   216.2  Yes
...

(1) PresTS jumps due to an incoming SR packet. I thought this was
    an issue, but now I have the means to take the previous '100.0'
    value with a grain of salt.
(2) This is the one that freaks me out. The SR packet causes the
    PresTS times to go backwards in subsequent frames. If you
    compare the times in the NTP domain, 8 seconds have elapsed.
    But I get this report only after 12 seconds of ClientTime and
    12 seconds worth of payload.
(3) Now I'm utterly confused. I get frames that should have been
    presented 4 seconds ago... What do I do with the fact that time
    went backwards within a fully synchronized stream? If it were
    just fractions of a second, that would be one thing. But 4 seconds?
    (That's 60 video frames and 88200 audio samples in my case.)
    Looking at the actual payload data, there is no jumping back in time.
(4) After a while, another SR report comes in. Compared to the SR
    report at (2), this one is correctly spaced in NTP as well as RTP.
    Hence subsequent calculations (sketched below) will give me a
    continuous PresTS.
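
To spell out the arithmetic behind (2)-(4): this is the mapping I
believe the receiver applies after each SR (simplified; clockRate is
the 10 samples/sec from my example):

// Presentation time of a frame, given the most recent SR's
// (NTP, RTP) pair and the media clock rate.
double presTS(unsigned rtpTS, unsigned srRTP, double srNTP, double clockRate) {
  return srNTP + (double)(int)(rtpTS - srRTP) / clockRate;
}

// SR (2): srNTP = 208.1, srRTP = 5121
//   frame RTP 5122 -> 208.1 + (5122 - 5121)/10 = 208.2  <- the backwards jump
// SR (4): srNTP = 216.1, srRTP = 5201
//   frame RTP 5202 -> 216.1 + (5202 - 5201)/10 = 216.2  <- continuous again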

My dilemma is at (3). What does a client do if the presentation time
goes backwards by this much? I've been trying to come up with all
sorts of explanations, and only two make sense to me: either the NTP
time of SR packets is not necessarily the presentation time (supported
by the wording in my book and the RFC), or the server implementation
is buggy (supported by the irrationality of the data seen and the
current implementation of live555)...

Yours truly confused
Roland


PS:
Quoting RFC3550:
NTP timestamp: 64 bits
      Indicates the wallclock time (see Section 4) when this report was
      sent so that it may be used in combination with timestamps
      returned in reception reports from other receivers to measure
      round-trip propagation to those receivers.
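
For completeness, this is how I unpack that 64-bit NTP field into
Unix-epoch seconds (the standard 1900-to-1970 offset; sketch only):

#include <cstdint>

// 64-bit NTP timestamp = 32-bit seconds since 1900 + 32-bit fraction.
// 2208988800 = seconds between the NTP epoch (1900) and the Unix
// epoch (1970).
double ntpToUnixSeconds(uint32_t ntpMSW, uint32_t ntpLSW) {
  return (double)(ntpMSW - 2208988800u)   // whole seconds since 1970
       + (double)ntpLSW / 4294967296.0;   // fractional part (/ 2^32)
}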



