[Live-devel] Layered video with Live555
Tim Stabrawa
stabrawa at stanford.edu
Sat Mar 3 16:10:28 PST 2007
Ross Finlayson wrote:
>> For an academic demonstration, I'm planning on extending Live555 to
>> support RTP transport of scalable H.264 video and was hoping someone
>> with a reasonable amount of experience with Live555 could help steer me
>> in the direction of least pain ...
>>
>> Basically, I'll be using the reference codec for H.264 SVC (currently in
>> development) to generate a file containing H.264 NAL units. The
>> important difference between the output of this codec and a standard
>> H.264 stream is the addition of two NAL unit types (20 & 21), which
>> carry information about which layer of video is described in the
>> preceding/current NAL unit. For now, assume I know how to parse this
>> file and determine which NAL units belong to which layers. My intention
>> is to send each layer out either multiplexed in the same RTP stream (the
>> easy way) or in separate RTP streams (the hard / interesting way),
>> according to this draft RFC:
>> http://www.ietf.org/internet-drafts/draft-ietf-avt-rtp-svc-00.txt
>>
>
> This is interesting. I suggest proceeding in three steps (with each
> step requiring additional work building on the previous steps):
> 1/ Stream regular (non-SVC) H.264 video from a file. You will be
> able to test this using VLC.
> 2/ Add additional SVC layers, multiplexed in the same RTP stream as
> the base layer.
> 3/ Use separate RTP streams for separate SVC layers.
>
> If you're streaming on a single RTP stream (steps 1/ or 2/), then
> it's fairly straightforward: You'll need to write your own subclass
> of "H264VideoStreamFramer"; that subclass will parse the input stream
> (from a "ByteStreamFileSource"). You'll then 'play' this to a
> "H264VideoRTPSink" object
OK, I think I have the StreamFramer class basically working, except for
one small problem. To parse the file, I created a subclass of
MPEGVideoStreamParser (purely out of convenience) and defined a parse()
routine that has two states: PARSING_START_SEQUENCE and
PARSING_NAL_UNIT. The parser alternates between these two states,
either throwing data away (to find the first start sequence) or saving
it (until it finds the next one).
Naturally, there is no start sequence at the end of the file; it just
ends. So what I'm seeing when I play from my Framer source class to an
H264VideoFileSink is that all the NAL units are copied over to the
output file except the last one.
What I think is happening is that, for the last NAL unit, my call to
test4Bytes() throws an exception once it reaches the end of the
file, causing parse() to return 0. Meanwhile, the StreamParser class
goes off and tries to read more from the file, sees that the file is at
EOF, and closes the file, etc.
I peeked around at some of the mechanisms for handling what to do when a
stream gets closed, thinking this would afford me the opportunity to
tell my stream parser to give me what it has left in its buffer, but I
haven't been able to wrap my mind around it completely yet. Do you
think this is the Right Way to do it? Any other suggestions?
Below is the code for my parse routines .. there's not much to 'em.
- Tim
- snip -
void H264JSVMVideoStreamParser::parseStartSequence()
{
    // Find start sequence (0x00000001)
    u_int32_t test = test4Bytes();
    while (test != 0x00000001)
    {
        skipBytes(1);
        test = test4Bytes();
    }
    setParseState(PARSING_NAL_UNIT);
    skipBytes(4);
}

unsigned H264JSVMVideoStreamParser::parseNALUnit()
{
    // Find next start sequence (0x00000001) or end of stream
    u_int32_t test = test4Bytes();
    while (test != 0x00000001)
    {
        saveByte(get1Byte());
        test = test4Bytes();
    }
    setParseState(PARSING_START_SEQUENCE);
    return curFrameSize();
}