Scalable Real-Time Internet Video Transmission

I Introduction

A framework for transporting real-time Internet video consists of two basic components: congestion control and error control. Specifically, congestion control consists of rate control, rate-adaptive encoding, and rate shaping, while error control consists of forward error correction (FEC), retransmission, error resilience, and error concealment. Together these mechanisms form a design space that video application designers can explore.

Scalability is another tool, aimed at coping with wide variations in the channel rate and at improving the robustness of video in error-prone channel environments. Generally speaking, scalability comes in three forms: temporal scalability, spatial scalability, and SNR scalability. Here we focus on SNR scalability only.

Rate control attempts to minimize network congestion and the amount of packet loss by matching the rate of the video stream to the available network bandwidth. Note that rate control is approached from the transport perspective, while rate-adaptive video encoding is approached from the compression perspective; rate shaping belongs to both the transport and compression domains.

There are three kinds of FEC: 1) channel coding; 2) source coding-based FEC; and 3) joint source/channel coding. FEC is used primarily because of its small transmission delay. However, FEC can be ineffective when bursty packet loss occurs and the loss exceeds the recovery capability of the FEC code.

Conventional retransmission-based schemes such as automatic repeat request (ARQ) are usually dismissed as a means for transporting real-time video since the delay requirement may not be met.

Error-resilient schemes deal with packet loss on the compression layer. Unlike traditional FEC, which directly corrects bit errors or packet losses, error-resilient schemes consider the semantic meaning of the compression layer and attempt to limit the scope of damage (caused by packet loss) on the compression layer. As a result, error-resilient schemes could reconstruct the video picture with gracefully degraded quality.

Error concealment is a post-processing technique employed by the decoder to mask the effect of lost data on the reconstructed video.

Note that channel coding and retransmission recover packet loss from the transport perspective, while source coding-based FEC, error resilience, and error concealment deal with packet loss from the compression perspective; joint source/channel coding falls in both the transport and compression domains.

Since this project mainly focuses on error control schemes for scalable video transmission over the Internet, rate control will not be discussed in detail here.

II Scalable Video

All current video compression standards, including MPEG-2, H.263++, and MPEG-4, provide tools to support SNR scalability, e.g., H.263++ Annex O and MPEG-4 FGS (Fine Granularity Scalability).

III Error Control

(A). FEC

In the Internet, packets may be dropped due to congestion at routers, they may be misordered, or they may reach the destination with such a long delay as to be considered useless or lost.

FEC, retransmission, and error resilience are performed at both the source and the receiver side, while error concealment is carried out only at the receiver side. FEC schemes can be classified into three categories: 1) channel coding; 2) source coding-based FEC; and 3) joint source/channel coding.

For Internet applications, channel coding is typically used in the form of block codes. Specifically, a video stream is first chopped into segments, each of which is packetized into k packets; then, for each segment, a block code (e.g., Reed-Solomon codes or the recently proposed Tornado codes) is applied to the k packets to generate an n-packet block, where n > k. Since the original segment can be recovered from any k of the n packets regardless of which packets are lost, the network and the receivers are free to discard packets that they cannot handle due to limited bandwidth or processing power. Thus, channel coding is also applicable to heterogeneous networks and receivers with different capabilities. There are also disadvantages: the added redundancy increases the bandwidth requirement even when there is no loss, the receiver must wait for enough packets of a block before decoding, which adds delay, and, as noted earlier, recovery fails when bursty losses exceed the capability of the code.
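To make the block-code idea concrete, the following sketch (in Python, with purely illustrative function names) implements the simplest possible packet-level code: k data packets protected by a single XOR parity packet (n = k + 1), which can recover exactly one lost packet per block. Real systems use stronger codes such as Reed-Solomon or Tornado codes, which add n - k parity packets and can recover from up to n - k losses per block.

    # Minimal packet-level FEC sketch: k data packets plus one XOR parity
    # packet (an (n, k) code with n = k + 1), recovering at most one loss
    # per block. All names here are illustrative, not from a real library.

    def fec_encode(data_packets):
        """Append one XOR parity packet to a block of equal-sized packets."""
        parity = bytes(len(data_packets[0]))
        for pkt in data_packets:
            parity = bytes(a ^ b for a, b in zip(parity, pkt))
        return data_packets + [parity]

    def fec_decode(received):
        """Recover a block in which at most one packet was lost (marked None)."""
        lost = [i for i, pkt in enumerate(received) if pkt is None]
        if len(lost) > 1:
            raise ValueError("more losses than this simple code can recover")
        if lost:
            size = len(next(p for p in received if p is not None))
            rec = bytes(size)
            for pkt in received:
                if pkt is not None:
                    rec = bytes(a ^ b for a, b in zip(rec, pkt))
            received[lost[0]] = rec
        return received[:-1]                         # drop the parity packet

    # Usage: the second packet of a 3-packet block is lost and then recovered.
    block = fec_encode([b"GOB0", b"GOB1", b"GOB2"])  # k = 3, n = 4
    block[1] = None                                  # simulate a packet loss
    print(fec_decode(block))                         # [b'GOB0', b'GOB1', b'GOB2']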

To provide error recovery in layered multicast video, a receiver-driven hierarchical FEC (HFEC) scheme was proposed. In HFEC, additional streams containing only FEC redundancy are generated along with the video layers. Each FEC stream is used to recover a different video layer, and each FEC stream is sent to a different multicast group, so subscribing to more FEC groups corresponds to a higher level of protection. Like other receiver-driven schemes, HFEC achieves a good tradeoff between the flexibility of providing recovery and bandwidth efficiency.
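The sketch below captures only the receiver-driven subscription logic of HFEC; the multicast group addresses and the fixed one-FEC-stream-per-layer mapping are assumptions made for illustration.

    # Receiver-driven HFEC sketch: each video layer and each FEC stream is
    # carried in its own multicast group; a receiver picks its protection
    # level by choosing how many FEC groups to join. Group addresses and
    # the one-FEC-stream-per-layer mapping are hypothetical.

    VIDEO_GROUPS = ["232.1.0.0", "232.1.0.1", "232.1.0.2"]  # base + 2 enhancement layers
    FEC_GROUPS   = ["232.1.1.0", "232.1.1.1", "232.1.1.2"]  # one FEC stream per layer

    def groups_to_join(num_layers, protected_layers):
        """Multicast groups a receiver subscribes to for a given quality/protection."""
        assert protected_layers <= num_layers <= len(VIDEO_GROUPS)
        return VIDEO_GROUPS[:num_layers] + FEC_GROUPS[:protected_layers]

    # A receiver with bandwidth for two video layers, protecting only the base layer:
    print(groups_to_join(num_layers=2, protected_layers=1))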

Source coding-based FEC (SFEC) is a recently devised variant of FEC for Internet video. SFEC adds redundant information as follows: the nth packet contains the nth group of blocks (GOB) together with redundant information about the (n-1)th GOB. If the (n-1)th packet is lost, the (n-1)th GOB can still be reconstructed from this redundancy, but with coarser quality, because the redundancy is a version of the (n-1)th GOB compressed with a larger quantizer (so that less extra data is added to the nth packet). In other words, the redundant information added by SFEC consists of more coarsely compressed versions of the raw video. As a result, under packet loss, channel coding can achieve perfect recovery while SFEC recovers the video only with reduced quality. One advantage of SFEC over channel coding, however, is lower delay, since the decoder does not need to wait for k packets.
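The following sketch illustrates this packetization and recovery pattern. Here coarse() is a hypothetical stand-in for re-encoding a GOB with a larger quantizer, and the packet layout is simplified for illustration.

    # SFEC packetization sketch: packet n carries GOB n plus a coarsely
    # re-encoded copy of GOB n-1. coarse() is a hypothetical placeholder
    # for re-encoding with a larger quantizer (here it just truncates).

    def coarse(gob):
        """Stand-in for a coarser (larger-quantizer) encoding of a GOB."""
        return gob[: max(1, len(gob) // 4)]

    def sfec_packetize(gobs):
        packets = []
        for n, gob in enumerate(gobs):
            redundancy = coarse(gobs[n - 1]) if n > 0 else None
            packets.append({"seq": n, "gob": gob, "red": redundancy})
        return packets

    def sfec_recover(packets):
        """Use the coarse copy in packet n+1 when packet n is lost."""
        recovered = {}
        for pkt in packets:
            if pkt is None:                      # this packet was lost
                continue
            recovered[pkt["seq"]] = pkt["gob"]   # full-quality GOB
            prev = pkt["seq"] - 1
            if prev >= 0 and prev not in recovered and pkt["red"] is not None:
                recovered[prev] = pkt["red"]     # coarse-quality replacement
        return recovered

    packets = sfec_packetize([b"GOB0-full", b"GOB1-full", b"GOB2-full"])
    packets[1] = None                            # packet carrying GOB1 is lost
    print(sfec_recover(packets))                 # GOB1 returns only as a coarse copy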

The motivation for joint source/channel coding of video comes from the following observations. A) According to rate-distortion theory, the lower the source-coding rate R for a video unit, the larger the distortion D of that unit. B) Suppose the total rate (source-coding rate R plus channel-coding redundancy rate R') is fixed and the channel loss characteristics do not change. Then a higher R implies a lower R', which in turn implies a higher probability Pc that the video unit is corrupted by packet loss, and hence again a larger D. The objective of joint source/channel coding is therefore to find the optimal rate-allocation point between R and R'.
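A minimal sketch of this rate-allocation search is shown below. The distortion model D(R), the residual-loss model Pc(R'), and all numeric constants are hypothetical and assumed only for illustration; a real system would fit them to the actual codec and measured channel behavior.

    # Rate-allocation sketch for joint source/channel coding. D(R) and
    # Pc(R') below are hypothetical models assumed only for illustration.

    def source_distortion(R):
        """Assumed R-D model: distortion decreases as the coding rate increases."""
        return 100.0 / (1.0 + R)

    def corruption_prob(R_fec, channel_loss=0.2):
        """Assumed residual loss probability: more FEC redundancy, less residual loss."""
        return channel_loss / (1.0 + R_fec)

    def expected_distortion(R, R_fec, D_lost=100.0):
        """E[D] = (1 - Pc) * D(R) + Pc * concealment distortion."""
        p = corruption_prob(R_fec)
        return (1.0 - p) * source_distortion(R) + p * D_lost

    def best_allocation(R_total, step=0.5):
        """Brute-force search for the split of R_total into R and R' minimizing E[D]."""
        best = None
        for i in range(int(R_total / step) + 1):
            R = i * step
            d = expected_distortion(R, R_total - R)
            if best is None or d < best[0]:
                best = (d, R, R_total - R)
        return best

    print(best_allocation(R_total=10.0))   # (minimum E[D], optimal R, optimal R')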

(B). Error Resilience

Error-resilient schemes address loss recovery from the compression perspective. They attempt to prevent error propagation or to limit the scope of the damage (caused by packet loss) on the compression layer. The standardized error-resilient tools, including resynchronization marking, data partitioning, and data recovery (e.g., reversible variable-length codes, RVLC), are targeted at error-prone environments such as wireless channels and may not be applicable to the Internet. For Internet video, the boundary of a packet already provides a synchronization point in the variable-length coded bitstream at the receiver side. On the other hand, since a packet loss may cause the loss of all the motion data and its associated shape/texture data, the above mechanisms may not be useful for Internet video applications.

1) Optimal Mode Selection: There are two coding modes for a block: INTRA mode and INTER mode (SKIP being a special case of INTER mode). Constantly referring to previously coded blocks carries the danger of error propagation. By occasionally turning off the INTER mode (i.e., coding the block in INTRA mode), error propagation can be limited, but coding a block all by itself is more costly in bits. Therefore, there is a tradeoff in selecting the coding mode for each block.

Classical R-D optimized mode selection cannot achieve global optimality in an error-prone environment, since it considers neither the network congestion status nor the receiver behavior. By jointly taking into account three factors, namely 1) the source behavior (e.g., quantization and packetization), 2) the path characteristics, and 3) the receiver behavior (e.g., error concealment), an end-to-end approach to R-D optimized mode selection was proposed.
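The sketch below illustrates the underlying idea of loss-aware mode selection for a single block. The per-mode distortion and rate figures, the Lagrange multiplier, and the simple two-term expected-distortion model are assumptions made for illustration, not the exact formulation of the proposed end-to-end scheme.

    # Loss-aware mode selection sketch for a single block. The per-mode
    # distortion/rate numbers and lambda are hypothetical; a real encoder
    # measures them by coding the block both ways and also models error
    # propagation from previously corrupted reference blocks.

    def expected_cost(D_enc, D_conceal, rate, p_loss, lam):
        """Lagrangian cost J = E[D] + lambda * R, where E[D] mixes the decoded
        distortion (packet arrives) with the concealment distortion (packet lost)."""
        expected_D = (1.0 - p_loss) * D_enc + p_loss * D_conceal
        return expected_D + lam * rate

    def select_mode(p_loss, lam=0.02):
        # INTRA: costs more bits but does not depend on a possibly corrupted reference.
        intra = expected_cost(D_enc=4.0, D_conceal=50.0, rate=300, p_loss=p_loss, lam=lam)
        # INTER: cheap in bits but concealment/propagation hurts more under loss.
        inter = expected_cost(D_enc=6.0, D_conceal=80.0, rate=80, p_loss=p_loss, lam=lam)
        return "INTRA" if intra < inter else "INTER"

    print(select_mode(p_loss=0.01))   # low loss  -> INTER (cheaper in bits)
    print(select_mode(p_loss=0.30))   # high loss -> INTRA (limits error propagation)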

2) Multiple Description Coding: With MDC, a raw video sequence is compressed into multiple streams (referred to as descriptions). Each description alone provides acceptable visual quality, and combining more descriptions provides better visual quality. The advantages of MDC are: 1) robustness to loss: even if a receiver gets only one description, it can still reconstruct video with acceptable quality; and 2) enhanced quality: if a receiver gets multiple descriptions, it can combine them to produce a better reconstruction. These advantages come at the cost of reduced compression efficiency compared with conventional single description coding (SDC).
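As a concrete (and deliberately simple) illustration of MDC, the sketch below splits a frame sequence into two descriptions by even/odd frame index. This particular construction is assumed only for illustration; many other MDC techniques exist.

    # MDC sketch using temporal splitting: even-indexed frames form
    # description 1 and odd-indexed frames form description 2.

    def mdc_split(frames):
        """Split a frame sequence into two descriptions by even/odd index."""
        return frames[0::2], frames[1::2]

    def mdc_reconstruct(desc_even, desc_odd, num_frames):
        """Merge whatever descriptions arrived; if one is missing, conceal its
        frames by repeating the most recent frame from the other description."""
        out = []
        for i in range(num_frames):
            if i % 2 == 0 and desc_even is not None:
                out.append(desc_even[i // 2])
            elif i % 2 == 1 and desc_odd is not None:
                out.append(desc_odd[i // 2])
            else:
                out.append(out[-1] if out else None)   # crude temporal concealment
        return out

    frames = list(range(6))                             # dummy frames 0..5
    d_even, d_odd = mdc_split(frames)
    print(mdc_reconstruct(d_even, None, num_frames=6))  # description 2 lost: [0, 0, 2, 2, 4, 4]
    print(mdc_reconstruct(d_even, d_odd, num_frames=6)) # both received:      [0, 1, 2, 3, 4, 5]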

(C). Error Concealment

There are two basic approaches to error concealment, namely, spatial and temporal interpolation. In spatial interpolation, missing pixel values are reconstructed from neighboring spatial information, whereas in temporal interpolation the lost data is reconstructed from data in previous frames. Typically, spatial interpolation is used to reconstruct missing data in intracoded frames, while temporal interpolation is used to reconstruct missing data in intercoded frames. Three simple error-concealment (EC) schemes are commonly used: 1) replacing a damaged block with the co-located block in the previous frame; 2) replacing it with a motion-compensated block from the previous frame, using the motion vectors of neighboring correctly received blocks; and 3) interpolating the missing pixels from surrounding correctly received pixels in the same frame.
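The sketch below illustrates both interpolation approaches for a single missing block; the frame sizes, block position, and the simple averaging rule are assumptions made for illustration.

    # Concealment sketch for one missing 8x8 block in a grayscale frame.
    # NumPy arrays and the specific block position are assumed purely for
    # illustration.
    import numpy as np

    def temporal_concealment(prev_frame, y, x, size=8):
        """Copy the co-located block from the previous decoded frame."""
        return prev_frame[y:y + size, x:x + size].copy()

    def spatial_concealment(frame, y, x, size=8):
        """Fill the missing block with the average of the pixels bordering it
        (a very crude form of spatial interpolation)."""
        top    = frame[y - 1,      x:x + size]
        bottom = frame[y + size,   x:x + size]
        left   = frame[y:y + size, x - 1]
        right  = frame[y:y + size, x + size]
        fill = (top.mean() + bottom.mean() + left.mean() + right.mean()) / 4.0
        return np.full((size, size), fill)

    prev_frame = np.full((32, 32), 100.0)   # previous decoded frame
    frame = np.full((32, 32), 120.0)        # current frame
    frame[8:16, 8:16] = 0                   # an 8x8 block was lost

    # Inter-coded frame: borrow the co-located block from the previous frame.
    frame[8:16, 8:16] = temporal_concealment(prev_frame, 8, 8)
    # Intra-coded frame: interpolate from surrounding pixels instead, e.g.
    # frame[8:16, 8:16] = spatial_concealment(frame, 8, 8)
    print(frame[8, 8], frame[20, 20])       # 100.0 120.0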

This page was originally from http://www.cmlab.csie.ntu.edu.tw/~pkhsiao/ and then modified.

Last updated Oct. 2, 2002