With the wide deployment of mobile and wireless networks, a growing number of applications do not fit the typical down-link model but rather an up-link model where many senders deliver data to a central receiver. Examples of these applications are wireless digital video cameras, low-power video sensor networks, surveillance cameras, and wireless video teleconferencing systems. Typically, these emerging applications require light encoding or a flexible distribution of the codec complexity, robustness to packet losses, high compression efficiency and, often, also low latency/delay. There is also a growing use of multiview video content where many (correlated) views of the same scene are available and, sometimes, communication between the cameras and joint encoding of the views is not possible or desirable. Of course, it would be great to find a video coding solution that could address these requirements with the same coding efficiency as the best hybrid coding solution, and with the same encoder complexity and error robustness as intra coding (this means simple encoding and no error propagation due to the absence of the prediction loop). To address these emerging needs, the video coding problem may be revisited in the light of an information theory theorem from the 1970s: the Slepian-Wolf Theorem. This theorem addresses the case where two statistically dependent signals X and Y are independently encoded, and not jointly encoded as in the widely deployed hybrid coding solutions. Surprisingly, the theorem says that the minimum rate to encode the two (correlated) sources is the same as the minimum rate for joint encoding, with an arbitrarily small probability of error, assuming that the two sources have certain statistical characteristics.
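In terms of rates, the Slepian-Wolf result can be stated as follows (a standard textbook formulation, with H(.) denoting entropy): a joint decoder can recover X and Y with arbitrarily small error probability whenever the separate encoding rates satisfy

```latex
R_X \geq H(X \mid Y), \qquad R_Y \geq H(Y \mid X), \qquad R_X + R_Y \geq H(X, Y).
```

The minimum total rate, H(X, Y), is thus exactly the minimum rate achievable by a joint encoder.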
This is a very interesting result in the context of the emerging challenges previously mentioned because it opens the door to a new coding paradigm where, at least in theory, separate encoding induces no compression efficiency loss when compared to the joint encoding paradigm.
However, the Slepian-Wolf Theorem addresses lossless coding, which is not the most relevant case for practical video coding solutions. Fortunately, in 1976, A. Wyner and J. Ziv derived the so-called Wyner-Ziv Theorem, which states that, under certain conditions, independent encoding incurs no coding efficiency loss with respect to joint encoding even when the coding process is lossy (and no longer lossless). Together, the Slepian-Wolf and Wyner-Ziv theorems suggest that it is possible to compress two statistically dependent signals in a distributed way (separate encoding, joint decoding) while approaching the coding efficiency of more conventional predictive coding schemes (joint encoding and decoding). Schemes based on these theorems are generally referred to as distributed coding solutions. Since the new coding paradigm relies neither on joint encoding nor, therefore, on the temporal prediction loop typical of traditional coding schemes, distributed coding architectures may provide several functional benefits which are rather important for many emerging applications: i) flexible allocation of the global video codec complexity; ii) improved error resilience; iii) codec-independent scalability; and iv) exploitation of multiview correlation.
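As a toy illustration of how separate encoding with joint decoding can work in practice (a classical textbook example, not part of any of the codecs discussed here), the sketch below uses the syndrome of a (7,4) Hamming code as the transmitted bitstream, assuming a correlation model in which the side information Y available at the decoder differs from the source word X in at most one bit:

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j (0-based) is
# the binary expansion of j + 1, so the syndrome of a single-bit error
# pattern directly encodes the index of the flipped position.
H = np.array([[((j + 1) >> k) & 1 for j in range(7)] for k in range(3)])

def sw_encode(x):
    """Separate encoding: send only the 3-bit syndrome of the 7-bit x."""
    return H @ x % 2

def sw_decode(syndrome, y):
    """Joint decoding: recover x from its syndrome and the correlated y."""
    # (syndrome + H y) mod 2 = H (x XOR y) = syndrome of the error pattern.
    s_e = (syndrome + H @ y) % 2
    x_hat = y.copy()
    if s_e.any():
        pos = int(s_e @ np.array([1, 2, 4])) - 1  # index of the flipped bit
        x_hat[pos] ^= 1
    return x_hat

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy()
y[4] ^= 1                      # side information: x with one bit flipped
assert np.array_equal(sw_decode(sw_encode(x), y), x)
```

Only 3 syndrome bits are transmitted instead of the 7 source bits, yet the decoder recovers X exactly; practical Wyner-Ziv codecs follow the same binning principle with far more powerful turbo or LDPC codes.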
While theory states that DVC solutions may be as efficient as joint encoding solutions, practical developments are still rather far from that performance, especially if low-complexity encoding is also targeted. The time is now right for research on better practical DVC solutions.

One of the most widely adopted DVC architectures was developed at Stanford University in California, USA. This architecture has also been adopted for the DVC video codec developed by Instituto Superior Técnico (IST) in the context of VISNET I and DISCOVER, and is currently being improved in VISNET II. It comprises two main coding streams: one addresses the so-called key frames, typically adopting a conventional coding solution like AVC, while the other accommodates the remaining frames, the so-called Wyner-Ziv (WZ) frames, using a DVC coding approach. Major research topics to improve the current coding efficiency of DVC solutions include: i) the improvement of the frame interpolation process to create better side information and thus better estimates of the WZ frames to encode; ii) the development of better correlation noise models, so that the channel decoder (a turbo decoder in the VISNET DVC codec) can better know and exploit the correlation between the WZ frames to encode and their estimates made at the decoder; and iii) the design of channel codes better matched to the coding problem at hand, which is not a typical channel coding problem.

While for most people DVC is simply the hottest video coding research topic around, without much relation to the conventional AVC standard, it is interesting to note that the Stanford DVC architecture, and thus the VISNET DVC codec, provides in practice some degree of AVC backward compatibility (at a lower temporal resolution) by means of the key frames bitstream. Other DVC architectures provide full temporal resolution backward compatibility, since DVC is used on top of a compliant AVC bitstream to provide spatial and quality scalability.
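The side information creation and correlation noise modeling steps in topics i) and ii) can be sketched as follows. This is a deliberately simplified illustration with hypothetical function names: plain frame averaging stands in for the motion-compensated interpolation used in real codecs, and the residual is assumed available for fitting, which a real decoder does not have:

```python
import numpy as np

def side_information(prev_key, next_key):
    """Crude side information for a WZ frame: the average of the two
    surrounding decoded key frames.  Practical codecs use motion-
    compensated frame interpolation instead; this is only a stand-in."""
    return (prev_key.astype(np.float64) + next_key.astype(np.float64)) / 2

def laplacian_alpha(wz_frame, side_info):
    """Fit a zero-mean Laplacian model to the residual between a WZ
    frame and its decoder-side estimate.  For density
    (alpha/2) * exp(-alpha * |r|), the maximum-likelihood estimate is
    alpha = 1 / mean(|r|).  The tiny epsilon guards against an
    all-zero residual."""
    residual = wz_frame.astype(np.float64) - side_info
    return 1.0 / (np.mean(np.abs(residual)) + 1e-12)
```

In an actual decoder, the Laplacian parameter drives the soft-input (confidence) values fed to the turbo decoder; since the true WZ frame is unavailable there, the parameter must be estimated indirectly, which is precisely what makes better correlation noise modeling a research topic.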
So, while DVC belongs to a different coding family, it has many relations with AVC, since many AVC tools can now also be used at the DVC decoder.