Assignment 02: Coding Video for Streaming
The adaptive bit-rate (ABR) streaming mechanism is used in most modern streaming protocols, such as HLS and MPEG-DASH. In ABR streaming, the content is encoded at several bitrate representations. Each representation incorporates a set of defined switching points, such as MPEG GOPs that begin with an IDR-picture. During playback, the streaming client monitors the rate of the incoming data. If the rate becomes insufficient for continuous playback, the client may switch to a lower bitrate representation to prevent buffering. If instead the rate is greater than the bitrate of the current representation, the client may switch to a higher bitrate, which will usually increase the quality of the video. If the client makes perfect decisions throughout the playback, then the quality of the delivered content is maximised for the client's networking environment.
The representations used for ABR streaming can differ in bitrate, resolution and CODEC configuration. The resulting set of representations is called an encoding ladder. Often encoding ladders are designed to be used for all content (mostly video-picture content), client devices, and delivery networks. However, these universal ladder designs are sub-optimal, as rate-distortion characteristics vary for different types of content (e.g. high motion and low motion content) and the bandwidth of different network technologies (e.g. wired, wireless) has very different characteristics.
MPEG-DASH
MPEG-DASH partitions each representation of the content into short, fixed duration segments. These representations are time-aligned so that while the content is being played back by an MPEG-DASH client, the client can use a bitrate adaptation (ABR) algorithm to select the next segment of the representation that has the highest bitrate (quality) that can be downloaded in time for playback without causing stalls or buffering.
The process of selecting the next representation makes a prediction about the network conditions that will exist during the transfer of the next segment. To select an appropriate representation the client uses a manifest file, which describes each segment of each representation.
<Representation id="1" width="960" height="540" bandwidth="2200000" codecs="avc1.640029">...
<Representation id="2" width="1280" height="720" bandwidth="3299968" codecs="avc1.640029">...
<Representation id="3" width="640" height="360" bandwidth="800000" codecs="avc1.4D401E">...
If the predictions are to be successful, each segment of each representation must not exceed (or significantly fall short of) the advertised bitrate for its representation. To achieve this objective the encoder must employ constrained bitrate encoding techniques.
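The selection step can be sketched as follows. This is a minimal throughput-based illustration, not the algorithm of any particular MPEG-DASH client: the bandwidth values are copied from the manifest excerpt above, while the function name and the `safety` margin are assumptions made for the example.

```python
# (id, bandwidth in bit/s), copied from the manifest excerpt above.
REPRESENTATIONS = [("1", 2_200_000), ("2", 3_299_968), ("3", 800_000)]


def select_representation(throughput_bps, safety=0.8):
    """Return the id of the highest-bandwidth representation that fits.

    `safety` is a hypothetical margin: only that fraction of the estimated
    throughput is treated as usable, to absorb prediction error.
    """
    budget = throughput_bps * safety
    fitting = [r for r in REPRESENTATIONS if r[1] <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest bitrate to limit stalls.
        return min(REPRESENTATIONS, key=lambda r: r[1])[0]
    return max(fitting, key=lambda r: r[1])[0]
```

With an estimated throughput of 3 Mbit/s the usable budget is 2.4 Mbit/s, so the sketch selects representation 1 rather than representation 2.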
Rate Control
Rate control is the process used by the encoder in deciding how to allocate bits to encode each picture. The goal of (lossy) video coding is to reduce the bitrate while retaining as much quality as possible. Rate control is a crucial step in determining the tradeoff between size and quality.
Constant bitrate (CBR) and variable bitrate (VBR) encoding set a target data rate, and the encoding application applies a bitrate control technique to achieve it. It can be difficult to choose an appropriate data rate for constrained connections, and the quality of experience (QoE) for viewers can suffer if the range of VBR is too wide or, in the case of CBR, if the nature of the content varies greatly. Constrained VBR, with the rate limited to between 110% and 150% of the target, is often used; however, this assumes that the target bitrate needed to achieve an acceptable level of quality is known before the content is encoded.
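One way to check whether an encoding respects such a constraint is to compare the bitrate of each segment with the advertised target. The sketch below assumes that the 110%-150% figure is read as a cap relative to the target bitrate; the function names and the default `cap_ratio` are illustrative choices.

```python
def segment_bitrates(segment_bytes, segment_seconds):
    """Average bitrate (bit/s) of each segment from its size in bytes."""
    return [size * 8 / segment_seconds for size in segment_bytes]


def over_cap(segment_bytes, segment_seconds, target_bps, cap_ratio=1.5):
    """Indices of segments whose bitrate exceeds cap_ratio * target_bps."""
    limit = target_bps * cap_ratio
    rates = segment_bitrates(segment_bytes, segment_seconds)
    return [i for i, rate in enumerate(rates) if rate > limit]
```

For 4-second segments and a 2.2 Mbit/s target, a 1,800,000-byte segment averages 3.6 Mbit/s and is flagged, because it exceeds the 3.3 Mbit/s cap.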
CS6114 Assignment
Not all video content is equally compressible. Low motion and smooth gradients compress well (few bits for high perceived quality), whereas high motion and fine spatial detail are less compressible (more bits to preserve quality). Often it is easier to specify a target quality and let the encoder vary the data rate to achieve this target. However, the data rate required to achieve the target quality is unknown in advance.
Constant Rate Factor (CRF) encoding specifies a quality level and the encoding application adjusts the data rate to achieve the target quality. The result is content with a fixed quality level, but the data rate is unknown in advance. If quality is the objective this is not a concern, but if the data rate varies significantly over the duration of the content, it may have implications for the deliverability.
Capped CRF applies the data rate necessary to achieve a target quality, together with a maximum data rate to ensure deliverability.
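With ffmpeg and libx264, CRF and capped CRF encodings can be requested as sketched below. The `-crf`, `-maxrate` and `-bufsize` options are standard ffmpeg options; the helper name, the file names, and the default buffer of twice the maximum rate are assumptions for illustration. The function only builds the argument list and does not run ffmpeg.

```python
def x264_crf_command(src, dst, crf, maxrate_kbps=None, bufsize_kbps=None):
    """Build an ffmpeg argument list for libx264 CRF or capped CRF encoding."""
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf)]
    if maxrate_kbps is not None:
        # Capped CRF: the VBV limits (-maxrate/-bufsize) bound the data rate.
        if bufsize_kbps is None:
            bufsize_kbps = 2 * maxrate_kbps  # assumed default, not a recommendation
        cmd += ["-maxrate", f"{maxrate_kbps}k", "-bufsize", f"{bufsize_kbps}k"]
    # -an drops audio so that the measured bitrate reflects the video only.
    cmd += ["-an", dst]
    return cmd
```

In a notebook the returned list could be passed to `subprocess.run(cmd, check=True)`.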
Encoding Ladders
Originally ABR streaming used a fixed encoding ladder that was either agnostic of the video content (Apple), or based on encoding ladders that worked best across a catalogue of content (Netflix). An advance on this approach is to create an encoding ladder that depends on the content type (e.g. the per-title encoding by Netflix). For an encoding ladder to be optimally designed it must model the rate-distortion characteristics of the source (content-aware), and model the delivery network and client switching algorithm (context-aware).
In this assignment only content-aware factors will be considered. For video on demand applications a model of quality for each representation (bitrate) can be created for an encoder (e.g. libx264) by encoding source content using a range of bitrates, and measuring the overall quality using an objective quality metric (e.g. PSNR). This results in pairs of values (Ri, Qi), i = 1, 2, ... where Ri denotes bitrate and Qi denotes quality.
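As an illustration of the quality side of each (Ri, Qi) pair, PSNR can be computed from the mean squared error between reference and decoded samples. The sketch below works on flat sequences of sample values and assumes 8-bit content (peak value 255); in practice the quality measurement would normally be done per frame by a tool rather than by hand.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted):
        raise ValueError("sequences must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)
```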
Some encoding ladder design considerations include
• Good quality representations with reasonable bitrates
• Quality and bitrate increments between consecutive representations
• Segment duration (coding efficiency versus adaptability)
• Network limits (maximum bitrates on different platforms)
Encoding Ladder for the Assignment
In this assignment the resolution (size and frame rate) of the content is fixed¹. Creating a content-aware encoding ladder raises several questions that must be addressed in the assignment.
• The number of representations is finite, so how many representations are sufficient (and practical to implement)?
• What is the increase in bitrate between adjacent representations? Is this a fixed increment (e.g. 5% greater each time), or quality based (what difference is noticeable)? Are these bitrate increases equally spaced?
In this assignment the optimality criteria to consider are
• Each bitrate-resolution entry in the encoding ladder should, for the given bitrate, have as high a quality as possible
• Adjacent bitrates should be perceptually spaced. Careful choice of the quality improvements between representations can result in smooth quality transitions when switching. But this must be balanced against the practical concern of too many representations.
¹ So you do not need to consider the difference between scaling artefacts and encoding artefacts.
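One possible way to space the entries of a ladder by quality rather than by bitrate is sketched below. It walks the measured points in order of increasing bitrate and keeps a point only if its quality exceeds that of the previously kept entry by a minimum step. The function name is hypothetical, and the step size is a design choice to be justified, not a perceptual threshold given by this assignment.

```python
def pick_rungs(rd_points, min_quality_step):
    """Pick ladder entries so consecutive rungs differ by at least the step.

    rd_points: iterable of (bitrate, quality) pairs, e.g. (kbit/s, PSNR in dB).
    """
    rungs = []
    for rate, quality in sorted(rd_points):
        if not rungs or quality - rungs[-1][1] >= min_quality_step:
            rungs.append((rate, quality))
    return rungs


# Invented values for illustration only; these are not measurements.
EXAMPLE_POINTS = [(500, 32.0), (800, 34.5), (1000, 35.2),
                  (1500, 37.0), (2500, 38.9), (4000, 40.1)]
```

With the invented points and a 2 dB step, `pick_rungs(EXAMPLE_POINTS, 2.0)` keeps the entries at 500, 800, 1500 and 4000 kbit/s. A smaller step yields more representations, which is the trade-off described in the criteria above.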
A video encoder can be configured in many ways, such as different GOP (Group of Pictures) structures, different quantisation parameters (QP) or bit allocations. Depending on the encoder and the configuration, the same source video can be compressed differently, with each encoding having its own bitrate and distortion value. To determine the list of representations to use in the encoding ladder, the Bjøntegaard Delta-Rate (BD-Rate) metric can be used to select the encoder configuration.
Bjøntegaard Delta-Rate
A CODEC quality comparison experiment consists of a series of encoding and quality metric calculations on different parameters giving points on a bitrate-quality graph. These measured points are used to create rate-distortion curves, as it is impractical to generate all of the points on the curve. For better visibility in rate-distortion plots, the discrete points are interpolated to give a continuous curve. The Bjøntegaard-Delta (BD) metric reduces the performance comparison to a single numerical value. The BD metric calculates the average difference between two curves by interpolating the measured points of two CODECs or CODEC features/settings.
The BD-rate is calculated on rate-distortion curves using the following procedure.
• Four different rate points or target qualities are chosen for the input sequence
• For these four points, the content is encoded with two different CODECs or CODEC configurations
• The measured bitrate and the measured distortion (e.g. PSNR) for the resulting eight encodings are used to create rate-distortion curves
• To ensure that mean BD-rate values are not biased towards higher bitrates, a logarithmic scale is used for the measured bitrates
The BD-Rate calculates the average difference between two rate distortion curves, by estimating the area between the two curves. The BD-Rate allows the measurement of the bitrate reduction offered by a CODEC or CODEC feature/setting while maintaining the same quality as measured by the objective metric.
https://github.com/FAU-LMS/bjontegaard
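To make the procedure concrete, the sketch below implements the classic cubic-fit form of the calculation in plain Python, assuming exactly four points per curve with distinct PSNR values. It fits log10(bitrate) as a cubic function of PSNR, averages each fitted curve over the overlapping PSNR range, and converts the difference into a percentage. The function names are hypothetical, and results may differ from those of the bjontegaard package, which offers other interpolation methods.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(rates, psnrs, lo, hi):
    """Average of the fitted log10(rate) curve over the PSNR range [lo, hi]."""
    logs = [math.log10(r) for r in rates]

    def fitted(q):
        return _lagrange(psnrs, logs, q)

    # Simpson's rule is exact for the cubic through four points.
    integral = (hi - lo) / 6 * (fitted(lo) + 4 * fitted((lo + hi) / 2) + fitted(hi))
    return integral / (hi - lo)


def bd_rate_cubic(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """BD-Rate in percent; negative means the test curve needs fewer bits."""
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    diff = (_mean_log_rate(rate_test, psnr_test, lo, hi)
            - _mean_log_rate(rate_anchor, psnr_anchor, lo, hi))
    return (10 ** diff - 1) * 100
```

As a sanity check, a test configuration that reaches the same four PSNR values at exactly half the anchor bitrate gives a BD-Rate of -50%.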
Task
In the assignment you will design an encoding ladder for some example content. To identify the most suitable entries in the encoding ladder you will use the information obtained from applying the Bjøntegaard-Delta (BD) metric.
In the assignment you will use two GoP structures as the different configurations of a CODEC to compare
• GoP length 100, number of B-pictures 3
• GoP length 250, number of B-pictures 3
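The two configurations can be expressed as ffmpeg options for libx264, as in the sketch below: `-g` sets the GOP length and `-bf` the number of B-pictures. Note that `-g` is a maximum keyframe interval, so the encoder may still insert earlier keyframes at scene cuts. The CRF values and output file names are placeholders only, and this is not necessarily the approach used in the supplied notebook.

```python
from itertools import product

GOP_CONFIGS = {"gop100": (100, 3), "gop250": (250, 3)}  # (GOP length, B-pictures)
CRF_VALUES = [35, 29, 23, 18]  # placeholders; choosing these is part of the task


def encoding_jobs(src):
    """Yield (output name, ffmpeg argument list) for every GOP/CRF pairing."""
    for (name, (gop, bframes)), crf in product(GOP_CONFIGS.items(), CRF_VALUES):
        out = f"{name}_crf{crf}.mp4"  # hypothetical naming scheme
        cmd = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
               "-g", str(gop), "-bf", str(bframes), "-crf", str(crf), "-an", out]
        yield out, cmd
```

Two configurations with four CRF values each produce eight encoding jobs in total.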
You will need to
• Decide the values of the four rate points or target qualities – use a defined CRF value appropriate for low bitrate, medium, good and excellent quality content
• Encode the content using these CRF values and measure the bitrate and quality, giving a total of 8 encodings
• Calculate the BD-Rate and BD-PSNR using the bjontegaard Python package; this will identify the difference, if any, between these configurations
• Create the rate distortion curve (quality versus bitrate) for the selected CODEC configuration
• Choose an appropriate number of bitrates from the curve that capture low, medium and high quality encodings – these are the entries in the encoding ladder
• Encode the content using capped CRF encoding
There is no requirement to create an MPEG-DASH manifest file.
Create a Jupyter notebook that implements this workflow. Write a short report (2 pages) that interprets your results, justifies your choices and includes any observations or improvements you noted or implemented.
Resources
Test video sequences are provided. The supplied Jupyter notebook (A02) gives an example of creating the encoding structure, and extracting the data for use with the bjontegaard Python package.
References
G. Bjøntegaard, "Calculation of average PSNR differences between RD curves," document VCEG-M33, Austin, TX, USA, Apr. 2001.
A. V. Katsenou, J. Sole and D. R. Bull, "Efficient Bitrate Ladder Construction for Content-Optimized Adaptive Video Streaming," IEEE Open Journal of Signal Processing, vol. 2, pp. 496-511, 2021, doi: 10.1109/OJSP.2021.3086691.
V. Zimichev, "BD-rate: one name - two metrics. AOM vs. the World." https://vicuesoft.com/blog/titles/bd_rate_one_name_two_metrics/