MPEG video compression technique


April 3, 2018

From: http://vsr.informatik.tu-chemnitz.de/~jan/MPEG/HTML/mpeg_tech.html

A brief discussion

The MPEG compression technique is described here only as far as is necessary to understand the problems of the Java implementation.

An MPEG "film" is a sequence of three kinds of frames:

The I-frames are intra coded, i.e. they can be reconstructed without any reference to other frames.
The P-frames are forward predicted from the last I-frame or P-frame, i.e. it is impossible to reconstruct them without the data of another frame (I or P).
The B-frames are both forward predicted and backward predicted from the last and the next I-frame or P-frame, i.e. two other frames are necessary to reconstruct them.

P-frames and B-frames are referred to as inter coded frames.

That means a Java program must buffer at least three frames: one for forward prediction, one for backward prediction, and a third for the frame currently being reconstructed. As the figure shows, the frame used for backward prediction follows the predicted frame in display order. That would require suspending the decoding of B-frames until the next I- or P-frame appears. Fortunately, the display order is not the coding order: the frames appear in the MPEG data stream in such an order that the referred frames precede the referring frames.

As an example, the frame sequence above is transferred in the following order: I P B B B P B B B. The only task of the decoder is to reorder the reconstructed frames. To support this, an ascending frame number (modulo 1024) comes with each frame.
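The reordering can be sketched in a few lines. This is a minimal Python sketch, not the code of the Java decoder: the function name `to_display_order` and the `(type, number)` tuples are assumptions made for illustration. Each reference frame (I or P) is held back until the next reference frame arrives, while B-frames are passed on immediately.

```python
def to_display_order(coded_frames):
    """Reorder frames from coding order to display order.

    coded_frames: list of (frame_type, number) tuples in stream order,
    where frame_type is "I", "P" or "B" (hypothetical representation).
    """
    display = []
    pending_ref = None  # last decoded I/P frame that is not displayed yet
    for frame in coded_frames:
        if frame[0] == "B":
            display.append(frame)            # B-frames are shown at once
        else:
            if pending_ref is not None:
                display.append(pending_ref)  # the older reference is due now
            pending_ref = frame
    if pending_ref is not None:
        display.append(pending_ref)          # flush the last reference frame
    return display


# Coding order from the text: I P B B B P B B B
coded = [("I", 0), ("P", 4), ("B", 1), ("B", 2), ("B", 3),
         ("P", 8), ("B", 5), ("B", 6), ("B", 7)]
display = to_display_order(coded)
```

With this input the frame numbers come out in ascending order, i.e. as I B B B P B B B P.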

What does "prediction" mean?

Imagine an I-frame showing a triangle on a white background. A following P-frame shows the same triangle, but at another position. Prediction means supplying a motion vector which declares how to move the triangle in the I-frame to obtain the triangle in the P-frame. This motion vector is part of the MPEG stream and is divided into a horizontal and a vertical part. These parts can be positive or negative. A positive value means motion to the right or motion downwards, respectively. A negative value means motion to the left or motion upwards, respectively.

The parts of the motion vector are in a range of -64 ... +63. So the referred area can be up to 64x64 pixels away.

This model, however, assumes that every change between frames can be expressed as a simple displacement of pixels. The figure to the right shows that this isn't true: the red rectangle is shifted and also rotated by 5° to the right. So a simple displacement of the red rectangle will cause a prediction error. Therefore the MPEG stream contains a matrix for compensating this prediction error.

Thus, the reconstruction of inter coded frames goes ahead in two steps:

Application of the motion vector to the referred frame;
Adding the prediction error compensation to the result;
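The two steps can be sketched for a single block as follows. This is a minimal Python sketch under stated assumptions: the function name `reconstruct_block` is hypothetical, the motion vector is assumed to point from the block position to the referred area in the referred frame, and the result is assumed to be clamped to the range 0..255.

```python
def reconstruct_block(reference, top, left, dx, dy, error, size=16):
    """Rebuild one size x size block of an inter coded frame.

    reference: 2-D list with the pixel values of the referred frame
    top, left: position of the block in the frame being built
    dx, dy:    horizontal and vertical part of the motion vector
               (assumed to point to the referred area)
    error:     size x size prediction error values
    """
    block = []
    for y in range(size):
        row = []
        for x in range(size):
            # Step 1: apply the motion vector to the referred frame.
            predicted = reference[top + y + dy][left + x + dx]
            # Step 2: add the prediction error compensation.
            value = predicted + error[y][x]
            row.append(max(0, min(255, value)))  # assumed clamping
        block.append(row)
    return block


# Tiny demonstration: a 2x2 block taken from a 4x4 reference frame.
reference = [[10, 20, 30, 40],
             [50, 60, 70, 80],
             [90, 100, 110, 120],
             [130, 140, 150, 160]]
error = [[1, -1],
         [0, 2]]
block = reconstruct_block(reference, top=0, left=0, dx=1, dy=2,
                          error=error, size=2)
```

A real decoder would also have to handle vectors that point outside the frame; the sketch leaves that out.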

Note that the prediction error compensation requires fewer bytes than the whole frame, because the white parts are zero and can be discarded from the MPEG stream. Furthermore, the DCT compression (see later in this chapter) is applied to the prediction error, which decreases its size in memory.

Note also the different meanings of the two "+" signs. The first means adding the motion vector to the x and y coordinates of each pixel. The second means adding an error value to the color value of the appropriate pixel.

But what if some parts move to the left and others to the right?

The motion vector isn't valid for the whole frame. Instead, the frame is divided into macro blocks of 16x16 pixels, and every macro block has its own motion vector. Of course, this does not rule out contradictory motion, but it minimizes its probability.

And if contradictory motion occurs? One of the greatest misunderstandings of the MPEG compression technique is to assume that all macro blocks of P-frames are predicted. If the prediction error is too big, the coder can decide to intra-code a macro block. Similarly, the macro blocks in B-frames can be forward predicted, backward predicted, forward and backward predicted, or intra-coded.

Every macro block contains 4 luminance blocks and 2 chrominance blocks. Every block has a dimension of 8x8 values. The luminance blocks contain information about the brightness of every pixel in the macro block. The chrominance blocks contain color information. Because of some properties of the human eye it isn't necessary to give color information for every pixel. Instead, 4 pixels are related to one color value. This color value is divided into two parts. The first is in the C_b color block, the second is in the C_r color block. The color information is to be applied as shown in the picture to the left.
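How one color value is shared by 4 pixels can be sketched like this. It is a minimal Python sketch that assumes the 4 pixels form a 2x2 square, so that the two 8x8 chrominance blocks cover the whole 16x16 macro block; the function name `chroma_for_pixel` is hypothetical.

```python
def chroma_for_pixel(cb_block, cr_block, x, y):
    """Return the (C_b, C_r) pair for pixel (x, y) of a 16x16 macro block.

    Assumes each chrominance value covers a 2x2 square of pixels.
    """
    return cb_block[y // 2][x // 2], cr_block[y // 2][x // 2]


# Two 8x8 chrominance blocks filled with recognizable dummy values.
cb_block = [[row * 8 + col for col in range(8)] for row in range(8)]
cr_block = [[100 + row * 8 + col for col in range(8)] for row in range(8)]

# The four pixels of the top-left 2x2 square share one color value.
shared = {chroma_for_pixel(cb_block, cr_block, x, y)
          for y in (0, 1) for x in (0, 1)}
```

Under this assumption a macro block carries 4 x 64 brightness values but only 2 x 64 color values.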

Depending on the kind of macro block the blocks contain pixel information or prediction error information as mentioned above. In any case the information is compressed using the discrete cosine transform (DCT).

The discrete cosine transform (DCT)

As mentioned above, the 8x8 block values are coded by means of the discrete cosine transform. To explain this, consider the figure to the right, which is meant to be an enlargement of an 8x8 area of a black-and-white frame.

120	108	90	75	69	73	82	89
127	115	97	81	75	79	88	95
134	122	105	89	83	87	96	103
137	125	107	92	86	90	99	106
131	119	101	86	80	83	93	100
117	105	87	72	65	69	78	85
100	88	70	55	49	53	62	69
89	77	59	44	38	42	51	58

The normal way is to determine the brightness of each of the 64 pixels and to scale them to some limits, say from 0 to 255^*, whereby "0" means "black" and "255" means "white". We can also represent the values in the form of an 8x8 bar diagram. Normally the values are processed line by line. That requires 64 bytes of storage.

^*(In MPEG a range from -256 to 255 is used.)

But you can define all 64 values by only 5 integers if you apply the following formula, called the discrete cosine transform (DCT):
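In the standard notation for 8x8 blocks the transform can be written as shown here. The factor 1/4 and the constants C(u), C(v) are the usual convention and are an assumption in this place; they do reproduce the numbers used in this chapter, since a lone value of 700 at the upper left corner yields 700/8 = 87.5 for every pixel.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```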

Where f(x,y) is the brightness of the pixel at position [x,y]. The result, F, is an 8x8 array, too:

700	90	100
90	0	0
-89	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0

But as you can see, almost all values are equal to zero. Because the non-zero values are concentrated in the upper left corner, the matrix is transferred to the receiver in zigzag scan order. That would result in:
700 90 90 -89 0 100 0 0 0 .... 0
Of course, the trailing zeros are not transferred. An End-Of-Block sign is coded instead.
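The zigzag scan can be sketched as follows. This is a simplified Python sketch: the names `zigzag_indices` and `scan_block` are hypothetical, and the string `"EOB"` is only a stand-in for the real End-Of-Block code, whose bit pattern is not discussed here.

```python
def zigzag_indices(n=8):
    """Return the (row, column) pairs of an n x n matrix in zigzag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + column of one diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run upwards
        for r in rows:
            order.append((r, s - r))
    return order


EOB = "EOB"  # hypothetical stand-in for the End-Of-Block sign


def scan_block(block):
    """Zigzag-scan a square block and replace the trailing zeros by EOB."""
    values = [block[r][c] for r, c in zigzag_indices(len(block))]
    while values and values[-1] == 0:
        values.pop()
    return values + [EOB]


# The transform matrix from the text, padded with zeros to 8x8.
F = [[0] * 8 for _ in range(8)]
F[0][0], F[0][1], F[0][2] = 700, 90, 100
F[1][0] = 90
F[2][0] = -89
scanned = scan_block(F)
```

For the example matrix the scan yields 700 90 90 -89 0 100 followed by the End-Of-Block marker, so 6 values are sent instead of 64.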

The decoder can reconstruct the pixel values by the following formula called inverse discrete cosine transform (IDCT):
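In the standard notation for 8x8 blocks the inverse transform reads as shown here. As with the forward transform, the factor 1/4 and the constants C(u), C(v) are the usual convention and are assumed in this place.

```latex
f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\,
         \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```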

Where F(u,v) is the transform matrix value at position [u,v]. The results are exactly the original pixel values. Therefore the MPEG compression could be regarded as lossless. But that isn't true, because the transformed values are quantized. That means they are (integer) divided by a certain value greater than or equal to 8, because the DCT supplies values up to 2047. To bring them below byte length, at least the quantization value 8 is applied. The decoder multiplies the result by the same value. Of course the result differs from the original value. But again, because of some properties of the human eye, the error isn't visible. In MPEG there is a quantization matrix which defines a different quantization value for every transform value depending on its position.
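The loss caused by quantization can be illustrated with a small sketch. It is a simplification under stated assumptions: a single quantization value of 8 is used for all positions instead of a quantization matrix, the integer division is assumed to truncate toward zero, and the names `quantize` and `dequantize` are hypothetical.

```python
def quantize(coefficients, q=8):
    """Integer-divide every transform value by q (truncating toward zero)."""
    return [[int(value / q) for value in row] for row in coefficients]


def dequantize(levels, q=8):
    """Decoder side: multiply every quantized value by the same q."""
    return [[level * q for level in row] for row in levels]


# Upper left corner of the transform matrix from the example above.
original = [[700, 90, 100],
            [90, 0, 0],
            [-89, 0, 0]]
levels = quantize(original)
restored = dequantize(levels)
```

The restored values (696, 88, 96, ...) are close to the original ones but no longer identical, which is exactly where the compression becomes lossy.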

Was it a well chosen example?

No, it wasn't! The DCT always tends to compute zeros. This effect is reinforced by the quantization, which zeros small values. To understand this, one must recognize the essence of the DCT. To do so, let's begin from the opposite side.

Let us apply the IDCT to a matrix only containing one value of 700 at the upper left corner:

700	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0

The result of the IDCT is:

87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87
87	87	87	87	87	87	87	87

Of course, the picture is a grey square. The value in the upper left corner is called the DC value. This is the abbreviation for direct current and refers to a similar phenomenon in the theory of alternating current, where an alternating current can have a direct component. In the DCT the DC value determines the average brightness of the block. All other values describe the variation around this DC value. Therefore they are sometimes referred to as AC values (from "alternating current").
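The DC example can be checked numerically. The following is a minimal Python sketch that assumes the standard 8x8 IDCT with the factor 1/4; it is a direct, slow evaluation of the sum and not the method of any particular decoder. The function name `idct_8x8` is hypothetical.

```python
import math


def idct_8x8(F):
    """Inverse DCT of an 8x8 matrix F[v][u] (v: row, u: column)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    f = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (c(u) * c(v) * F[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            f[y][x] = total / 4
    return f


# A matrix that contains only the DC value of 700.
dc_only = [[0] * 8 for _ in range(8)]
dc_only[0][0] = 700
pixels = idct_8x8(dc_only)
```

Every pixel comes out as 700/8 = 87.5, which the tables in this chapter show as 87.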

Now let's add an AC value of 100:

700	100	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0

The result of the IDCT is:

105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70
105	102	97	91	84	78	73	70

As you can see, the values vary around the DC value of 87. Furthermore, if you look at the shape of the bar diagram you'll see a curve like half a cosine period. The picture is said to have a frequency of 1 in the X direction. Imagine a car that drives at constant speed from left to right along the "bar diagram street" parallel to the X axis. In contrast to the DC example, the car is shaken with a certain frequency, but only when it follows the X axis. In the Y direction it stays at the same height the whole way. This behaviour is reflected in the transform matrix, because the only AC value of 100 appears in the X direction.

Now let's consider what happens if we place the AC value of 100 at the next position:

700	0	100	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0

The result of the IDCT is:

104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104
104	94	81	71	71	81	94	104

The shape of the bar diagram shows a cosine line, too. But now we see a full period, i.e. the frequency is twice as high as in the first example. This behaviour would continue if we moved the AC value step by step to the right. Every step increases the frequency of the cosine wave.

But what happens if we place both AC values?

700	100	100
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0

The result of the IDCT is:

121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86
121	109	91	75	68	71	80	86

Looking at the shape of the bar diagram you can see a mix of both the first and the second cosine wave. Indeed, the result is simply the sum of the two cosine lines.
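This addition can be reproduced for one row of the block. The sketch below is a minimal Python illustration that assumes the standard 8x8 IDCT: an AC value of 100 in the first row of the transform matrix then contributes a cosine wave with the amplitude 100/(4·√2), and the DC value of 700 contributes 700/8. The names `AMPLITUDE`, `DC_LEVEL` and `cosine_wave` are hypothetical.

```python
import math

AMPLITUDE = 100 / (4 * math.sqrt(2))  # share of an AC value of 100 (first row)
DC_LEVEL = 700 / 8                    # share of the DC value of 700


def cosine_wave(index):
    """Samples of the cosine wave of the AC value at column `index`."""
    return [AMPLITUDE * math.cos((2 * x + 1) * index * math.pi / 16)
            for x in range(8)]


first = cosine_wave(1)    # the half period of the first AC example
second = cosine_wave(2)   # the full period of the second AC example
combined = [round(DC_LEVEL + a + b) for a, b in zip(first, second)]
```

The rounded sums agree with the rows of the result shown above (121, 109, 91, 75, 68, 71, 80, 86).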

Now let's add an AC value in the other direction:

700	100	100
200	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0

The result of the IDCT is:

156	144	125	109	102	106	114	121
151	138	120	104	97	100	109	116
141	129	110	94	87	91	99	106
128	116	97	82	75	78	86	93
114	102	84	68	61	64	73	80
102	89	71	55	48	51	60	67
92	80	61	45	38	42	50	57
86	74	56	40	33	36	45	52

Now the values vary in the Y direction, too. The principle is: the higher the index of the AC value, the greater the frequency.

Now, as a last example, let's place an AC value at the corner opposite the DC value. We already know what it means: the highest possible frequency of 7 is applied in both the X and the Y direction.

What is to be expected?

950	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	500

The result of the IDCT is:

124	105	139	95	143	98	132	114
105	157	61	187	51	176	80	132
139	61	205	17	221	32	176	98
95	187	17	239	0	221	51	143
143	51	221	0	239	17	187	95
98	176	32	221	17	205	61	139
132	80	176	51	187	61	157	105
114	132	98	143	95	139	105	124

Because of the high frequency, neighbouring values differ greatly. The picture has a checkerboard-like appearance. Note that this is meant to be an enlargement of an 8x8 pixel area of a real picture! How often does that happen? We can expect such a case to be very rare. And that's why the DCT computes zeros for the higher frequencies in almost every case.


Author: asexual

This post was published by asexual 6 years ago in the General category and was last updated on April 3, 2018.