What are 4:2:2, 4:1:1, and 4:2:0?
These are
all shorthand notations for different sampling
structures for digital video. They are also used
for CIF and QSIF and suchlike MPEG frame sizes,
but in the discussion that follows, I focus on
the numbers for SDTV (standard-definition TV)
digitized to the ITU-R BT.601 standard: a 13.5 MHz
sampling frequency and 720 pixels per line.
The first
number refers to the 13.5 MHz sampling rate of
the luminance: "4" because (a) it's
nominally almost approximately sort of four times
the NTSC and/or PAL color subcarrier frequencies,
and (b) if it's "4" the other
numbers can be integers, whereas if it were
"1" the formats would be
"1:0.5:0.5", "1:0.25:0.25",
and "1:0.5:0" respectively, and which
would you rather try to read off in a hurry? The
13.5 MHz sampling yields 720 pixels per scanline
in both 525/59.94 and 625/50 systems (NTSC and
PAL/SECAM). This number applies to D-1, D-5,
Digital Betacam, BetaSX, Digital-S, and all the
DV formats just the same.
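The arithmetic behind the "4" can be checked in a few lines. This is only a sketch: the subcarrier and line frequencies below are the commonly published values, assumed here rather than taken from the text above, and the function names are illustrative.

```python
# Illustrative check of the 13.5 MHz luminance sampling rate.
# Subcarrier and line frequencies are assumed standard values, not from the text.
LUMA_HZ = 13.5e6

SUBCARRIER_HZ = {
    "NTSC": 315e6 / 88,      # about 3.579545 MHz
    "PAL": 4_433_618.75,     # about 4.43 MHz
}

LINE_HZ = {
    "525/59.94": 4.5e6 / 286,  # about 15734.27 Hz
    "625/50": 15_625.0,
}


def subcarrier_ratio(system: str) -> float:
    """How many times the color subcarrier frequency fits into 13.5 MHz."""
    return LUMA_HZ / SUBCARRIER_HZ[system]


def samples_per_total_line(system: str) -> int:
    """Luma samples across one full line period, blanking included."""
    return round(LUMA_HZ / LINE_HZ[system])


if __name__ == "__main__":
    for name in SUBCARRIER_HZ:
        print(f"{name}: 13.5 MHz is {subcarrier_ratio(name):.2f} x subcarrier")
    for name in LINE_HZ:
        print(f"{name}: {samples_per_total_line(name)} samples per full line")
```

Under these assumptions neither ratio is actually 4 (roughly 3.77 for NTSC and 3.04 for PAL), which is why "nominally" is doing so much work. The full-line counts come out to 858 and 864 samples; the 720 pixels are the active portion of each.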
The other
two numbers refer to the sampling rates of the
color difference signals R-Y and B-Y.
In 4:2:2
systems (D-1, D-5, DigiBeta, BetaSX, Digital-S,
DVCPRO50) the color is sampled at half the rate
of the luminance, with both color-difference
samples co-sited with (located at the same place as)
alternate luminance samples. Thus you have
360 color samples (in each of R-Y and B-Y) per
scanline.
In 4:1:1
systems (NTSC DV & DVCAM, DVCPRO) the color
data are sampled half as frequently as in 4:2:2,
resulting in 180 color samples (in each of R-Y
and B-Y) per scanline. The R-Y and B-Y samples
are considered to be co-sited
with every fourth luminance sample. Yes, this
sounds horrible -- but it's still enough for a
color bandwidth extending to around 1.5 MHz,
about the same color bandwidth as Betacam SP
(which, were it a digital format, would be
characterized as a 3:1:1 format).
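To make the co-siting and the bandwidth figures concrete, here is a minimal sketch. It assumes chroma is taken at luma positions 0, 2, 4, ... (4:2:2) or 0, 4, 8, ... (4:1:1), and it uses the Nyquist limit (half the chroma sampling rate) as a theoretical ceiling; real filters deliver somewhat less. The function names are made up for illustration.

```python
# Sketch: which luma positions carry chroma, and the resulting Nyquist limit.
LUMA_MHZ = 13.5
ACTIVE_PIXELS = 720


def chroma_positions(h_step: int, width: int = ACTIVE_PIXELS) -> list[int]:
    """Indices of luma samples that have a co-sited color-difference pair."""
    return list(range(0, width, h_step))


def chroma_nyquist_mhz(h_step: int) -> float:
    """Theoretical ceiling on chroma bandwidth: half the chroma sample rate."""
    return LUMA_MHZ / h_step / 2


if __name__ == "__main__":
    for label, step in (("4:2:2", 2), ("4:1:1", 4)):
        pos = chroma_positions(step)
        print(
            f"{label}: {len(pos)} chroma samples per line, "
            f"first few at {pos[:4]}, "
            f"Nyquist limit {chroma_nyquist_mhz(step)} MHz"
        )
```

For 4:1:1 the ceiling works out to 1.6875 MHz, which squares with the "around 1.5 MHz" figure once practical filtering is allowed for.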
So where
does 4:2:0 (PAL DV, DVD, main-profile
MPEG-2) fit in? 4 x Y, 2 x R-Y, and 0 x B-Y?
Fortunately not! 4:2:0 is the non-intuitive
notation for half-luminance-rate sampling of
color in both the horizontal and vertical
dimensions. Chroma is sampled 360 times per line,
but only on every other line. The theory here is
that by evenly subsampling chroma in both H and V
dimensions, you get a better image than the
seemingly unbalanced 4:1:1, where the vertical
color resolution appears to be four times the
horizontal color resolution. Alas, it ain't so:
ITU-R BT.601 pixels are taller than they are wide,
and interlace already diminishes vertical
resolution; as a result, multigeneration work in
4:2:0 is much more subject to visible degradation
than multigeneration work in 4:1:1.
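The three schemes can be compared side by side with a small sketch. The active line counts (480 for 525-line DV, 576 for 625-line) are assumptions not stated above, and the `Scheme` class and helper names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    h_step: int  # one chroma pair per h_step luma samples along a line
    v_step: int  # chroma carried on every v_step-th line


SCHEMES = {
    "4:2:2": Scheme("4:2:2", 2, 1),
    "4:1:1": Scheme("4:1:1", 4, 1),
    "4:2:0": Scheme("4:2:0", 2, 2),
}


def chroma_grid(scheme: Scheme, width: int = 720, lines: int = 480) -> tuple[int, int]:
    """(chroma samples per line, lines carrying chroma) for one color-difference signal."""
    return width // scheme.h_step, lines // scheme.v_step


def chroma_fraction(scheme: Scheme) -> float:
    """Chroma samples kept, as a fraction of the luma sample count."""
    return 1.0 / (scheme.h_step * scheme.v_step)


if __name__ == "__main__":
    for scheme in SCHEMES.values():
        # Assumed line counts: 4:2:0 DV is the 625-line variant, the rest shown at 480.
        lines = 576 if scheme.name == "4:2:0" else 480
        per_line, chroma_lines = chroma_grid(scheme, lines=lines)
        print(
            f"{scheme.name}: {per_line} x {chroma_lines} chroma grid, "
            f"{chroma_fraction(scheme):.2f} of luma"
        )
```

Note that 4:1:1 and 4:2:0 both keep one quarter as many chroma samples as luma samples; they differ only in how the reduction is split between the horizontal and vertical dimensions.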
"Now
how much would you pay? But wait, there's
more!" In US implementations of 4:2:0, the
color samples are supposed to be vertically
interleaved with luminance, whereas in European
4:2:0 they're supposed to be co-sited.
Practically speaking, this is a headache for
developers of codecs, encoders, and DVEs, but for
DV purposes it's not especially exciting, since
only European DV is 4:2:0.