Monday, May 30, 2005

The Definition of Definition?

What is the definition of "definition," as in "high-definition"? Is it simply how-many-pixels by how-many-pixels the transmitted signal provides, or is there more to it than that?

A 1080i signal has exactly 1,920 pixels across the screen horizontally, while there are exactly 1,080 pixels up and down the screen vertically. Each pixel of the frame is refreshed (updated) once every 1/30 second — though, actually, half (those on the odd-numbered scan lines) are refreshed in the first 1/60 second, for one field, and the other half (on the even-numbered lines) are done to make the second 1/60-second field. If every pixel is (spatially or temporally) distinct from every adjacent pixel, you get the maximal resolution or definition.
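To make the interleaving concrete, here's a minimal Python sketch of my own (an illustration, not anything a broadcaster actually runs) showing how two 540-line fields knit together into one 1,080-line frame:

    # Toy model of 1080i interlace: two 540-line fields, captured
    # 1/60 second apart, interleave into one 1,080-line frame.
    HEIGHT = 1080

    # Label each scan line with the field it comes from. (Lines are
    # 0-indexed here, so positions 0, 2, 4 ... correspond to the
    # odd-numbered scan lines of the text above.)
    first_field = [(line, "field 1") for line in range(0, HEIGHT, 2)]
    second_field = [(line, "field 2") for line in range(1, HEIGHT, 2)]

    frame = [None] * HEIGHT
    for line, source in first_field + second_field:
        frame[line] = source

    print(len(first_field), len(second_field))  # 540 540
    print(frame[:4])  # ['field 1', 'field 2', 'field 1', 'field 2']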

The same arithmetic applies to 720p, except that the pixel grid is 1,280 pixels across by 720 pixels up and down, and every pixel is refreshed once every 1/60 second. Notice that in both 1080i and 720p the pixels are square. The pixels used for encoding digital video on DVD, by contrast, are oblong.
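Just how oblong is easy to work out: an NTSC DVD frame stores 720 x 480 samples yet displays at a 4:3 or 16:9 picture shape, so each stored pixel cannot be square. A quick Python sketch (ignoring fine points such as the exact active picture area):

    # Pixel aspect ratio = (display shape) / (storage shape).
    # DVD stores 720 x 480 samples but displays at 4:3 or 16:9.
    storage = 720 / 480

    for name, shape in [("4:3", 4 / 3), ("16:9", 16 / 9)]:
        print(name, round(shape / storage, 3))
    # 4:3  0.889 -- pixels taller than they are wide
    # 16:9 1.185 -- pixels wider than they are tall

    # The HD formats, by contrast, come out square:
    print((16 / 9) / (1920 / 1080), (16 / 9) / (1280 / 720))  # 1.0 1.0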

So a plethora of pixels that are spatially or temporally distinct is the essence of high definition. Still, there's more to it than that. Several things stand in the way of getting the maximal theoretical definition ... and some of them can even be considered good.


For example, a 1080i signal is vertically filtered. About 30-40 percent of the potential vertical resolution is filtered away in order to avoid interlace artifacts such as details flickering on and off as they rise slowly in the picture.

This happens because the two interlaced "halves" of the picture are offset slightly in time. It is even possible for tiny details that are moving upward (or downward) at just the right rate to be completely missed by the interlaced scanning of the image, if they happen to always fall on one of the "missing" scan lines in each 540-line field. But if the rate of ascent or descent is slightly different, these small details will blink.

Or, a completely stationary detail that is present in one of the two fields but not in the other will also blink, in what is referred to as "twitter."

The vertical filtering of 1080i to eliminate the blinking and twitter takes the potential 1,080 "lines of vertical resolution" down to 756 effective lines. It's a good tradeoff: slightly less detail for a calmer picture.
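The arithmetic, as a quick sketch (the exact filter factor varies; I'm using the 30-40 percent range cited above):

    # Effective vertical resolution of 1080i after anti-interlace filtering.
    nominal = 1080
    print(nominal * (1 - 0.30))  # 756.0 effective lines
    print(nominal * (1 - 0.40))  # 648.0 at the pessimistic end of the range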

720p is not filtered, since it is not interlaced. Notice that the 756 effective lines of 1080i are not that many more than the 720 actual lines of 720p. For more on this, see DVE Frequently Asked Questions, a discussion by TV guru Joe Kane of his then-upcoming Digital Video Essentials test DVD and D-VHS tape.

Joe Kane writes, in Interlaced Video Go Away, about how the resolution of 1080i video is a lot less than you might think:

Most films transferred to video come in at 800 to 1100 pixels [in each horizontal pixel row of the image] and video material will often be in the order of 1300 to 1400 lines [of horizontal resolution, not the nominal 1920 lines of 1080i]. The clear winner in picture quality is 720p over 1080i. The reason is interlaced artifacts and the vertical filtering required to get from progressive to interlaced. The real vertical resolution of 1080i images in motion is somewhere around 640 lines [due to the filtering]. The true horizontal resolution capability of the broadcast 1080i signal is 1440 pixels or less. The limitations are MPEG encoding and the bandwidth of a TV channel. There is little hope of that getting better any time soon. Even at 1440 x 1080i the MPEG artifacts and lack of vertical resolution in a moving image are far worse than at 720p.

Decoding that: in nominally 1,920 x 1,080 (1080i) video you get just 640 lines of vertical resolution(!), not 1,080, owing to image filtering done to head off the possibility of "interlace artifacts" on your TV screen. These artifacts, if allowed to show up on the screen, would lead to an unwelcome, visible structure of pixel rows/scan lines surrounding moving objects in the picture. They would also cause images to flicker and shimmer when, for example, there is a camera pan taking place.

Furthermore, MPEG encoding, done to compress the digital video signal by drastically reducing the number of bits per second in it, needs to have a lot fewer distinct pixels per pixel row than the nominal 1,920 pixels per row of 1080i: "1440 pixels or less." Otherwise, it is hard to get the desired compression ratios between the number of bits per second going into the encoder and the number of bps coming out. The only other way to get the desired compression would be to put up with irritating "macroblocking" artifacts. Most people prefer a slightly softer image.

Film-based material is notoriously harder to compress than video-based material, so for it, "1440 pixels or less" per pixel row has to be further reduced, to "800 to 1100 pixels." But 720p video, because it is progressive, not interlaced, does not have to be filtered in the way 1080i does. Accordingly, its full grid of 720 rows by 1,280 pixels per row arrives intact on the HDTV screen.


Also good, in a sense, is the "chroma subsampling" which reduces the number of bits in a 1080i or 720p signal.

Each pixel actually starts out as three pixels, one red, one green, one blue. These R, G, and B numerical values are combined algebraically, in a weighted sum, to derive Y, the luminance or luma signal. Y represents a black-and-white or monochrome picture. (Actually, all these values are "gamma-corrected" to stretch the contrast ratios at one end of the brightness range and compress them at the other, but I'll ignore that.)

Once Y is obtained for a pixel, the two values (B - Y), or Pb, and (R - Y), or Pr, are derived. Pb and Pr are the two chrominance or chroma signals. Together, Y, Pb, and Pr are the three separate signals of "component video."

Pb and Pr in effect "color in" the Y monochrome signal with blue and red, respectively. Algebraic manipulation of Y, Pb, and Pr can derive, in effect, the (G - Y) color difference signal which allows green to be "colored in" as well.
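Here is a minimal Python sketch of that derivation. The text above doesn't pin down the weighting formula, so I'm assuming the Rec. 709 luma coefficients standard for HD; note too that real encoders also scale the raw B - Y and R - Y differences, which I omit:

    # R'G'B' (gamma-corrected, each 0.0-1.0) to simplified Y, Pb, Pr.
    KR, KG, KB = 0.2126, 0.7152, 0.0722  # Rec. 709 weights; they sum to 1

    def rgb_to_ypbpr(r, g, b):
        y = KR * r + KG * g + KB * b  # luma: the monochrome picture
        pb = b - y                    # "colors in" blue
        pr = r - y                    # "colors in" red
        return y, pb, pr

    # Green needs no third signal: KR*(R-Y) + KG*(G-Y) + KB*(B-Y) = 0,
    # so the decoder can recover G - Y algebraically from Pb and Pr.
    y, pb, pr = rgb_to_ypbpr(1.0, 1.0, 1.0)  # pure white
    print(f"{y:.4f} {pb:.4f} {pr:.4f}")      # 1.0000 0.0000 0.0000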

But whereas Y is transmitted at full resolution, Pb and Pr are downrezzed somewhat by means of 4:2:2 chroma subsampling. The "4:2:2" notation means essentially that each pair of horizontally adjacent Pb pixels — and, separately, each Pr pair — are blended into one double-width pixel, thus cutting the number of bits needed to represent Pb and Pr in half.

This also serves to halve the horizontal resolution of the two chroma signals. (The vertical resolution is left unchanged, as are the vertical and horizontal resolution of the luminance signal, Y.) Yet, thanks to the fact that the acuity of human vision is lower for color than for monochrome information, the reduction in color resolution is unnoticeable at normal viewing distances.
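A toy version of the blend (one made-up row of Pb samples; real encoders use better filters than a plain average, but the bit-halving principle is the same):

    # 4:2:2 chroma subsampling, toy version: average each horizontal
    # pair of chroma samples into one double-width sample.
    def subsample_422(chroma_row):
        return [(chroma_row[i] + chroma_row[i + 1]) / 2
                for i in range(0, len(chroma_row) - 1, 2)]

    pb_row = [0.1, 0.3, 0.3, 0.5, 0.2, 0.2]  # made-up Pb samples
    print(subsample_422(pb_row))             # roughly [0.2, 0.4, 0.2]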


Saving bits is important. It is the whole rationale of digital video compression. According to the online article High-Definition Television Overview, each broadcast HDTV channel has to be shoehorned into an existing analog channel 6 MHz wide — the channel's "bandwidth." This can be done only if the digital data rate is limited to roughly 20 (actually, 19.2) megabits of information per second.

But HDTV can generate about 120 megabytes (960 megabits) per second, uncompressed. (See Charles Poynton, Digital Video and HDTV Algorithms and Interfaces, p. 117.) That's roughly 48 times what's allowed.

Chroma subsampling cuts two of the three YPbPr streams, Pb and Pr, in half, which by my calculation cuts the 120 MB/s down to 80 MB/s. That data rate is not small enough.

Eschewing progressive scan and using interlaced scanning, à la 1080i, cuts that in half: 40 MB/s, or 320 Mbits/s. Still not small enough. True digital compression is needed. Enter the MPEG suite of digital video compression techniques. The standard used for HDTV is MPEG-2; specifically, "MPEG-2 Main Profile at High Level." (DVDs are encoded at much lower data rates using "MPEG-2 Main Profile at Main Level.")
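The whole chain of reductions, as back-of-the-envelope Python (my own sketch, using the rough figures above):

    # Data-rate chain from raw HD down to a 6 MHz broadcast channel.
    MBPS = 8_000_000                 # 1 megabyte/s = 8 megabits/s

    raw = 120 * MBPS                 # ~120 MB/s uncompressed, per Poynton
    chroma = raw * 2 / 3             # 4:2:2: Y whole, Pb and Pr halved
    interlaced = chroma / 2          # interlacing halves the pixel rate
    channel = 19_200_000             # the 19.2 Mb/s broadcast budget

    print(chroma / MBPS)             # 80.0 MB/s -- still too much
    print(interlaced / MBPS)         # 40.0 MB/s (320 Mb/s) -- still too much
    print(round(interlaced / channel, 1))  # ~16.7x: what MPEG-2 must deliver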

MPEG-2 compression, whatever its profile and level, first removes redundant information that the decoder can restore on its own. But that still isn't enough, so it uses an algorithm to strip out more information. This information cannot be restored by the decoder — the compression is technically "lossy" — but the algorithm is designed to remove only information whose loss is undetectable to the human eye.
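For a feel for the lossy step, here is a toy transform-and-quantize sketch in the spirit of MPEG-2's discrete cosine transform (one-dimensional, eight made-up samples; real MPEG-2 quantizes 8 x 8 two-dimensional blocks and adds motion compensation on top):

    # Toy lossy coding: transform, quantize coarsely, reconstruct.
    import numpy as np
    from scipy.fft import dct, idct

    samples = np.array([52, 55, 61, 66, 70, 61, 64, 73], dtype=float)

    coeffs = dct(samples, norm='ortho')         # pack energy into few terms
    step = 10.0
    quantized = np.round(coeffs / step) * step  # the lossy part: coarse rounding

    restored = idct(quantized, norm='ortho')
    print(np.abs(samples - restored).max())     # a few counts out of 0-255;
                                                # hard for the eye to detect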


That holds true as long as the MPEG compression ratio, which is adjustable, is not too high. But what constitutes "too high" depends on the scene. Busy scenes with fast motion cannot stand as much compression as static scenes with little fine detail.

DVD compressionists adjust the compression ratio scene by scene, but HDTV is broadcast in real time. Usually, there has to be a single compression ratio chosen to accommodate the busiest, most dynamic scenes. If too much compression is done, some scenes can lose visual detail, especially when full of motion.

Here's a case where how-many-pixels by how-many-pixels doesn't really tell you what the definition is. But keep in mind that overly enthusiastic digital compression produces eye-disturbing artifacts above and beyond reducing apparent resolution, so rarely do you hear too much lossy compression blamed for poor picture definition per se.

Too much lossy compression is more likely with 1080i, less likely with 720p. Although 720p refreshes each pixel twice for every one time a 1080i pixel gets updated, the spatial resolution within the 720p frame is so much lower that it's easier to shoehorn 720p into a 6 MHz channel. So 720p needs less compression than 1080i.
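The raw pixel arithmetic bears this out: each 720p frame holds well under half the pixels of a 1080i frame, so even at double the frame rate the total pixel rate comes out lower.

    # Pixels per second for each format.
    per_frame_1080 = 1920 * 1080  # 2,073,600 pixels per frame
    per_frame_720 = 1280 * 720    # 921,600 pixels per frame (about 44%)

    rate_1080i = per_frame_1080 * 30  # 30 full frames/s (60 interlaced fields)
    rate_720p = per_frame_720 * 60    # 60 full progressive frames/s

    print(rate_1080i, rate_720p)             # 62208000 55296000
    print(round(rate_720p / rate_1080i, 2))  # 0.89 -- 720p moves fewer pixels/s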


And now we come to the vexed question of horizontal resolution. In theory, 1080i can support 1,920 "lines" of it, 720p just 1,280 (each "line" of horizontal resolution being really a one-pixel-wide column of pixels).

But Joe Kane says in D-Theater - Questions and Answers that there are several caveats. Due to "many places in the production and distribution chain where image resolution can be lost" — i.e., due to signal-processing compromises — the cruel fact is "that the broadcast limitation of horizontal resolution for the 1080i system is about 1400 lines." Meanwhile, 720p's horizontal resolution remains as advertised: 1,280 lines.

Also, says Kane, "film content ... usually doesn’t get much above the 1300 line mark in horizontal resolution." Or, again, "Horizontal resolution of most film masters in 1080p is in the area of 800 to 1300 lines."

(1080p? That's like 1080i except that it's intended for uses such as film-to-video mastering which can benefit from higher data rates than broadcast HDTV allows. So its 1,920 x 1,080-pixel frames can use progressive rather than interlaced scanning, it needs no vertical filtering, and it can use other frame rates than 30 frames per second.)

The important thing to notice here is that two things can reduce the actual horizontal resolution below its theoretical maximum. One is signal processing, especially with 1080i; the other is limited resolution in the source material (for instance, a movie).

Both of these things can, of course, also reduce vertical resolution. In fact, the vertical filtering which eliminates 1080i interlace artifacts is a type of signal processing that limits vertical resolution.

More problematic is what happens when video starts out at standard definition. Say it begins life at 480i or 480p, the scan rates associated with standard-def TV and with progressively scanned DVDs, respectively. The former is interlaced; the latter is, unsurprisingly, progressive. 480i/p SDTV can be scaled or upconverted to, say, 1080i for HDTV broadcast. But the amount of detail in the picture — both horizontally and vertically — stays the same.
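A toy sketch of why upconversion can't add detail (nearest-neighbor scaling of a single scan line here; real scalers interpolate more smoothly, but they can't conjure information either):

    # Upconverting one line from 480 to 1,080 samples: more pixels,
    # same amount of picture information.
    def upconvert(row, new_len):
        old_len = len(row)
        return [row[i * old_len // new_len] for i in range(new_len)]

    sd_line = list(range(480))  # stand-in for one line of SD detail
    hd_line = upconvert(sd_line, 1080)

    print(len(hd_line))         # 1080 samples now...
    print(len(set(hd_line)))    # ...but still only 480 distinct values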

So an HDTV channel that broadcasts upconverted SD material doesn't look much better (if any) than an SDTV channel broadcasting the same SD material. In fact, the "pseudo-HD" broadcast might have even less detail, if some of it was lost in the signal processing for the upconversion.

One wrinkle on pseudo-HD is what I just encountered on the ESPN-HD channel, on the Memorial Day broadcast of the National Lacrosse Championship (Johns Hopkins 9, Duke 8). Because they were using SD cameras and transmission equipment, they quite deliberately took a standard-def picture with the squarish 4:3 aspect ratio and put it between two hi-def "pillarboxes" on the 16:9 screen. Though the actual picture was clean (because it was digital), it didn't have that crisp HD feel to it.
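For what it's worth, the pillarbox geometry is simple to compute (assuming the 4:3 picture is scaled to the full 1,080-pixel screen height):

    # A 4:3 picture pillarboxed on a 16:9, 1920 x 1080 screen.
    screen_w, screen_h = 1920, 1080
    picture_w = screen_h * 4 // 3  # 1440 pixels wide at full screen height
    pillar_w = (screen_w - picture_w) // 2

    print(picture_w, pillar_w)     # 1440, with a 240-pixel pillar per side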

This same kind of pseudo-HD thing can reportedly happen inadvertently in a signal transmission chain that isn't set up right. Suppose a TV network sends a member station both a 1080i and a 480i version of a program. The station is supposed to send the former out over its digital channel, the latter over its traditional analog channel. But what if there's a screwup, and the 480i feed gets upconverted for the 1080i broadcast, while the 1080i feed is ignored? Result: a nominally 1080i broadcast with just 480i-like resolution.


There is a third thing which can harm horizontal resolution. It is actually itself a kind of signal processing: intentional downresolution or downconversion. "Downrezzing," it's familiarly called. For example, the May/June issue of The Perfect Vision magazine cites the TiVo Community Forum web site to the effect that DirecTV is reducing the resolution of its 1080i channels from 1,920 x 1,080 to 1,280 x 1,080. (See "Has DirecTV Downrez'd HDTV?," pp. 14-15).

The reporter could not get DirecTV to confirm this policy. If true, it is doubtless being done in order not to overload the data transmission capacity of its satellite transponders, while still offering the same number of HD channels.
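The scale of the savings would be easy to estimate (idealized numbers; actual MPEG bit savings won't track pixel counts exactly):

    # Downrezzing 1080i rows from 1,920 to 1,280 pixels drops a third
    # of the luma samples per channel.
    full, downed = 1920 * 1080, 1280 * 1080
    print(round(downed / full, 3))  # 0.667

    # Put another way: three downrezzed channels occupy exactly the
    # pixel budget of two full-resolution ones.
    print(full * 2 == downed * 3)   # True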


So there are several things which make the effective, as opposed to theoretical, definition in a digital HDTV picture hard to pin down. As we have seen, filtering, chroma subsampling, MPEG compression, limited resolution in source material, losses in digital signal processing, and downrezzing are among the most important.

We can compare either of the HDTV formats — 1080i or 720p — to a straw, and the picture content to a milkshake. The more detail exists in the content, the "thicker" the shake. The thicker the shake, the "fatter" the straw ought to be — i.e., the higher the definition of the format should be.

All the same, just because you have a super-fat straw doesn't guarantee that your milkshake isn't thin and soupy. Just because you are receiving a 1080i or 720p signal doesn't mean the content isn't essentially 480i.
