Lesson Three Image & Video
Text A: Video
The human eye has the property that when an image is flashed on the retina, it is retained for some number of milliseconds before decaying. If a sequence of images is flashed at 50 or more images/sec, the eye does not notice that it is looking at discrete images. All video (i.e., television) systems exploit this principle to produce moving pictures.
To understand video systems, it is best to start with simple, old-fashioned black-and-white television. To represent the two-dimensional image in front of it as a one-dimensional voltage as a function of time, the camera scans an electron beam rapidly across the image and slowly down it, recording the light intensity as it goes. At the end of the scan, called a frame, the beam retraces. This intensity as a function of time is broadcast, and receivers repeat the scanning process to reconstruct the image. The scanning pattern used by both the camera and the receiver is shown in Fig. 1. (As an aside, CCD cameras integrate rather than scan, but some cameras and all monitors do scan.)
Fig. 1 The scanning pattern used for NTSC video and television
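To make the scanning idea concrete, here is a minimal sketch (not part of the original text) that flattens a two-dimensional intensity image into a one-dimensional signal row by row, and rebuilds it the way a receiver would; the frame size is an arbitrary assumption.

```python
import numpy as np

# A hypothetical 480 x 640 grayscale frame (intensities 0.0-1.0).
frame = np.random.rand(480, 640)

def raster_scan(image):
    """Flatten a 2-D image into a 1-D intensity signal, row by row,
    mimicking the beam sweeping rapidly across and slowly down."""
    return image.reshape(-1)          # row-major: left-to-right, top-to-bottom

def rebuild(signal, height, width):
    """The receiver repeats the scan to reconstruct the image."""
    return signal.reshape(height, width)

signal = raster_scan(frame)
assert np.array_equal(rebuild(signal, 480, 640), frame)
```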
The exact scanning parameters vary from country to country. The system used in North and South America and Japan has 525 scan lines, a horizontal to vertical aspect ratio of 4:3, and 30 frames/sec. The European system has 625 scan lines, the same aspect ratio of 4:3, and 25 frames/sec. In both systems, the top few and bottom few lines are not displayed (to approximate a rectangular image on the original round CRTs). Only 483 of the 525 NTSC scan lines (and 576 of the 625 PAL/SECAM scan lines) are displayed. The beam is turned off during the vertical retrace, so many stations (especially in Europe) use this interval to broadcast Teletext (text pages containing news, weather, sports, stock prices, etc.).
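As a quick check on these figures, the short calculation below (an illustration added here, not from the text) derives the line rate and the number of undisplayed lines for each system from the numbers quoted above.

```python
# Scan-line arithmetic for the two systems described above.
systems = {
    "NTSC":      {"total_lines": 525, "visible_lines": 483, "frames_per_sec": 30},
    "PAL/SECAM": {"total_lines": 625, "visible_lines": 576, "frames_per_sec": 25},
}

for name, p in systems.items():
    line_rate = p["total_lines"] * p["frames_per_sec"]    # lines scanned per second
    hidden = p["total_lines"] - p["visible_lines"]        # lines lost to retrace/overscan
    print(f"{name}: {line_rate} lines/sec, {hidden} lines not displayed")
# NTSC: 15750 lines/sec, 42 lines not displayed
# PAL/SECAM: 15625 lines/sec, 49 lines not displayed
```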
While 25 frames/sec is enough to capture smooth motion, at that frame rate many people, especially older ones, will perceive the image to flicker (because the old image has faded off the retina before the new one appears). Rather than increase the frame rate, which would require using more scarce bandwidth, a different approach is taken. Instead of displaying the scan lines in order, first all the odd scan lines are displayed, then the even ones are displayed. Each of these half frames is called a field. Experiments have shown that although people notice flicker at 25 frames/sec, they do not notice it at 50 fields/sec. This technique is called interlacing. Noninterlaced television or video is said to be progressive.
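A minimal sketch of interlacing, assuming a frame stored as a 2-D array (not from the original text): the odd and even scan lines are split into two fields and then interleaved back together, which is effectively what a receiver does at 50 fields/sec.

```python
import numpy as np

frame = np.random.rand(480, 640)   # a hypothetical progressive frame

# Interlacing: each frame is transmitted as two fields of alternating scan lines.
odd_field  = frame[0::2, :]   # lines 1, 3, 5, ... (first field)
even_field = frame[1::2, :]   # lines 2, 4, 6, ... (second field)

# The receiver interleaves the two fields back into a full frame.
rebuilt = np.empty_like(frame)
rebuilt[0::2, :] = odd_field
rebuilt[1::2, :] = even_field
assert np.array_equal(rebuilt, frame)

# 25 frames/sec displayed as 50 fields/sec suppresses visible flicker.
```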
Color video uses the same scanning pattern as monochrome (black and white), except that instead of displaying the image with one moving beam, three beams moving in unison are used. One beam is used for each of the three additive primary colors: red, green, and blue (RGB). This technique works because any color can be constructed from a linear superposition of red, green, and blue with the appropriate intensities. However, for transmission on a single channel, the three color signals must be combined into a single composite signal.
When color television was invented, various methods for displaying color were technically possible, and different countries made different choices, leading to systems that are still incompatible. (Note that these choices have nothing to do with VHS versus Betamax versus P2000, which are recording methods.) In all countries, a political requirement was that programs transmitted in color had to be receivable on existing black-and-white television sets. Consequently, the simplest scheme, just encoding the RGB signals separately, was not acceptable. RGB is also not the most efficient scheme.
The first color system was standardized in the United States by the National Television Standards Committee, which lent its acronym to the standard: NTSC. Color television was introduced in Europe several years later, by which time the technology had improved substantially, leading to systems with greater noise immunity and better colors. These are called SECAM (SEquentiel Couleur Avec Memoire), which is used in France and Eastern Europe, and PAL (Phase Alternating Line), which is used in the rest of Europe. The difference in color quality between NTSC and PAL/SECAM has led to an industry joke that NTSC really stands for Never Twice the Same Color.
To allow color transmissions to be viewed on black-and-white receivers, all three systems linearly combine the RGB signals into a luminance (brightness) signal, and two chrominance (color) signals, although they all use different coefficients for constructing these signals from the RGB signals. Interestingly enough, the eye is much more sensitive to the luminance signal than to the chrominance signals, so the latter need not be transmitted as accurately. Consequently, the luminance signal can be broadcast at the same frequency as the old black-and-white signal, so it can be received on black-and-white television sets. The two chrominance signals are broadcast in narrow bands at higher frequencies. Some television sets have controls labeled brightness, hue, and saturation (or brightness, tint and color) for controlling these three signals separately. Understanding luminance and chrominance is necessary for understanding how video compression works.
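The sketch below illustrates one such linear combination. The coefficients shown are the widely used ITU-R BT.601 ones, chosen here only as an example; as the text notes, the three broadcast systems each use their own coefficients.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert normalized RGB (0.0-1.0) into one luminance and two chrominance
    values. The coefficients below are the common ITU-R BT.601 ones and serve
    only as an illustration; NTSC, PAL, and SECAM each use different ones."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (what a B&W set displays)
    cb = (b - y) * 0.564                      # blue color-difference (chrominance)
    cr = (r - y) * 0.713                      # red color-difference (chrominance)
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: low luminance, strong Cr
print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: luminance 1.0, zero chrominance
```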
In the past few years, there has been considerable interest in HDTV (High Definition Television), which produces sharper images by roughly doubling the number of scan lines. The United States, Europe, and Japan have all developed HDTV systems, all different and all mutually incompatible. The basic principles of HDTV in terms of scanning, luminance, chrominance, and so on, are similar to the existing systems. However, all three formats have a common aspect ratio of 16:9 instead of 4:3 to match them better to the format used for movies (which are recorded on 35 mm film).
New Words and Expressions
Notes
1. The human eye has the property that when an image is flashed on the retina, it is retained for some number of milliseconds before decaying.
Here "when" introduces an adverbial clause of time. The sentence means that, owing to a property of the human eye, an image flashed before it is retained on the retina for several milliseconds before it fades.
2. A frame is one complete television picture, i.e., one full scan of the image; this is the technical term for a single picture.
3.This technique works because any color can be constructed from a linear superposition of red, green, and blue with the appropriate intensities.
In other words, the technique works because any color can be expressed as a linear combination of red, green, and blue at the appropriate intensities.
4. SECAM is an abbreviation of the French Sequentiel Couleur A Memoire, meaning "sequential color with memory". During transmission the luminance signal is sent on every line, while the two color-difference signals are sent on alternate lines in turn; staggering them in time avoids the cross-color interference, and the resulting color distortion, that simultaneous transmission would cause.
5. PAL overcomes NTSC's sensitivity to phase distortion. PAL is short for Phase Alternating Line, meaning that the phase is reversed on alternate lines; like NTSC it is a simultaneous system. Of the two color-difference signals transmitted simultaneously, one has its phase reversed on alternate lines and the other is quadrature modulated, which effectively cancels the color shifts caused by phase distortion.
6. HDTV is short for High Definition Television. Unlike traditional television systems, which use analog transmission, HDTV uses digital transmission, with resolutions of up to 1920×1080 and frame rates of up to 60 fps.
Exercises
Ⅰ. Comprehension Questions
1. According to the text, please explain the principle that all video systems exploit to produce moving pictures.
2. Please explain the technique of interlacing.
3. How can the color transmissions be viewed on black-and-white receivers?
4. According to the text, what is necessary for understanding how video compression works?
5. What is the basic principle of HDTV?
Ⅱ. Translate the following paragraph into Chinese
To allow color transmissions to be viewed on black-and-white receivers, all three systems linearly combine the RGB signals into a luminance (brightness) signal, and two chrominance (color) signals, although they all use different coefficients for constructing these signals from the RGB signals. Interestingly enough, the eye is much more sensitive to the luminance signal than to the chrominance signals, so the latter need not be transmitted as accurately. Consequently, the luminance signal can be broadcast at the same frequency as the old black-and-white signal, so it can be received on black-and-white television sets. The two chrominance signals are broadcast in narrow bands at higher frequencies. Some television sets have controls labeled brightness, hue, and saturation (or brightness, tint and color) for controlling these three signals separately. Understanding luminance and chrominance is necessary for understanding how video compression works.
Text B: Video on Demand
Video on demand is sometimes compared to an electronic video rental store. The user (customer) selects any one of a large number of available videos and takes it home to view. Only with video on demand, the selection is made at home using the television set's remote control, and the video starts immediately. No trip to the store is needed. Needless to say, implementing video on demand is a wee bit more complicated than describing it. In this section, we will give an overview of the basic ideas and their implementation. A description of one real implementation can be found in (Nelson and Linton, 1995). A more general treatment of interactive television is in (Hodge, 1995). Other relevant references are (Chang et al., 1994; Hodge et al., 1993; and Little and Venkatesh, 1994).
Is video on demand really like renting a video, or is it more like picking a movie to watch from a 500- or 5000-channel cable system? The answer has important technical implications. In particular, video rental users are used to the idea of being able to stop a video, make a quick trip to the kitchen or bathroom, and then resume from where the video stopped. Television viewers do not expect to put programs on pause.
If video on demand is going to compete successfully with video rental stores, it may be necessary to allow users to stop, start, and rewind videos at will. Giving users this ability virtually forces the video provider to transmit a separate copy to each one.
On the other hand, if video on demand is seen more as advanced television, then it may be sufficient to have the video provider start each popular video, say, every 10 minutes, and run these nonstop. A user wanting to see a popular video may have to wait up to 10 minutes for it to start. Although pause/resume is not possible here, a viewer returning to the living room after a short break can switch to another channel showing the same video but 10 minutes behind. Some material will be repeated, but nothing will be missed. This scheme is called near video on demand. It offers the potential for much lower cost, because the same feed from the video server can go to many users at once. The difference between video on demand and near video on demand is similar to the difference between driving your own car and taking the bus.
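To illustrate how near video on demand lets a viewer resume after a break, here is a small sketch assuming a 10-minute stagger across a dozen channels; the function and its parameters are invented for the example.

```python
# Near video on demand: the same movie starts on a new channel every
# `stagger` minutes. After a pause, the viewer switches to the channel
# whose playback position is at or just before where they stopped.
def best_channel(paused_at_min, elapsed_min, stagger=10, channels=12):
    """Return (channel index, position in minutes) for the feed closest
    to, but not past, the paused position. Names are illustrative."""
    best = None
    for ch in range(channels):
        position = elapsed_min - ch * stagger    # how far this feed has played
        if 0 <= position <= paused_at_min:
            if best is None or position > best[1]:
                best = (ch, position)
    return best

# Viewer paused at minute 42; 70 minutes have elapsed since channel 0 started.
print(best_channel(paused_at_min=42, elapsed_min=70))   # -> (3, 40): rewatch ~2 minutes
```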
Watching movies on (near) demand is but one of a vast array of potential new services possible once wideband networking is available. Here we see a high-bandwidth, (national or international) wide area backbone network at the center of the system. Connected to it are thousands of local distribution networks, such as cable TV or telephone company distribution systems. The local distribution systems reach into people's houses, where they terminate in set-top boxes, which are, in fact, powerful, specialized personal computers.
Attached to the backbone by high-bandwidth optical fibers are thousands of information providers. Some of these will offer pay-per-view video or pay-per-hear audio CDs. Others will offer specialized services, such as home shopping (with the ability to rotate a can of soup and zoom in on the list of ingredients or view a video clip on how to drive a gasoline-powered lawn mower). Sports, news, reruns of "I Love Lucy," WWW access, and innumerable other possibilities will no doubt quickly become available.
Also included in the system are local spooling servers that allow videos to be prepositioned closer to the users, to save bandwidth during peak hours. How these pieces will fit together and who will own what are matters of vigorous debate within the industry. Below we will examine the design of one of the main pieces of the system: the video servers.
Video Servers
To have (near) video on demand, we need video servers capable of storing and outputting a large number of movies simultaneously. The total number of movies ever made is estimated at 65,000 (Minoli, 1995). When compressed in MPEG-2, a normal movie occupies roughly 4 GB of storage, so 65,000 of them would require something like 260 terabytes. Add to this all the old television programs ever made, sports films, newsreels, talking shopping catalogs, etc., and it is clear that we have an industrial-strength storage problem on our hands.
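The storage figure quoted above follows from a one-line calculation, sketched here for concreteness with the numbers given in the text.

```python
# Back-of-the-envelope storage requirement from the figures in the text.
movies = 65_000                 # estimated number of movies ever made
gb_per_movie = 4                # MPEG-2 compressed size of a typical movie
total_gb = movies * gb_per_movie
print(f"{total_gb:,} GB  =  {total_gb / 1000:,.0f} TB")   # 260,000 GB = 260 TB
```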
The cheapest way to store large volumes of information is on magnetic tape. This has always been the case and probably always will be. A DAT tape can store 8 GB (two movies) at a cost of about 5 dollars/gigabyte. Large mechanical tape servers that hold thousands of tapes and have a robot arm for fetching any tape and inserting it into a tape drive are commercially available now. The problems with these systems are the access time (especially for the second movie on a tape), the transfer rate, and the limited number of tape drives (to serve n movies at once, the unit would need n drives).
Fortunately, experience with video rental stores, public libraries, and other such organizations shows that not all items are equally popular. Experimentally, when there are N movies available, the fraction of all requests being for the kth most popular one is approximately C/k (Chervenak, 1994). Here C is computed to normalize the sum to 1, namely
C=1/(1+1/2+1/3+1/4+1/5+…+1/N)
Thus the most popular movie is seven times as popular as the number seven movie. This result is known as Zipf's law (Zipf, 1949).
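A short sketch of Zipf's law as stated above, assuming a catalog of 1000 movies: it computes the normalizing constant C and confirms that the most popular movie draws seven times the requests of the seventh most popular.

```python
# Zipf's law for movie popularity: the fraction of requests for the k-th
# most popular of N movies is C/k, with C chosen so the fractions sum to 1.
def zipf_fraction(k, n):
    c = 1.0 / sum(1.0 / i for i in range(1, n + 1))
    return c / k

n = 1000                                   # assumed catalog size for the example
top = zipf_fraction(1, n)
seventh = zipf_fraction(7, n)
print(f"C = {top:.4f}")                    # fraction of requests for movie #1
print(f"#1 / #7 popularity ratio: {top / seventh:.1f}")   # -> 7.0, as the text notes
```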
The fact that some movies are much more popular than others suggests a possible solution in the form of a storage hierarchy, as shown in Fig. 1. Here, the performance increases as one moves up the hierarchy.
Fig. 1 A video server storage hierarchy
Now let us take a brief look at video server software. The CPUs are used for accepting user requests, locating movies, moving data between devices, customer billing, and many other functions. Some of these are not time critical, but many others are. So some, if not all, of the CPUs will have to run a real-time operating system, such as a real-time microkernel. These systems normally break work up into small tasks, each with a known deadline. The scheduler can then run an algorithm such as nearest deadline next or the rate monotonic algorithm.
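As a minimal illustration of "nearest deadline next" (earliest-deadline-first) scheduling, the sketch below always runs the pending task with the soonest deadline; the task names and deadlines are invented for the example.

```python
import heapq

# Earliest deadline first ("nearest deadline next"): always run the ready
# task whose deadline is soonest. Task names and deadlines are invented.
def run_edf(tasks):
    """tasks: list of (deadline_ms, name). Returns the execution order."""
    heap = list(tasks)
    heapq.heapify(heap)                    # min-heap keyed on deadline
    order = []
    while heap:
        deadline, name = heapq.heappop(heap)
        order.append(name)                 # "run" the most urgent task
    return order

tasks = [(40, "bill customer"), (5, "send next video block"),
         (20, "locate movie"), (10, "refill disk buffer")]
print(run_edf(tasks))
# ['send next video block', 'refill disk buffer', 'locate movie', 'bill customer']
```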
New Words and Expressions
Notes
1. video on demand (VOD): also known as an interactive television on-demand system. This newer form of media delivery combines video compression, multimedia transmission, and computer and network communication technologies; it is the product of several converging fields and provides users with high-quality video programs and information services. Users can freely select and download video programs and information from the network on a computer or television, according to their own needs.
2. near video on demand: a value-added service of one-way digital television systems in which a video server broadcasts the same digital program on several channels with staggered start times, so that a user who requests the program can watch it in full after waiting a short while.
3. Attached to the backbone by high-bandwidth optical fibers are thousands of information providers.
This is an inverted sentence: the subject, "thousands of information providers", follows the verb; they are connected to the backbone by high-bandwidth optical fibers.
4. MPEG-2 (Moving Picture Experts Group 2): the MPEG-2 standard specifies compression schemes and the system layer for standard-definition and high-definition digital television across a range of applications, with coding rates from 3 Mbit/s to 100 Mbit/s; the formal specification is ISO/IEC 13818. MPEG-2 is particularly well suited to encoding and transmitting broadcast-quality digital television and has been adopted as the coding standard for SDTV and HDTV.
5. video server: a server that stores movies and delivers many video streams to users simultaneously.
Exercises
Ⅰ. Comprehension Questions
1. Explain the concept of Video on Demand.
2. According to the text, explain the disadvantages of storing information on magnetic tape.
3. Describe the services offered by the information providers attached to the backbone by high-bandwidth optical fibers.
4. According to the text, what is the cheapest way to store large volumes of information?
5. Explain the advantage of Video on Demand.
6. Predict the application prospect of Video on Demand in the near future.
Ⅱ. Translate the following into Chinese
Is video on demand really like renting a video, or is it more like picking a movie to watch from a 500- or 5000-channel cable system? The answer has important technical implications. In particular, video rental users are used to the idea of being able to stop a video, make a quick trip to the kitchen or bathroom, and then resume from where the video stopped. Television viewers do not expect to put programs on pause.