Part One
Technology Trends
1
Convergence
Convergence means different things to different people depending on the context. However, for the purpose of this book, we define four kinds of convergence: (i) industry convergence, (ii) device convergence, (iii) network convergence and (iv) service convergence.
1.1 Industry Convergence
The telephone (telecommunication) industry, the television (media/broadcast) industry and the Internet industry once existed separately, each with specialized infrastructure to deliver its respective services. For example, the telecom (voice) industry was built on specialized circuit-switched network technology, supported by a multibillion-dollar telecom switch and equipment industry, to deliver “telephony” or “voice” services to consumers [1,3]. Consumers wrote a check to a telephone company, such as AT&T, for the monthly telephony service they received from it. The television (media/broadcast) industry used specialized broadcast network technology to deliver television (video broadcast) services to consumers. As with the telephony service, consumers made a monthly payment to a cable/satellite/television service provider, such as Comcast or DirecTV, for television services. Broadband access to the Internet was also offered as an independent service by broadband service providers, such as AOL, and consumers paid them for Internet (data) services.
However, with technological advances, these apparently independent industries are converging and now contend in the same digital content distribution space (Figure 1.1) [2]. There are two major reasons behind this transition. First, analog content is being replaced with digital content. As a result, content, no matter what industry it belongs to, is converted from analog to digital and then packaged into small units called packets. Second, the network infrastructure is converging onto a common packet-switched Internet protocol (IP)-based network technology, which is capable of carrying packets efficiently. Naturally, all content, whether voice, video or data, is being transported over the common network. Telephony has become an “application” on the Internet (voice over IP). Television is also becoming an “application” on the Internet (Internet television), and the Internet itself, which used to be unfriendly to real-time traffic, is morphing to support real-time traffic while preserving quality of service. As a result, the challenges faced by these industries are almost identical, apart from the business challenges specific to each domain. Moreover, each of these industries is expanding the boundaries of its business, thereby treading in unfamiliar territory. This leads to challenges but also to opportunities, which we discuss in detail in later parts of the book.
1.2 Device Convergence
Consumer electronics and communications functionality are converging onto consumer devices. For example, laptops are being equipped with microphones, speakers, cameras and other consumer electronics to enable new capabilities such as telephony and video conferencing (using Skype [5], Yahoo! Messenger [6], GTalk [7], etc.) across the Internet, in addition to traditional applications such as Web surfing, instant messaging and e-mail. What used to be just a mobile phone a few years ago is today a camera, a video recorder, an MP3 player, an AM/FM radio, an electronic organizer, a gaming controller, a phone, a device for surfing the Web, a device for sending instant messages and, in some cases, a device for watching television (Figure 1.2). Consumers’ ownership of such powerful handheld devices opens the door for communications service providers (CSPs) to deliver a variety of content embodied in text, images, audio and video to the end user. However, the fact that an end user can store the delivered multimedia content and share it with the rest of the world with a single push of a button may lead to unprecedented illegal sharing of content, which is content owners’ worst nightmare. Thus the benefits of convergence come with challenges of security and privacy.
1.3 Network Convergence
Network infrastructures used by the telephone (telecommunication) industry, the television (media/broadcast) industry and the Internet industry have traditionally been very different. The telecommunications industry has been using circuit-switched network elements, the television industry has been using broadcast network equipment and the Internet industry has been using packet-switched network elements. Packet-switched networks have themselves been built using different technologies. For example, asynchronous transfer mode (ATM), frame relay (FR) and IP are all technologies that have been used, and are still being used, in CSPs’ networks. One way of reducing capital expenses (capex) and operational expenses (opex) is to choose a common technology for the network infrastructure. This helps CSPs contain expenses, because they need to train and employ technical people skilled in only the chosen technology. In practice, CSPs are converging onto IP/MPLS-based networks for transport and IP multimedia subsystem (IMS)-based infrastructure for session/service and blended (voice, video, data) applications (Figure 1.3). This transition in the industry onto a common network for applications and services is referred to as network convergence, and it has far-reaching consequences for the industry.
1.4 Service Convergence
Services offered by the telecommunications industry, the television industry, the Internet industry and the wireless services industry have been independent of one another. However, with the introduction of new technology enabling unified communications across these networks, consumers expect to access the same services (voice, e-mail, messaging and so forth) and content (Web, video, audio) anytime from anywhere using any device (laptop, TV set, cellphone) with consistent quality of experience (Figure 1.4) [4]. An example of service convergence would be for CSPs to offer a service that would enable their customers to take part in social networking using any device from anywhere. Moreover, customers not only expect to be able to use the services from anywhere using any device, but they also expect to move content/services seamlessly from network to network without compromising quality of experience.
For example, as shown in Figures 1.5 and 1.6, a phone call uses the cellular network when that is the only network available for connectivity and uses the WiFi network when that is available in addition to the cellular network. In fact, the transition from cellular network to WiFi network happens seamlessly without interrupting the phone call.
Figures 1.7 and 1.8 show how video being watched on a small-screen cellphone in a train is seamlessly transitioned to a large-screen TV set when the user arrives home. This is an example of seamless mobility of content. While service convergence opens up unprecedented opportunities for CSPs to offer novel value-added blended services, it also makes content providers worry that what used to be protected content in their network may no longer be protected, owing to the lack of a comprehensive security solution spanning multiple networks.
1.5 Summary
Digital convergence is already happening in the industry. With digitization of content, the distinction between voice, video, images and text is blurring, as everything is treated uniformly as data and transported over a common IP network rather than over specialized networks for voice, video and data. Furthermore, everything is becoming an application on the IP network, leading to an overlap of what used to be distinct industry segments, namely telecom (voice), broadcast and media (video) and Internet (data). In order to provide access to these applications from anywhere and at any time, devices (PC/laptop, mobile handsets, TV sets) are becoming more and more powerful, with multiple consumer electronic features being built into them, leading to what is known in the industry as device convergence. A case in point is a smart handset with features such as AM/FM radio, mobile TV, phone, browser, digital camera, video recorder, MP3 player, calendar, office applications and a host of others. A variety of network technologies are converging into IP-based technology (network convergence), leading to mixing and matching of applications and features in any service from the end-user perspective and to lower capital expense (capex) and operational expense (opex) from the service provider perspective. Service convergence refers to the ability of end users to use the same service regardless of the network over which it is accessed and to access the same content over multiple devices in a seamless manner. Digital convergence is leading to new applications and services that were not possible before and is opening up new possibilities from both the service provider and the end-user standpoint.
References
[1] Hudson, H.E. (1997) Converging technologies and changing realities: towards universal access to telecommunication in developing world, in Telecom Reform: Principles, Politics and Regulation (ed. W.H. Melody), Technical University of Denmark, Lyngby.
[2] Lamberton, D.M. (1995) Technology, information and institution, in Beyond Competition: The Future of Telecommunications (ed. D.M. Lamberton), Elsevier, Amsterdam.
[3] Mitchell, J. (1997) Convergent communication, fragmented regulation and consumer needs, in Telecom Reform: Principles, Politics and Regulation (ed. W.H. Melody), Technical University of Denmark, Lyngby.
[4] Service convergence: bringing it all together; Telecom Asia, April, 2005.
[5] Skype. http://www.skype.com/ (accessed June 9, 2010).
[6] Yahoo Messenger. http://in.messenger.yahoo.com/ (accessed June 9, 2010).
[7] Google Talk. http://www.google.com/talk./ (accessed June 9, 2010).
2
Video Compression, Encoding and Transport
Video is nothing but a sequence of still images. One way of compressing and encoding video is to compress each individual still image and encode it independently of the other images in the sequence. The Joint Photographic Experts Group (JPEG) format is one way of compressing still images. When individual still images in a sequence are independently compressed and encoded using JPEG, the video-encoding format is called Motion JPEG (or MJPEG). However, as will subsequently be discussed, there are better ways of compressing and encoding video, which result in far fewer bits than MJPEG for representing the same video in digital form.
In any case, techniques used for compressing still images form the foundation of video compression [1–3]. So we will start by understanding how still images are compressed.
2.1 Still Image Compression
2.1.1 Block Transform
Each image is usually divided into many blocks, each of size 8 pixels × 8 pixels. The 64 pixels of a block are then transformed into a frequency domain representation by using the discrete cosine transform (DCT). The frequency domain transformation clearly separates the low-frequency components from the high-frequency components. Conceptually, the low-frequency components capture the visually important content, whereas the high-frequency components capture visually less striking detail. The goal is to represent the low-frequency (visually more important) coefficients with higher precision, that is, with more bits, and the high-frequency (visually less important) coefficients with lower precision, that is, with fewer bits. Since the high-frequency coefficients are encoded with fewer bits, some information is lost during compression, and hence this is referred to as “lossy” compression. When the inverse DCT (IDCT) is performed on the coefficients to reconstruct the image, the result is not exactly the same as the original image, but the difference is generally not perceptible to the human eye.
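The block transform can be illustrated with a minimal Python sketch. It assumes the orthonormal form of the two-dimensional DCT-II (one common scaling convention, not necessarily the one used by a particular codec), and the helper names `dct_matrix`, `dct2` and `idct2` are chosen here purely for illustration. The sample block is a smooth gradient, so its energy should concentrate in the low-frequency coefficients.

```python
import math

N = 8  # block size used in the text (8 x 8 pixels)

def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C (an assumed scaling convention)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# Illustrative input: a smooth gradient, i.e. mostly low-frequency content.
block = [[float(8 * r + col) for col in range(N)] for r in range(N)]
coeffs = dct2(block)
restored = idct2(coeffs)

# coeffs[0][0] is the lowest-frequency (DC) term, coeffs[7][7] the highest.
print(round(coeffs[0][0], 3), round(abs(coeffs[7][7]), 3))
```

Without quantization the transform is reversible up to floating-point error, so `restored` matches `block`; the loss described in the text arises only once the coefficients are stored with reduced precision.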
2.1.2 Quantization
As mentioned in the previous section, the DCT coefficients of each 8 × 8 pixel block are encoded with more bits for the highly perceptible low-frequency components and fewer bits for the less perceptible high-frequency components. This is achieved in two steps. The first step is quantization, which eliminates perceptibly less significant information, and the second step is encoding, which minimizes the number of bits needed to represent the quantized DCT coefficients.
Quantization is a technique by which real numbers are mapped into integers within a range, where each integer represents a level or quantum. The mapping is typically done by dividing each coefficient by a quantization step size and rounding the result to the nearest integer, so some information is lost in the process. At the end of quantization, each 8 × 8 block is represented by a set of integers, many of which are zeroes, because the high-frequency coefficients are usually small and end up being mapped to 0.
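A small Python sketch of quantization follows. It assumes the common scheme of dividing each coefficient by a step size and rounding to the nearest integer, with larger steps for higher frequencies. The step table produced by `make_step_table` is hypothetical and is not a standard quantization table, and the 4 × 4 coefficient values are made up for brevity.

```python
def make_step_table(n, base=8, slope=4):
    """Hypothetical step sizes that grow with the frequency index u + v."""
    return [[base + slope * (u + v) for v in range(n)] for u in range(n)]

def quantize(coeffs, steps):
    """Map each real coefficient to an integer level: round(coefficient / step)."""
    return [[int(round(c / s)) for c, s in zip(crow, srow)]
            for crow, srow in zip(coeffs, steps)]

def dequantize(levels, steps):
    """Approximate reconstruction: level * step. The rounding error is lost."""
    return [[q * s for q, s in zip(qrow, srow)]
            for qrow, srow in zip(levels, steps)]

# Made-up 4 x 4 coefficients: large at low frequency (top-left), small elsewhere.
coeffs = [
    [250.0, -31.0, 5.0, 1.0],
    [-40.0, 10.0, 3.0, 0.5],
    [6.0, 2.0, 1.0, 0.2],
    [1.5, 0.7, 0.3, 0.1],
]
steps = make_step_table(4)
levels = quantize(coeffs, steps)
approx = dequantize(levels, steps)

print(levels)
# [[31, -3, 0, 0], [-3, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Twelve of the sixteen levels are zero, and `approx` differs from `coeffs` (for example 248 rather than 250 in the top-left position), which is the information discarded by this step.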
2.1.3 Encoding
The goal of encoding is to represent the coefficients using as few bits as possible. This is accomplished in two steps. Run length coding (RLC) is used to do the first level of compression. Variable length coding (VLC) does the next level of compression.
In fact, after quantization, the majority of high-frequency DCT coefficients become zeroes. Run length coding takes advantage of this by ordering the coefficients from low frequency to high frequency, so that the zero-valued high-frequency coefficients are grouped together and the number of consecutive zeroes is maximized. This is accomplished by scanning the 8 × 8 matrix in a diagonal zigzag manner, starting from the top-left (lowest-frequency) coefficient. Run length coding encodes consecutive identical coefficients by using two numbers. The first number is the “run” (the number that occurs consecutively) and the second number is the “length” (the number of consecutive occurrences). Thus if there are N consecutive zeroes, instead of coding each zero separately, RLC will represent the string of N zeroes as [0, N].
After RLC, there will be a sequence of numbers, and VLC encodes these numbers using the minimum number of bits. The technique is to use the fewest bits for the most commonly occurring number and more bits for less common numbers. Since a variable number of bits is used for coding, it is referred to as variable length coding.
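The zigzag scan and run length coding can be sketched in Python as follows. The sketch assumes the conventional zigzag order that starts at the top-left (lowest-frequency) coefficient, so that the zeroes produced by quantization collect into a long run at the end, and it uses the [value, count] pair notation from the text. The function names and the 4 × 4 sample block are illustrative only.

```python
def zigzag_indices(n):
    """(row, col) positions of an n x n block in diagonal zigzag order."""
    order = []
    for s in range(2 * n - 1):           # s = row + col picks one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:                   # alternate direction on each diagonal
            diag.reverse()
        order.extend(diag)
    return order

def zigzag_scan(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]

def run_length_encode(seq):
    """Collapse consecutive identical values into [value, count] pairs."""
    pairs = []
    for v in seq:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1
        else:
            pairs.append([v, 1])
    return pairs

# Illustrative quantized block: nonzero levels only near the top-left corner.
levels = [
    [31, -3, 0, 0],
    [-3, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = zigzag_scan(levels)
pairs = run_length_encode(scanned)

print(scanned)  # [31, -3, -3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(pairs)    # [[31, 1], [-3, 2], [0, 1], [1, 1], [0, 11]]
```

The trailing eleven zeroes collapse into the single pair [0, 11], which is where most of the saving comes from in this example.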
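As an illustration, the following Python sketch builds a variable length code with Huffman's algorithm. Huffman coding is one well-known way of constructing such a code and is used here as an assumption; the text does not name a specific VLC scheme. The function name `huffman_code` and the symbol sequence are made up for the example.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each symbol to a bit string (Huffman code)."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Illustrative symbol stream: 0 is by far the most common value.
symbols = [0] * 8 + [1] * 3 + [-3] * 2 + [31]
code = huffman_code(symbols)
encoded = "".join(code[s] for s in symbols)

for sym in sorted(code, key=lambda s: len(code[s])):
    print(sym, code[sym])
print(len(encoded), "bits")  # 23 bits, versus 28 with a fixed 2-bit code
```

The most frequent symbol receives a 1-bit code word and the rarest symbols receive 3-bit code words, so the 14 symbols take 23 bits in total.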
2.1.4 Compressing Even Further
The techniques described so far focused on the optimal way of compressing an 8 × 8 pixel block. However, there is significant correlation between neighboring blocks in a frame. Thus instead of coding each block indepen...