Computer Science
Data Compression
Data compression is the process of reducing the size of data to save storage space or transmission time. It is achieved by encoding information using fewer bits than the original representation. This can be done through various algorithms and techniques, such as lossless compression, which preserves all of the original data, or lossy compression, which sacrifices some data to achieve higher compression ratios.
Written by Perlego with AI-assistance
11 Key excerpts on "Data Compression"
- Ida Mengyi Pu (Author)
- 2005 (Publication Date)
- Butterworth-Heinemann (Publisher)
Chapter 1: Introduction
Data Compression is, in the context of computer science, the science (and art) of representing information in a compact form. It has been one of the critical enabling technologies for the ongoing digital multimedia revolution for decades. Most people frequently use Data Compression software such as zip, gzip and WinZip (and many others) to reduce the file size before storing or transferring it in media. Compression techniques are embedded in more and more software, and data are often compressed without people knowing it. Data Compression has become a common requirement for most application software as well as an important and active research area in computer science. Without compression techniques, none of the ever-growing Internet, digital TV, mobile communication or increasing video communication techniques would have been practical developments. Typical examples of application areas that are relevant to and motivated by Data Compression include:
- personal communication systems such as facsimile, voice mail and telephony
- computer systems such as memory structures, disks and tapes
- mobile computing
- distributed computer systems
- computer networks, especially the Internet
- multimedia evolution, imaging, signal processing
- image archival and videoconferencing
- digital and satellite TV.

Practical problems have motivated much research in Data Compression. Equally, research in Data Compression has been built on, or has stimulated, other new subject areas. Partly due to its broad application territory, Data Compression overlaps with many branches of science and can be found in many different subject areas. For example, you will see chapters or sections dedicated to Data Compression in books on:
- information theory
- coding theory
- computer networks and telecommunications
- digital signal processing
- image processing
- multimedia
- steganography
- computer security.

The language used in unrelated disciplines can be substantially different. In this book, the word data is in general used to mean the information in digital form on which computer programs operate, and compression
- Willi-Hans Steeb (Author)
- 2005 (Publication Date)
- WSPC (Publisher)
Chapter 13: Data Compression

13.1 Introduction

By Data Compression ([6], [42], [51], [52]) we understand increasing the amount of data that can be stored in a given domain, such as space, time, or frequency, or contained in a given message length. We also understand by Data Compression reducing the amount of storage space required to store a given amount of data, or reducing the length of a message required to transfer a given amount of information. Data Compression may be accomplished by simply squeezing a given amount of data into a smaller space, for example, by increasing packing density or by transferring data on punched cards onto a magnetic tape or CD-ROM. Data Compression does not reduce the amount of data used to represent a given amount of information, whereas data compaction does. Both Data Compression and data compaction result in the use of fewer data elements for a given amount of information. Data Compression is particularly useful in communications because it enables devices to transmit the same amount of data in fewer bits. There are a variety of Data Compression techniques, but only a few have been standardised. For example, the CCITT has defined a standard Data Compression technique for transmitting faxes and a compression standard for data communications through modems. In addition, there are file compression formats, such as ARC and ZIP. Data Compression is also widely used in backup utilities, spreadsheet applications, and database management systems. Certain types of data, such as bit-mapped graphics, can be compressed to a small fraction of their normal size. Data Compression operates in general by taking "symbols" from an input "text", processing them, and writing "codes" to a compressed file. Symbols are usually bytes, but they could also be pixels or 80-bit floating point numbers.
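As a loose illustration of that symbols-in, codes-out pipeline, here is a minimal Python sketch; the code table is a toy prefix code invented for this example, not any standard's.

```python
# Schematic symbols-in, codes-out pipeline: read symbols one at a time,
# look each up in a code table, and emit the corresponding codeword.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}  # hypothetical prefix code

def compress(symbols: str) -> str:
    """Map each input symbol to its codeword and concatenate the results."""
    return "".join(CODE[s] for s in symbols)

print(compress("AABAC"))  # 00100110 -> 8 bits instead of 5 bytes
```

A real compressor differs mainly in how the code table is chosen; the excerpts below on Huffman, arithmetic and LZW coding are about exactly that choice.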
The Mathematics That Power Our World: How Is It Made?
- Joseph Khoury, Gilles Lamothe (Authors)
- 2016 (Publication Date)
- World Scientific (Publisher)
Similarly, uncompressed texts, images, audio and video files or information transfer over digital networks require substantial storage capacity certainly not available on standard machines you use at the office or at home. By a Data Compression algorithm, we usually mean a process through which we can represent data in a compact and digital form that uses less space to store or less time to transmit over a network than the original form. This is usually done by reducing unwanted noise (or redundancy) in the data to a certain degree where it is still possible to recover it in an acceptable form. The process of assigning digital codes to pieces of data for storage or transmission is called encoding. Of course, a compression algorithm is only efficient when we are able to reverse the process, that is, to retrieve the original sequence of characters from the encoded digital form. This process is called decoding or decompressing. In the literature, the word compression often means both the compression and the decompression processes (or encoding and decoding) of data.

2.1.2 Before you go further

Mathematical skills like manipulating algebraic inequalities, basic properties of logarithmic functions and some level of discrete mathematics are needed in this chapter.

2.2 Storage inside computers

When you click the "Save" button after working on or viewing a document (text, image, audio, ...), a convenient interpretation of what happens next is to imagine that your computer stores the file in the form of a (long) finite sequence of 0's and 1's that we call a binary string. The reason why only two characters are used was explained briefly in the chapter on electronic calculators.

0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0

In this digital format, each of the two characters "0" and "1" is called a "bit" (binary digit). The number of bits in a string is called the length of the string.
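To make the binary-string picture concrete, here is a small Python sketch (our own illustration, not the book's) that prints the bits a machine would store for a two-character text:

```python
# Render the bytes of a short text as the binary string a file would hold.
data = "Hi".encode("utf-8")                     # file contents as raw bytes
bits = "".join(f"{byte:08b}" for byte in data)  # each byte becomes 8 bits
print(bits)                                     # 0100100001101001
print("length:", len(bits), "bits")             # 16
```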
- Gerald Friedland, Ramesh Jain (Authors)
- 2014 (Publication Date)
- Cambridge University Press (Publisher)
Fundamentals of Compression

A major difference between multimedia data and most other data is its size. Images and audio files take much more space than text, for example. Video data is currently the single largest network bandwidth and hard disk space consumer. Compression was, therefore, among the first issues researchers in the emerging multimedia field sought to address. In fact, multimedia's history is closely connected to different compression algorithms because they served as enabling technologies for many applications. Even today, multimedia signal processing would not be possible without compression methods. A Blu-ray disc can currently store 50 Gbytes, but a ninety-minute movie in 1080p HDTV format takes about 800 Gbytes (without audio). So how does it fit on the disc? The answer to many such problems is compression. This chapter discusses the underlying mathematical principles of compression algorithms, from the basics to advanced techniques. However, all the techniques outlined in this chapter belong to the family of lossless compression techniques; that is, the original data can be reconstructed bit by bit. Lossless compression techniques are applicable to all kinds of data, including non-multimedia data. However, these techniques are not always effective with all types of data. Therefore, subsequent chapters will introduce lossy compression techniques that are usually tailored to a specific type of data, for example, image or sound files.

RUN-LENGTH CODING

Before discussing what compression is and how you can develop algorithms to represent different types of content with the least amount of space possible, let's start with a simple and intuitive example. In addition to introducing the concept of compression, this example demonstrates a practical method that computer scientists often use to compress data with large areas of the same values.
- Kenny A. Hunt (Author)
- 2016 (Publication Date)
- A K Peters/CRC Press (Publisher)
Image Compression

10.1 Overview

The primary goal of image compression is to minimize the memory footprint of image data so that storage and transmission times are minimized. Producing compact image data is essential in many image processing systems since storage capacity can be limited, as is the case with digital cameras, or costly, as is the case when creating large warehouses of image data. Transmission of image data is also a central concern in many image processing systems. Recent studies of web use, for example, have estimated that images and video account for approximately 85% of all Internet traffic. Reducing the memory footprint of image data will correspondingly reduce Internet bandwidth consumption. More importantly, however, since most web documents contain image data it is vital that the image data be transferred over the network within a reasonable time frame. Reducing the memory footprint has the significant advantage of speeding delivery of web content to the end user. Image compression works by identifying and eliminating redundant, or duplicated, data from a source image. There are three main sources of redundancy in image compression. The first is known as interpixel redundancy, which recognizes that pixels that are in close proximity within an image are generally related to each other. A compression technique that reduces memory by recognizing some relationship between pixels based on their proximity is an attempt to eliminate interpixel redundancy. Run length encoding, constant area coding, and JPEG encoding seek to eliminate this source of unnecessary data. The second source of redundancy is known as psycho-visual redundancy. Since the human visual system does not perceive all visible information with equal sensitivity, we understand that some visual information is less important than others. Image compression systems will simply eliminate information that is deemed to be unimportant in terms of human perception.
- Meiqing Wang, Choi-Hong Lai (Authors)
- 2016 (Publication Date)
- Chapman and Hall/CRC (Publisher)
Image Compression

Storage and transmission are essential processes in image processing. As discussed in Chapter 1, images are generally stored in the bitmap format, and the memory in spatial dimensions could be very large if images are stored directly without preprocessing. For example, the data of an 8-bit grey-scale image with the resolution 256 × 256 requires a total memory of 65536 bytes (or 64 kilobytes). The memory required for a true colour image increases to 64 kilobytes × 3 = 192 kilobytes. Under the National Television System Committee (NTSC) standard, 30 frames of images are played in one second to ensure a continuous vision effect. Suppose the images are true colour with a resolution of 720 × 576; the images played in one second would require a storage size of 720 × 576 × 3 × 30 = 37324800 bytes ≈ 36 megabytes. Such a huge amount of data would cause enormous difficulties during storage or transmission. Therefore, compression of original images is inevitable to facilitate transmission or other processes. The essence of compression is to use a compressed file with smaller storage size requirements to replace the original one. The compressed file can be reverted to the original file through decompression. If the decompressed image is identical to the original image, the corresponding compression method is called lossless compression; otherwise, it is called lossy compression. Common lossy compression methods include predictive compression, vector quantisation, transform encoding, wavelet compression, and fractal compression. The last two methods are considered state-of-the-art transform compression techniques. Compression rate can be used to assess the efficiency of a compression method. It is defined as the ratio of the size of the original file to the compressed file. If the size of the original file and the compressed file are
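The storage arithmetic in this excerpt is easy to check; this short Python sketch (function name ours) recomputes the figures quoted above:

```python
# Raw storage needed for uncompressed bitmap images, as in the excerpt.
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Uncompressed bitmap size: one value per pixel per channel."""
    return width * height * bytes_per_pixel

grey = raw_image_bytes(256, 256, 1)        # 8-bit grey-scale
colour = raw_image_bytes(256, 256, 3)      # true colour, 3 bytes per pixel
video = raw_image_bytes(720, 576, 3) * 30  # 30 true-colour frames per second

print(grey, "bytes =", grey // 1024, "KB")      # 65536 bytes = 64 KB
print(colour, "bytes =", colour // 1024, "KB")  # 196608 bytes = 192 KB
print(video, "bytes = about", round(video / 2**20), "MB per second")  # ~36 MB
```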
Algorithms and Theory of Computation Handbook, Volume 1
General Concepts and Techniques
- Mikhail J. Atallah, Marina Blanton (Authors)
- 2009 (Publication Date)
- Chapman and Hall/CRC (Publisher)
14 Text Data Compression Algorithms
Maxime Crochemore, King's College London and Université Paris-Est
Thierry Lecroq, University of Rouen

14.1 Text Compression
14.2 Static Huffman Coding (Encoding, Decoding)
14.3 Dynamic Huffman Coding (Encoding, Decoding, Updating)
14.4 Arithmetic Coding (Encoding, Decoding, Implementation)
14.5 LZW Coding (Encoding, Decoding, Implementation)
14.6 Mixing Several Methods (Run Length Encoding, Move to Front, Integrated Example)
14.7 Experimental Results
14.8 Research Issues and Summary
14.9 Further Information
Defining Terms
References

14.1 Text Compression

This chapter describes a few algorithms that compress texts. Compression serves both to save storage space and transmission time. We shall assume that the text is stored in a file. The aim of compression algorithms is to produce a new file, as short as possible, containing the compressed version of the same text. Methods presented here reduce the representation of text without any loss of information, so that decoding the compressed text restores exactly the original data. The term "text" should be understood in a wide sense. It is clear that texts can be written in natural languages or can be texts usually generated by translators (like various types of compilers). But texts can also be images or other kinds of structures as well, provided the data are stored in linear files.
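The outline above begins with static Huffman coding: build a prefix code in which frequent symbols receive short codewords. Here is a minimal Python sketch of the idea (the dictionary-merging shortcut and the names are ours, not the handbook's implementation):

```python
# Static Huffman coding: repeatedly merge the two least frequent subtrees,
# prefixing the codewords of the merged subtrees with 0 and 1.
import heapq
from collections import Counter

def huffman_code(text: str) -> dict:
    freq = Counter(text)
    if len(freq) == 1:                   # degenerate one-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreaker, {symbol: codeword-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # least frequent subtree
        f2, _, c2 = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[s] for s in text)
print(code)                                            # e.g. 'a' -> '0'
print(len(encoded), "bits instead of", 8 * len(text))  # 23 instead of 88
```

Decoding walks the bit stream codeword by codeword, which works precisely because no codeword is a prefix of another.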
- Stan Birchfield (Author)
- 2017 (Publication Date)
- Cengage Learning EMEA (Publisher)
On a typical image-sharing website, hundreds of millions of photographs are uploaded every day, amounting to several exabytes per year of images. These numbers are staggering, and although we are starting to reach the point where memory is cheap enough that we can begin to think about storing large collections of raw images and videos at home or on a server, limited transmission speeds and the desire to store these data on mobile devices, not to mention rapidly increasing rates of content creation, continue to motivate the need for compressing and decompressing the data. An overview of a compression/decompression system is provided in Figure 8.1. A stream of bits (in our case an image) is fed to a compressor, which converts the stream to a smaller stream of bits. This new stream is then either stored as a file on disk or transmitted across a network, where on the other end a decompressor restores the original image. Sometimes the compressor and decompressor are known as a coder and decoder, respectively, so that the software part of the system is collectively known as a codec. When we say that the decompressor restores the original image, we must make an important distinction because there are two types of compression. In lossless compression, the restored image is exactly the same as the original image, so that no information has been lost. Lossless compression techniques are applicable to any type of data, such as text, an image, a database of addresses, or a file containing an executable. On the other hand, the image restored by lossy compression is only similar to the original image. Lossy compression techniques are applicable to data arising from real-world measurements, such as an audio signal, a photographic image, or a signal captured by some other type of sensor.
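As a quick illustration of the codec round trip described above, the following sketch uses Python's standard zlib module; any lossless codec would serve, and the sample data is invented:

```python
# Lossless round trip: compress, decompress, and verify bit-for-bit equality.
import zlib

original = b"AAAA" * 1000 + b"a short, less repetitive tail"
compressed = zlib.compress(original)     # the coder
restored = zlib.decompress(compressed)   # the decoder

print(len(original), "->", len(compressed), "bytes")
assert restored == original              # lossless: nothing was lost
```

A lossy codec could not satisfy that final assertion; it would only promise that the restored data is perceptually similar to the original.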
Computer Networks ISE
A Systems Approach
- Larry L. Peterson, Bruce S. Davie (Authors)
- 2007 (Publication Date)
- Morgan Kaufmann (Publisher)
Of course, when talking about lossy compression algorithms, processing resources are not the only factor. Depending on the exact application, users are willing to make very different trade-offs between bandwidth (or delay) and extent of information loss due to compression. For example, a radiologist reading a mammogram is unlikely to tolerate any significant loss of image quality and might well tolerate a delay of several hours in retrieving an image over a network. By contrast, it has become quite clear that many people will tolerate questionable audio quality in exchange for free global telephone calls (not to mention the ability to talk on the phone while driving).

7.2.1 Lossless Compression Algorithms

We begin by introducing three lossless compression algorithms. We do not describe these algorithms in much detail—we just give the essential idea—since it is the lossy algorithms used to compress image and video data that are of the greatest utility in today's network environment. We do comment, though, on how well these lossless algorithms work on digital imagery. Some of the ideas exploited by these lossless techniques show up again in later sections when we consider the lossy algorithms that are used to compress images.

Run Length Encoding

Run length encoding (RLE) is a compression technique with a brute-force simplicity. The idea is to replace consecutive occurrences of a given symbol with only one copy of the symbol, plus a count of how many times that symbol occurs—hence the name "run length." For example, the string AAABBCDDDD would be encoded as 3A2B1C4D. RLE can be used to compress digital imagery by comparing adjacent pixel values and then encoding only the changes. For images that have large homogeneous regions, this technique is quite effective. For example, it is not uncommon that RLE can achieve compression ratios on the order of 8-to-1 for scanned text images.
- Graham King (Author)
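The AAABBCDDDD example translates directly into code. Here is a minimal Python sketch of the count-plus-symbol scheme (our own illustration; it assumes symbols are non-digit characters so the decoder can tell counts from symbols):

```python
# Run-length encoding in the excerpt's style: AAABBCDDDD -> 3A2B1C4D.
from itertools import groupby

def rle_encode(s: str) -> str:
    """Replace each run of a symbol with its length followed by the symbol."""
    return "".join(f"{len(list(run))}{sym}" for sym, run in groupby(s))

def rle_decode(s: str) -> str:
    """Invert rle_encode; counts are decimal digits, symbols are not."""
    out, count = [], ""
    for ch in s:
        if ch.isdigit():
            count += ch              # accumulate a possibly multi-digit count
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

assert rle_encode("AAABBCDDDD") == "3A2B1C4D"
assert rle_decode("3A2B1C4D") == "AAABBCDDDD"
```

Note how RLE expands data without long runs: "ABCD" becomes "1A1B1C1D", twice the length, which is why the excerpt stresses large homogeneous regions.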
- 1995 (Publication Date)
- Butterworth-Heinemann (Publisher)
Redundancy reduction incorporates methods that enable messages to have their information content identified. Once the information content is identified, the process is followed by a coding stage which seeks to allow transmission without redundancy. In this case no information is lost when the message is decoded and the overall strategy is lossless. Reconstruction is possible without error and this is essential where it is crucial to preserve the exact form of data, for example, when storing or sending computer programs or numerical data. This perfect reconstruction feature exacts a price in terms of the Data Compression that is possible. Although compression depends on the inherent entropy of the source, typically between 2:1 and 5:1 can be achieved. Although this sounds modest it is well worthwhile, provided that the execution time of the compression and decompression algorithms is not excessive. Entropy reduction methods actually achieve compression by losing some information. This is acceptable only when no important data is lost. For example, images often contain far more information than the eye can interpret or than the display can portray. Obviously it is necessary to be able to quantify 'acceptable loss' in order to use entropy reducing techniques. In networking the most commonly used techniques are lossless and a number of different methods are popular. Even so, the advent of ISDN, the integrated services digital network, presages a time in the near future when speech, images and textual data will all be networked.

9.2.1 Run length coding

Appropriate only when data has a repetitive nature, run length coding might be used at bit or word level. Consider a data stream in which one character is repeated many times in succession, for example:

0,0,0,0,0,0,0,0,0,0,0,2,3,7,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

The multiple 0s might be encoded by a count of the number of occurrences.
- Roberto Togneri, Christopher J.S. deSilva (Authors)
- 2003 (Publication Date)
- Chapman and Hall/CRC (Publisher)
Chapter 4: Data Compression

4.1 Introduction

We have seen that entropy or information content is a measure of predictability or redundancy. In situations where there is redundancy in a body of information, it should be possible to adopt some form of coding which exploits the redundancy in order to reduce the space which the information occupies. This is the idea underlying approaches to Data Compression. In the previous chapter, we looked at encoding methods that assume that successive characters in a sequence are independent or that the sequence was generated by a Markov source. In this chapter, we extend these ideas to develop Data Compression techniques that use the additional redundancy that arises when there are relationships or correlations between neighbouring characters in the sequence. This additional redundancy makes it possible to achieve greater compression, though at the cost of extra computational effort.

EXAMPLE 4.1

Suppose we have a message that consists of only four letters, say A, B, C, D. To measure the information content of such a message, it is convenient to code the letters as binary digits, and count the total number of digits to give an estimate of the information content in bits. If there is no redundancy in the sequence, so that the letters occur at random, then we must use two binary digits to code each letter, say 00 for A, 01 for B, 10 for C and 11 for D. The information content will be two bits per letter. Suppose instead that the occurrence of letters is not completely random, but is constrained in some way, for example, by the rule that A is followed by B or C with equal probability (but never by D), B is followed by C or D with equal probability, C is followed by D or A with equal probability and D is followed by A or B with equal probability.
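To put numbers on this example, here is a small sketch (ours, not the book's) comparing the per-letter information content with and without the successor rule:

```python
# Shannon entropy in bits of a discrete probability distribution.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Unconstrained: four equally likely letters need 2 bits per letter.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# Under the rule, each letter has only two equally likely successors,
# so every step after the first carries just 1 bit of information.
print(entropy([0.5, 0.5]))                # 1.0
```

This is why the constrained source can, in principle, be compressed to roughly half the naive size.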
Index pages curate the most relevant extracts from our library of academic textbooks. They've been created using an in-house natural language model (NLM), with each page adding context and meaning to key research topics.