Computer Science
Types of Compression
Compression in computer science falls into two main types: lossless and lossy. Lossless compression reduces file size without discarding any data, so the original can be reconstructed exactly, which makes it suitable for text and data files. Lossy compression reduces file size further by permanently discarding some data, which makes it better suited to multimedia files such as images, audio and video, where small losses are often imperceptible.
Written by Perlego with AI-assistance
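For illustration, the following minimal Python sketch (written for this page, not taken from any of the excerpts below) contrasts the two behaviors: a lossless round trip with the standard zlib module recovers the input exactly, while a simple quantization step, standing in for lossy compression, only approximates it.

```python
import zlib

# Lossless: the original bytes are recovered exactly after decompression.
text = b"AAAABBBCCD" * 1000
packed = zlib.compress(text)
assert zlib.decompress(packed) == text          # exact round trip
print(len(text), "->", len(packed), "bytes")

# Lossy (illustrative): quantizing 8-bit samples to 16 levels shrinks the
# data, but the original values cannot be recovered exactly.
samples = [0, 3, 7, 120, 121, 250, 255]
coarse = [s // 16 for s in samples]             # 4 bits of information per sample
restored = [c * 16 + 8 for c in coarse]         # approximation only
print(samples, "->", restored)
```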
12 Key excerpts on "Types of Compression"
- eBook - ePub
- Ida Mengyi Pu (Author)
- 2005 (Publication Date)
- Butterworth-Heinemann (Publisher)
Chapter 1 Introduction
Data compression is, in the context of computer science, the science (and art) of representing information in a compact form. It has been one of the critical enabling technologies for the ongoing digital multimedia revolution for decades. Most people frequently use data compression software such as zip, gzip and WinZip (and many others) to reduce the file size before storing or transferring it in media. Compression techniques are embedded in more and more software and data are often compressed without people knowing it. Data compression has become a common requirement for most application software as well as an important and active research area in computer science. Without compression techniques, none of the ever-growing Internet, digital TV, mobile communication or increasing video communication techniques would have been practical developments. Typical examples of application areas that are relevant to and motivated by data compression include:
- personal communication systems such as facsimile, voice mail and telephony
- computer systems such as memory structures, disks and tapes
- mobile computing
- distributed computer systems
- computer networks, especially the Internet
- multimedia evolution, imaging, signal processing
- image archival and videoconferencing
- digital and satellite TV.
Practical problems have motivated various researches in data compression. Equally, research in data compression has also been based on or stimulated other new subject areas. Partly due to its broad application territory, data compression overlaps with many science branches and can be found in many different subject areas. For example, you will see chapters or sections dedicated to data compression in books on:
- information theory
- coding theory
- computer networks and telecommunications
- digital signal processing
- image processing
- multimedia
- steganography
- computer security.
The language used in unrelated disciplines can be substantially different. In this book, the word data is in general used to mean the information in digital form on which computer programs operate, and compression
- Willi-Hans Steeb (Author)
- 2005 (Publication Date)
- WSPC (Publisher)
In the world of small systems, dictionary based data compression techniques seem to be more popular at this time. However, by combining arithmetic coding with powerful modelling techniques, statistical methods for data compression can actually achieve better performance. Lossy compression refers to data compression techniques in which some amount of data is lost. Lossy compression technologies attempt to eliminate redundant or unnecessary information. Most video compression technologies, such as MPEG, use lossy techniques. For example, wavelet compression is a lossy compression of an image. Images compressed using wavelets are smaller than JPEG images and can be transferred and downloaded at quicker speeds. Wavelet technology can compress colour images from 20:1 to 300:1, grayscale images from 10:1 to 50:1. MPEG-4 uses wavelets for compression. MPEG-4 was standardised in October 1998 in the ISO/IEC document 14496. Lossless compression refers to data compression techniques in which no data is lost. The PKZIP compression technology is an example of lossless compression. For most types of data, lossless compression techniques can reduce the space needed by only about 50%. For greater compression, one must use a lossy compression technique. Note, however, that only certain types of data (graphics, audio, and video) can tolerate lossy compression. We must use lossless compression techniques when compressing data and programs. Most of the data compression methods in common use today fall into one of two camps: dictionary based schemes and statistical methods. Dictionary based compression systems operate by replacing groups of symbols in the input text with fixed length codes. An example of a dictionary based scheme is LZW compression. Data compression can be achieved by assigning short codes to the most frequently encountered source characters and necessarily longer codes to the others.
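As an illustrative sketch of the dictionary-based idea the excerpt mentions, here is a minimal LZW encoder in Python. It is not the authors' code; the sample string is simply the classic LZW demonstration input.

```python
def lzw_compress(text: str) -> list[int]:
    """Greedy LZW: emit a code for the longest dictionary match, then
    add that match extended by one character as a new dictionary entry."""
    dictionary = {chr(i): i for i in range(256)}   # start with single bytes
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                     # keep growing the match
        else:
            output.append(dictionary[current])      # emit code for the match
            dictionary[candidate] = next_code       # learn the new phrase
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(len("TOBEORNOTTOBEORTOBEORNOT"), "symbols ->", len(codes), "codes:", codes)
```

Each emitted code stands for a progressively longer phrase learned from the input, which is why repetitive material compresses well.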
- Somayeh Sardashti, Angelos Arelakis, Per Stenström, David A. Wood (Authors)
- 2022 (Publication Date)
- Springer (Publisher)
2.2 COMPRESSION ALGORITHM TAXONOMY
This section introduces a taxonomy that will help classify the differences between the various lossless compression algorithms that have been proposed for use in the memory hierarchy. In lossless algorithms, decompression can exactly recover the original data, while with lossy algorithms only an approximation of the original data can be recovered. Lossy algorithms are mostly used in voice and image compression where lost data does not adversely affect their usefulness. On the other hand, compression algorithms used in the memory hierarchy must, in general, be lossless since any single memory bit loss or change may affect the validity of a program's results. Table 2.1 presents our taxonomy, showing where a collection of well-known compression algorithms fit within it. For each algorithm, we classify it as:
- General purpose vs. special purpose
- Static vs. dynamic
- Temporal-value based vs. spatial-value based
[Table 2.1 (Compression algorithms taxonomy) places well-known algorithms, among them static and dynamic Huffman coding, run-length encoding, Lempel-Ziv, FVC, FPC, BDI, C-PACK, significance-based address compression, instruction compression and floating-point compression, into this three-way classification; the original table layout is not reproduced in this extract.]
General-Purpose versus Special-Purpose: General-purpose algorithms target compressing data messages independent of their underlying data types or semantics. Many existing algorithms fall in this category, including BZIP2, UNIX gzip, and most algorithms used in compressed caches or memory. Conversely, specialized compression algorithms optimize for specific data types, exploiting the semantic knowledge of the data being compressed.
- eBook - PDF
Computer Networks ISE
A Systems Approach
- Larry L. Peterson, Bruce S. Davie (Authors)
- 2007 (Publication Date)
- Morgan Kaufmann (Publisher)
Of course, when talking about lossy compression algorithms, processing resources are not the only factor. Depending on the exact application, users are willing to make very different trade-offs between bandwidth (or delay) and extent of information loss due to compression. For example, a radiologist reading a mammogram is unlikely to tolerate any significant loss of image quality and might well tolerate a delay of several hours in retrieving an image over a network. By contrast, it has become quite clear that many people will tolerate questionable audio quality in exchange for free global telephone calls (not to mention the ability to talk on the phone while driving).
7.2.1 Lossless Compression Algorithms
We begin by introducing three lossless compression algorithms. We do not describe these algorithms in much detail—we just give the essential idea—since it is the lossy algorithms used to compress image and video data that are of the greatest utility in today's network environment. We do comment, though, on how well these lossless algorithms work on digital imagery. Some of the ideas exploited by these lossless techniques show up again in later sections when we consider the lossy algorithms that are used to compress images.
Run Length Encoding
Run length encoding (RLE) is a compression technique with a brute-force simplicity. The idea is to replace consecutive occurrences of a given symbol with only one copy of the symbol, plus a count of how many times that symbol occurs—hence the name "run length." For example, the string AAABBCDDDD would be encoded as 3A2B1C4D. RLE can be used to compress digital imagery by comparing adjacent pixel values and then encoding only the changes. For images that have large homogeneous regions, this technique is quite effective. For example, it is not uncommon that RLE can achieve compression ratios on the order of 8-to-1 for scanned text images.
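A minimal sketch of the count-plus-symbol scheme described above (not code from the textbook) reproduces its AAABBCDDDD example:

```python
from itertools import groupby

def rle_encode(s: str) -> str:
    # Replace each run of identical symbols with "<count><symbol>".
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(s))

def rle_decode(encoded: str) -> str:
    # Inverse: read digits as the count, then repeat the following symbol.
    # Assumes the data symbols themselves are never digits.
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

assert rle_encode("AAABBCDDDD") == "3A2B1C4D"   # the textbook's example
assert rle_decode("3A2B1C4D") == "AAABBCDDDD"   # lossless round trip
```

This simple format only works if the data symbols are never digits; practical formats use a stricter protocol to separate counts from symbols.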
- eBook - PDF
- Gerald Friedland, Ramesh Jain (Authors)
- 2014 (Publication Date)
- Cambridge University Press (Publisher)
A major difference between multimedia data and most other data is its size. Images and audio files take much more space than text, for example. Video data is currently the single largest network bandwidth and hard disk space consumer. Compression was, therefore, among the first issues researchers in the emerging multimedia field sought to address. In fact, multimedia's history is closely connected to different compression algorithms because they served as enabling technologies for many applications. Even today, multimedia signal processing would not be possible without compression methods. A Blu-ray disc can currently store 50 Gbytes, but a ninety-minute movie in 1,080p HDTV format takes about 800 Gbytes (without audio). So how does it fit on the disc? The answer to many such problems is compression. This chapter discusses the underlying mathematical principles of compression algorithms, from the basics to advanced techniques. However, all the techniques outlined in this chapter belong to the family of lossless compression techniques; that is, the original data can be reconstructed bit by bit. Lossless compression techniques are applicable to all kinds of data, including non-multimedia data. However, these techniques are not always effective with all types of data. Therefore, subsequent chapters will introduce lossy compression techniques that are usually tailored to a specific type of data, for example, image or sound files.
RUN-LENGTH CODING
Before discussing what compression is and how you can develop algorithms to represent different types of content with the least amount of space possible, let's start with a simple and intuitive example. In addition to introducing the concept of compression, this example demonstrates a practical method that computer scientists often use to compress data with large areas of the same values.
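The roughly 800 Gbyte figure for an uncompressed ninety-minute movie can be checked with a quick back-of-the-envelope calculation; the 24 fps frame rate and 24-bit color depth below are assumptions, since the excerpt does not state them.

```python
# Rough size of uncompressed 1080p video (assumed: 24 fps, 24-bit RGB).
width, height = 1920, 1080
bytes_per_pixel = 3            # 8 bits each for R, G, B
fps = 24
seconds = 90 * 60              # ninety minutes

frame_bytes = width * height * bytes_per_pixel       # about 6.2 MB per frame
total_bytes = frame_bytes * fps * seconds
print(f"{total_bytes / 1e9:.0f} GB uncompressed")     # roughly 800 GB, matching the excerpt
```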
- eBook - PDF
The Mathematics That Power Our World
How Is It Made?
- Joseph Khoury, Gilles Lamothe (Authors)
- 2016 (Publication Date)
- World Scientific (Publisher)
Similarly, uncompressed texts, images, audio and video files or information transfer over digital networks require substantial storage capacity certainly not available on standard machines you use at the office or at home. By a data compression algorithm, we usually mean a process through which we can represent data in a compact and digital form that uses less space to store or less time to transmit over a network than the original form. This is usually done by reducing unwanted noise (or redundancy) in the data to a certain degree where it is still possible to recover it in an acceptable form. The process of assigning digital codes to pieces of data for storage or transmission is called encoding. Of course, a compression algorithm is only efficient when we are able to reverse the process, that is to retrieve the original sequence of characters from the encoded digital form. This process is called decoding or decompressing. In the literature, the word compression often means both the compression and the decompression processes (or encoding and decoding) of data.
2.1.2 Before you go further
Mathematical skills like manipulating algebraic inequalities, basic properties of logarithmic functions and some level of discrete mathematics are needed in this chapter.
2.2 Storage inside computers
When you click the "Save" button after working on or viewing a document (text, image, audio, . . . ), a convenient interpretation of what happens next is to imagine that your computer stores the file in the form of a (long) finite sequence of 0's and 1's that we call a binary string. The reason why only two characters are used was explained briefly in the chapter on electronic calculators.
0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
In this digital format, each of the two characters "0" and "1" is called a "bit" (binary digit). The number of bits in a string is called the length of the string.
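As a small illustration of the binary-string idea (the text "Go" and the ASCII encoding are arbitrary choices, not taken from the excerpt):

```python
# A short text rendered as the binary string a computer would store.
text = "Go"
bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
print(bits)               # 0100011101101111
print(len(bits), "bits")  # a binary string of length 16
```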
- Roberto Togneri, Christopher J.S. deSilva (Authors)
- 2003 (Publication Date)
- Chapman and Hall/CRC (Publisher)
This is an example of lossless compression, where no information is lost in the coding and decoding. When images are compressed, it may be permissible for the decompressed image not to have exactly the same pixel values as the original image, provided the difference is not perceptible to the eye. In this case, some form of lossy compression may be acceptable. This involves a loss of information between the coding and decoding processes.
4.3 Run-length Coding
Run-length coding is a simple and effective means of compressing data in which it is frequently the case that the same character occurs many times in succession. This may be true of some types of image data, but it is not generally true for text, where it is rare for a letter of the alphabet to occur more than twice in succession. To compress a sequence, one simply replaces a repeated character with one instance of the character followed by a count of the number of times it occurs. For example, the sequence could be replaced by [the worked example is not reproduced in this extract], reducing the number of characters from 24 to 16. To decompress the sequence, each combination of a character and a count is replaced by the appropriate number of characters. Protocols need to be established to distinguish between the characters and the counts in the compressed data. While the basic idea of run-length coding is very simple, complex protocols can be developed for particular purposes. The standard for facsimile transmission developed by the International Telephone and Telegraph Consultative Committee (CCITT) (now the International Telecommunications Union) [4] involves such protocols.
4.4 The CCITT Standard for Facsimile Transmission
Facsimile machines have revolutionised the way in which people do business. Sending faxes now accounts for a major part of the traffic on telephone lines.
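The point about needing a protocol to tell characters apart from counts can be made concrete with a small sketch. The escape-marker convention below is one common approach chosen purely for illustration; it is not the CCITT protocol the excerpt refers to.

```python
ESC = "\x00"   # marker assumed never to occur in the input text

def rle_encode(s: str) -> str:
    """Encode runs of 4 or more as ESC + symbol + two-digit count;
    shorter runs are cheaper to leave as literal characters."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i] and j - i < 99:
            j += 1
        run = j - i
        if run >= 4:
            out.append(f"{ESC}{s[i]}{run:02d}")
        else:
            out.append(s[i] * run)
        i = j
    return "".join(out)

def rle_decode(s: str) -> str:
    out, i = [], 0
    while i < len(s):
        if s[i] == ESC:                      # escape marker: a run follows
            symbol, count = s[i + 1], int(s[i + 2:i + 4])
            out.append(symbol * count)
            i += 4
        else:                                # literal character
            out.append(s[i])
            i += 1
    return "".join(out)

data = "ABBBBBBBBCDEEEEF"
assert rle_decode(rle_encode(data)) == data   # lossless round trip
```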
- eBook - ePub
Digital Image Processing and Analysis
Image Enhancement, Restoration and Compression
- Scott E Umbaugh (Author)
- 2022 (Publication Date)
- CRC Press (Publisher)
For complex images, these methods are limited to compressing the image file to about one-half to one-third its original size (2:1–3:1); often the achievable compression is much less. For simple images such as text-only images, lossless methods may achieve much higher compression. The second type of compression method is called lossy, since it allows a loss in the actual image data, so the original uncompressed image cannot be created exactly from the compressed file. For complex monochrome images, these techniques can achieve compression ratios of about 10–50 and still retain high-quality visual information. For multiband (including color) images, simple images, or for lower quality results, compression ratios as high as 200 or more can be attained. Compression algorithms are developed by taking advantage of the redundancy that is inherent in image data. Four primary types of redundancy can be found in images: (1) coding, (2) interpixel, (3) interband and (4) psychovisual redundancy. Coding redundancy occurs when the data used to represent the image is not utilized in an optimal manner. For example, if we have an 8-bit-per-pixel image which allows 256 gray-level values, but the actual image contains only 16 gray-level values, this is a suboptimal coding – only 4 bits per pixel are actually needed. Interpixel redundancy occurs because adjacent pixels tend to be highly correlated. This is a result of the fact that in most images, the brightness levels do not change rapidly, but change gradually, so adjacent pixel values tend to be relatively close to each other in value (for video, or motion images, this concept can be extended to include inter-frame redundancy, redundancy between frames of image data). Interband redundancy occurs in color (and multiband) images due to the correlation between bands within an image – if we extract the red, green and blue bands, they all look similar
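To make the coding-redundancy example concrete, the sketch below packs pixels drawn from at most 16 gray levels into 4 bits each; the pixel values are invented for illustration.

```python
# An "8-bit" image that actually uses only a handful of gray levels:
# each pixel needs just 4 bits, so two pixels fit in one byte.
pixels = [0, 16, 32, 240, 240, 48, 16, 0]          # illustrative values
levels = sorted(set(pixels))                        # 16 or fewer distinct levels
index = {v: i for i, v in enumerate(levels)}        # map level -> 4-bit code

packed = bytearray()
for a, b in zip(pixels[0::2], pixels[1::2]):
    packed.append((index[a] << 4) | index[b])       # two 4-bit codes per byte

print(len(pixels), "bytes at 8 bpp ->", len(packed), "bytes at 4 bpp")
```

A decoder would also need the small `levels` table, but for a realistic image that overhead is negligible.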
- eBook - ePub
- John Watkinson (Author)
- 2013 (Publication Date)
- Routledge (Publisher)
5 Compression
5.1 Introduction
Compression, bit rate reduction and data reduction are all terms which mean basically the same thing in this context. In essence the same (or nearly the same) audio information is carried using a smaller quantity or rate of data. It should be pointed out that in audio, compression traditionally means a process in which the dynamic range of the sound is reduced, typically by broadcasters wishing their station to sound louder. However, when bit rate reduction is employed, the dynamics of the decoded signal are unchanged. Provided the context is clear, the two meanings can co-exist without a great deal of confusion. There are several reasons why compression techniques are popular:
(a) Compression extends the playing time of a given storage device.
(b) Compression allows miniaturization. With fewer data to store, the same playing time is obtained with smaller hardware. This is useful in portable and consumer devices.
(c) Tolerances can be relaxed. With fewer data to record, storage density can be reduced, making equipment which is more resistant to adverse environments and which requires less maintenance.
(d) In transmission systems, compression allows a reduction in bandwidth which will generally result in a reduction in cost. This may make possible some process which would be uneconomic without it.
(e) If a given bandwidth is available to an uncompressed signal, compression allows faster than real-time transmission within that bandwidth.
(f) If a given bandwidth is available, compression allows a better-quality signal within that bandwidth.
Figure 5.1 In (a) a compression system consists of a compressor or coder, a transmission channel and a matching expander or decoder. The combination of coder and decoder is known as a codec. (b) MPEG is asymmetrical since the encoder is much more complex than the decoder.
Compression is summarized in Figure 5.1. It will be seen in (a) that the PCM audio data rate is reduced at source by the compressor. The compressed data are then passed through a communication channel and returned to the original audio rate by the expander. The ratio between the source data rate and the channel data rate is called the compression factor. The term coding gain is also used. Sometimes a compressor and expander in series are referred to as a compander. The compressor may equally well be referred to as a coder and the expander a decoder in which case the tandem pair may be called a codec
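As a quick numerical illustration of the compression factor defined above (the bit rates are assumed: a CD-quality PCM source and a 128 kbit/s channel, neither taken from the excerpt):

```python
# Compression factor = source data rate / channel data rate.
source_rate = 44_100 * 16 * 2    # CD-quality PCM: 44.1 kHz, 16 bit, stereo -> 1,411,200 bit/s
channel_rate = 128_000           # a typical compressed channel, 128 kbit/s

compression_factor = source_rate / channel_rate
print(f"compression factor: {compression_factor:.1f}:1")   # about 11:1
```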
- eBook - ePub
Compression for Great Video and Audio
Master Tips and Common Sense
- Ben Waggoner (Author)
- 2013 (Publication Date)
- Routledge (Publisher)
Generating a codebook for each compressed file is time-consuming, expands the size of the file, and increases time to compress. Ideally, a compression technology will be able to be tuned to the structure of the data it gets. This is why lossless still image compression will typically make the file somewhat smaller than doing data compression on the same uncompressed source file, and will do it faster as well. We see the same thing as with the text compression example.
Small Increases in Compression Require Large Increases in Compression Time
There is a fundamental limit to how small a given file can be compressed, called the Shannon limit. For random data, the limit is the same as the size of the source file. For highly redundant data, the limit can be tiny. A file that consists of the pattern "01010101" repeated a few million times can be compressed down to a tiny percentage of the original data. However, real-world applications don't get all the way to the Shannon limit, since it requires an enormous amount of computer horsepower, especially as the files get larger. Most compression applications have a controlling tradeoff between encoding speed and compression efficiency. In essence, these controls expand the amount of the file that is being examined at any given moment, and the size of the codebook that is searched for matches. However, doubling compression time doesn't cut file size in half! Doubling compression time might only get you a few percentage points closer to the Shannon limit for the file. Getting a file 10 percent smaller might take more than 10 times the processing time, or be flat-out impossible.
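The speed-versus-size tradeoff can be observed directly with the compression levels exposed by Python's zlib module; the snippet below is only meant to show the diminishing returns the excerpt describes, not to reproduce any measurement from the book.

```python
import time
import zlib

# Moderately redundant test data (repeated text with a little variation).
data = b"".join(b"the quick brown fox %d " % (i % 7) for i in range(200_000))

for level in (1, 6, 9):                     # fast, default, best compression
    start = time.perf_counter()
    size = len(zlib.compress(data, level))
    elapsed = time.perf_counter() - start
    print(f"level {level}: {size:>9,} bytes in {elapsed:.3f}s")
```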
Lossy and Lossless Compression
Lossless compression codecs preserve all of the information contained within the original file. Lossy codecs, on the other hand, discard some data contained in the original file during compression. Some codecs, like PNG, are always lossless. Others like VC-1 are always lossy. Others still may or may not be lossy depending on how you set their quality and data rate options. Lossless algorithms, by definition, might not be able to compress the file any smaller than it started. Lossy codecs generally let you specify a target data rate, and discard enough information to hit that data rate target. This really only makes sense with media—we wouldn't want poems coming out with different words after compression!
- eBook - PDF
- Kenny A. Hunt (Author)
- 2016 (Publication Date)
- A K Peters/CRC Press (Publisher)
10 Image Compression
10.1 Overview
The primary goal of image compression is to minimize the memory footprint of image data so that storage and transmission times are minimized. Producing compact image data is essential in many image processing systems since storage capacity can be limited, as is the case with digital cameras, or costly, as is the case when creating large warehouses of image data. Transmission of image data is also a central concern in many image processing systems. Recent studies of web use, for example, have estimated that images and video account for approximately 85% of all Internet traffic. Reducing the memory footprint of image data will correspondingly reduce Internet bandwidth consumption. More importantly, however, since most web documents contain image data it is vital that the image data be transferred over the network within a reasonable time frame. Reducing the memory footprint has the significant advantage of speeding delivery of web content to the end user. Image compression works by identifying and eliminating redundant, or duplicated, data from a source image. There are three main sources of redundancy in image compression. The first is known as interpixel redundancy, which recognizes that pixels that are in close proximity within an image are generally related to each other. A compression technique that reduces memory by recognizing some relationship between pixels based on their proximity is an attempt to eliminate interpixel redundancy. Run length encoding, constant area coding, and JPEG encoding seek to eliminate this source of unnecessary data. The second source of redundancy is known as psycho-visual redundancy. Since the human visual system does not perceive all visible information with equal sensitivity we understand that some visual information is less important than others. Image compression systems will simply eliminate information that is deemed to be unimportant in terms of human perception.
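One simple way to exploit interpixel redundancy, sketched below with invented pixel values, is to store the difference between neighboring pixels instead of the pixels themselves; in smooth regions the differences are small and become easy targets for run-length or entropy coding.

```python
def delta_encode(row: list[int]) -> list[int]:
    # Keep the first pixel, then store each pixel as a difference from its
    # left neighbor; smooth regions become runs of small numbers.
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

row = [100, 101, 101, 102, 104, 104, 103, 180]   # illustrative scanline
deltas = delta_encode(row)
print(deltas)                                    # [100, 1, 0, 1, 2, 0, -1, 77]
assert delta_decode(deltas) == row               # lossless round trip
```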
- eBook - ePub
- Mark Haidekker (Author)
- 2011 (Publication Date)
- Wiley (Publisher)
and in a 10-year period, those patients underwent almost 5 million diagnostic tests. In the period from 1997 to 2006, imaging with computed tomography (CT) doubled, from 81 examinations to 181 examinations, and imaging with magnetic resonance imaging (MRI) tripled, from 22 examinations to 72 examinations per 1000 patients. Digital imaging modalities, such as CT and MRI, are not the only sources of digital images. Frequently, film archives are digitized for computer archiving and computer image analysis.
Clearly, image compression (the reduction of the image storage size) would reduce the burden on storage systems and on network bandwidth. There are two fundamental types of image compression: lossless compression and lossy compression. Lossless compression is a compression scheme that allows the exact reconstruction of the original data from compressed data. Typically, a lossless compression scheme can achieve up to 50% reduction of the storage size, but the compression rates are much lower in images with a large noise component. Conversely, lossy compression schemes do not allow exact reconstruction of the original data. However, lossy compression rates by far exceed those of lossless compression. When choosing a lossy compression scheme, it is therefore crucial to find an acceptable balance between the compression rate and the restored image quality. Similarly, film digitization requires finding a balance between digitized resolution (both spatial resolution and the number of gray levels) and image size.
The human eye is fairly insensitive to gray levels [28]. About 20 different levels of intensity can be distinguished in a small image region. Because the eye is able to adapt to different levels of brightness, about 100 levels of intensity are reasonable in a digital image. While 100 levels of intensity can be represented in 7 bits, the most common choice is a resolution of 8 bits/pixel or 256 possible levels of intensity. The number of discrete intensity levels is often called the depth of the image. Figure 12.1 demonstrates the effect of low image depths.
FIGURE 12.1 Relevance of the image depth (number of bits per pixel).
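The effect of reduced image depth can be sketched numerically: dropping from 8 bits per pixel to a handful of intensity levels is a lossy step whose error is bounded by the quantization bin size. The pixel values below are illustrative, not taken from the figure.

```python
# Reduce an 8-bit pixel (0-255) to a smaller number of intensity levels.
def reduce_depth(pixel: int, bits: int) -> int:
    step = 256 // (1 << bits)                   # size of each quantization bin
    return (pixel // step) * step + step // 2   # representative level of the bin

pixels = [0, 13, 76, 128, 200, 255]             # illustrative 8-bit values
for bits in (8, 5, 3):
    print(bits, "bits:", [reduce_depth(p, bits) for p in pixels])
```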
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.











