Computer Science

Compression

Compression is the process of reducing the size of data to save storage space or transmission time. It is achieved by encoding information using fewer bits than the original representation. There are two main types of compression: lossless, which retains all the original data, and lossy, which sacrifices some data to achieve higher compression ratios.

Written by Perlego with AI-assistance

10 Key excerpts on "Compression"

  • Book cover image for: Mathematics That Power Our World, The: How Is It Made?
    • Joseph Khoury, Gilles Lamothe (Authors)
    • 2016 (Publication Date)
    • World Scientific
      (Publisher)
    Similarly, uncompressed texts, images, audio and video files or information transfer over digital networks require substantial storage capacity certainly not available on standard machines you use at the office or at home. By a data compression algorithm, we usually mean a process through which we can represent data in a compact digital form that uses less space to store or less time to transmit over a network than the original form. This is usually done by reducing unwanted noise (or redundancy) in the data to a degree where it is still possible to recover it in an acceptable form. The process of assigning digital codes to pieces of data for storage or transmission is called encoding. Of course, a compression algorithm is only efficient when we are able to reverse the process, that is, to retrieve the original sequence of characters from the encoded digital form. This process is called decoding or decompressing. In the literature, the word compression often means both the compression and the decompression processes (or encoding and decoding) of data.
    2.1.2 Before you go further
    Mathematical skills like manipulating algebraic inequalities, basic properties of logarithmic functions and some level of discrete mathematics are needed in this chapter.
    2.2 Storage inside computers
    When you click the “Save” button after working on or viewing a document (text, image, audio, . . . ), a convenient interpretation of what happens next is to imagine that your computer stores the file in the form of a (long) finite sequence of 0’s and 1’s that we call a binary string. The reason why only two characters are used was explained briefly in the chapter on electronic calculators.
    0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
    In this digital format, each of the two characters “0” and “1” is called a “bit” (binary digit). The number of bits in a string is called the length of the string.
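    As a small illustration of this binary representation (a sketch of ours, not code from the book), the snippet below encodes a short piece of text into a binary string of bits and recovers it, showing what the "length" of such a string means:

```python
# Minimal sketch: representing text as a binary string of bits and recovering it.
# Purely illustrative; the book's own examples may differ.

def to_bits(text: str) -> str:
    """Encode text as a string of '0'/'1' characters (8 bits per byte, UTF-8)."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))

def from_bits(bits: str) -> str:
    """Decode a binary string produced by to_bits back into text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

encoded = to_bits("Hi")
print(encoded)             # 0100100001101001
print(len(encoded))        # 16 bits: the length of the binary string
print(from_bits(encoded))  # Hi
```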
  • Book cover image for: Introduction to Digital Audio
    • John Watkinson (Author)
    • 2013 (Publication Date)
    • Routledge
      (Publisher)

    5

    Compression

    5.1 Introduction

    Compression, bit rate reduction and data reduction are all terms which mean basically the same thing in this context. In essence, the same (or nearly the same) audio information is carried using a smaller quantity and/or rate of data. It should be pointed out that in audio, compression traditionally means a process in which the dynamic range of the sound is reduced, typically by broadcasters wishing their station to sound louder. However, when bit rate reduction is employed, the dynamics of the decoded signal are unchanged. Provided the context is clear, the two meanings can co-exist without a great deal of confusion.
    There are several reasons why compression techniques are popular:
    (a) Compression extends the playing time of a given storage device.
    (b) Compression allows miniaturization. With fewer data to store, the same playing time is obtained with smaller hardware. This is useful in portable and consumer devices.
    (c) Tolerances can be relaxed. With fewer data to record, storage density can be reduced, making equipment which is more resistant to adverse environments and which requires less maintenance.
    (d) In transmission systems, compression allows a reduction in bandwidth which will generally result in a reduction in cost. This may make possible some process which would be uneconomic without it.
    (e) If a given bandwidth is available to an uncompressed signal, compression allows faster than real-time transmission within that bandwidth.
    (f) If a given bandwidth is available, compression allows a better-quality signal within that bandwidth.
    Compression is summarized in Figure 5.1. It will be seen in (a) that the PCM audio data rate is reduced at source by the compressor. The compressed data are then passed through a communication channel and returned to the original audio rate by the expander. The ratio between the source data rate and the channel data rate is called the compression factor. The term coding gain is also used. Sometimes a compressor and expander in series are referred to as a compander. The compressor may equally well be referred to as a coder and the expander a decoder, in which case the tandem pair may be called a codec.
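    To make the compression factor concrete, here is a minimal sketch with invented figures (the rates are assumptions for illustration, not values from the book):

```python
# Illustrative only: computing a compression factor (coding gain) from
# assumed source and channel data rates.

source_rate_kbps = 1411.2   # e.g. 16-bit stereo PCM at 44.1 kHz (assumed example)
channel_rate_kbps = 128.0   # assumed compressed bit rate

compression_factor = source_rate_kbps / channel_rate_kbps
print(f"Compression factor: {compression_factor:.1f}:1")  # ~11.0:1
```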
  • Book cover image for: Signal Traffic
    eBook - ePub

    Signal Traffic

    Critical Studies of Media Infrastructures

    PART I

    Compression, Storage, Distribution

    CHAPTER 1

    Compression

    A Loose History

    JONATHAN STERNE
    The use of the word compression to describe a communication technology process comes rather late in its history. According to the Oxford English Dictionary, the term compression is at least six hundred years old. Its use to describe the “condensation of thought and language” dates to the eighteenth century. The term was first applied to machinery—steam engines—in the mid-nineteenth century. Compression as a description of representation thus predates its use to describe a technical operation by about one hundred years.1
    Today, compression in communication engineering refers to one of two things: data compression or dynamic range compression. People encounter data compression every day in the form of zipped files, mp3s, jpegs, online videos, and mobile-phone voice algorithms. All of these technologies save precious bandwidth by eliminating categories of data that engineers have decided are redundant and therefore unnecessary to store or transmit. Dynamic range compression refers to reducing the distance between the loudest and quietest parts of an audio signal. It is useful because a signal with less variance can have a higher overall average volume.
    Most writers outside the engineering world, and especially most humanities scholars, still understand compression as something that happens after the fact, as supplemental to communication and its purposes, to perception, to interaction, and to the experiences attending them. In the wake of poststructuralism, few humanities writers would argue for verisimilitude2
  • Book cover image for: Nine Algorithms That Changed the Future
    eBook - ePub

    Nine Algorithms That Changed the Future

    The Ingenious Ideas That Drive Today's Computers

    7 Data Compression: Something for Nothing
    Emma was gratified, and would soon have shewn no want of words, if the sound of Mrs Elton's voice from the sitting-room had not checked her, and made it expedient to compress all her friendly and all her congratulatory sensations into a very, very earnest shake of the hand.
    —JANE AUSTEN , Emma
    We're all familiar with the idea of compressing physical objects: when you try to fit a lot of clothes into a small suitcase, you can squash the clothes so that they are small enough to fit even though they would overflow the suitcase at their normal size. You have compressed the clothes. Later, you can decompress the clothes after they come out of a suitcase and (hopefully) wear them again in their original size and shape.
    Remarkably, it's possible to do exactly the same thing with information: computer files and other kinds of data can often be compressed to a smaller size for easy storage or transportation. Later, they are decompressed and used in their original form.
    Most people have plenty of disk space on their own computers and don't need to bother about compressing their own files. So it's tempting to think that compression doesn't affect most of us. But this impression is wrong: in fact, compression is used behind the scenes in computer systems quite often. For example, many of the messages sent over the internet are compressed without the user even knowing it, and almost all software is downloaded in compressed form—this means your downloads and file transfers are often several times quicker than they otherwise would be. Even your voice gets compressed when you speak on the phone: telephone companies can achieve a vastly superior utilization of their resources if they compress voice data before transporting it.
    Compression is used in more obvious ways, too. The popular ZIP file format employs an ingenious compression algorithm that will be described in this chapter. And you're probably very familiar with the trade-offs involved in compressing digital videos: a high-quality video has a much larger file size than a low-quality version of the same video.
  • Book cover image for: The Art of Image Processing with Java
    Image Compression
    10.1 Overview
    The primary goal of image compression is to minimize the memory footprint of image data so that storage and transmission times are minimized. Producing compact image data is essential in many image processing systems since storage capacity can be limited, as is the case with digital cameras, or costly, as is the case when creating large warehouses of image data. Transmission of image data is also a central concern in many image processing systems. Recent studies of web use, for example, have estimated that images and video account for approximately 85% of all Internet traffic. Reducing the memory footprint of image data will correspondingly reduce Internet bandwidth consumption. More importantly, however, since most web documents contain image data it is vital that the image data be transferred over the network within a reasonable time frame. Reducing the memory footprint has the significant advantage of speeding delivery of web content to the end user. Image compression works by identifying and eliminating redundant, or duplicated, data from a source image. There are three main sources of redundancy in image compression. The first is known as interpixel redundancy, which recognizes that pixels that are in close proximity within an image are generally related to each other. A compression technique that reduces memory by recognizing some relationship between pixels based on their proximity is an attempt to eliminate interpixel redundancy. Run length encoding, constant area coding, and JPEG encoding seek to eliminate this source of unnecessary data. The second source of redundancy is known as psycho-visual redundancy. Since the human visual system does not perceive all visible information with equal sensitivity we understand that some visual information is less important than others. Image compression systems will simply eliminate information that is deemed to be unimportant in terms of human perception.
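    One way to see interpixel redundancy at work is a simple delta (difference) coding sketch. This is our illustration under an invented pixel row, not a method taken from the book: storing differences between neighbouring pixels concentrates the data around small values that a later coder can represent more compactly.

```python
# Illustrative delta coding: neighbouring pixels tend to be similar, so storing
# differences instead of raw values yields mostly small numbers.
# The pixel row below is invented for the example.

row = [120, 121, 121, 122, 124, 124, 125, 200, 201, 201]

deltas = [row[0]] + [b - a for a, b in zip(row, row[1:])]
print(deltas)  # [120, 1, 0, 1, 2, 0, 1, 75, 1, 0]  -> mostly small values

# Decoding reverses the process exactly (lossless).
decoded = [deltas[0]]
for d in deltas[1:]:
    decoded.append(decoded[-1] + d)
assert decoded == row
```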
  • Book cover image for: Multimedia Computing
    Fundamentals of Compression
    A major difference between multimedia data and most other data is its size. Images and audio files take much more space than text, for example. Video data is currently the single largest network bandwidth and hard disk space consumer. Compression was, therefore, among the first issues researchers in the emerging multimedia field sought to address. In fact, multimedia’s history is closely connected to different compression algorithms because they served as enabling technologies for many applications. Even today, multimedia signal processing would not be possible without compression methods. A Blu-ray disc can currently store 50 Gbytes, but a ninety-minute movie in 1,080p HDTV format takes about 800 Gbytes (without audio). So how does it fit on the disc? The answer to many such problems is compression. This chapter discusses the underlying mathematical principles of compression algorithms, from the basics to advanced techniques. However, all the techniques outlined in this chapter belong to the family of lossless compression techniques; that is, the original data can be reconstructed bit by bit. Lossless compression techniques are applicable to all kinds of data, including non-multimedia data. However, these techniques are not always effective with all types of data. Therefore, subsequent chapters will introduce lossy compression techniques that are usually tailored to a specific type of data, for example, image or sound files.
    RUN-LENGTH CODING
    Before discussing what compression is and how you can develop algorithms to represent different types of content with the least amount of space possible, let’s start with a simple and intuitive example. In addition to introducing the concept of compression, this example demonstrates a practical method that computer scientists often use to compress data with large areas of the same values.
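    A minimal run-length coding sketch along these lines (our illustration with an invented input string, not the chapter's own code) could look like this:

```python
# Run-length coding: replace a run of identical symbols with (symbol, count).
# Works well on data with large areas of the same value (e.g. flat image regions).

from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(symbol * count for symbol, count in pairs)

message = "AAAAAABBBCCCCCCCCD"
encoded = rle_encode(message)
print(encoded)                         # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
assert rle_decode(encoded) == message  # lossless: decoding restores the original
```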
  • Book cover image for: Image Processing and Analysis
    On a typical image-sharing website, hundreds of millions of photographs are uploaded every day, amounting to several exabytes per year of images.† These numbers are staggering, and although we are starting to reach the point where memory is cheap enough that we can begin to think about storing large collections of raw images and videos at home or on a server, limited transmission speeds and the desire to store these data on mobile devices, not to mention rapidly increasing rates of content creation, continue to motivate the need for compressing and decompressing the data. An overview of a compression/decompression system is provided in Figure 8.1. A stream of bits (in our case an image) is fed to a compressor, which converts the stream to a smaller stream of bits. This new stream is then either stored as a file on disk or transmitted across a network, where on the other end a decompressor restores the original image. Sometimes the compressor and decompressor are known as a coder and decoder, respectively, so that the software part of the system is collectively known as a codec. When we say that the decompressor restores the original image, we must make an important distinction because there are two types of compression. In lossless compression, the restored image is exactly the same as the original image, so that no information has been lost. Lossless compression techniques are applicable to any type of data, such as text, an image, a database of addresses, or a file containing an executable. On the other hand, the image restored by lossy compression is only similar to the original image. Lossy compression techniques are applicable to data arising from real-world measurements, such as an audio signal, a photographic image, or a signal captured by some other type of sensor.
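    A small sketch of the lossless/lossy distinction, under invented data (the quantization step and zlib stand-in are our assumptions, not the book's codecs):

```python
# Lossless vs. lossy on an invented "signal".
# Lossy: information is discarded, so the decoded data is only similar.
# Lossless: the decoded data equals the original exactly.

import zlib

samples = [0.12, 0.13, 0.13, 0.98, 0.99, 0.50]

# Lossy "codec": quantize each sample to one decimal place.
lossy = [round(s, 1) for s in samples]
print(lossy)              # [0.1, 0.1, 0.1, 1.0, 1.0, 0.5]
print(lossy == samples)   # False: similar, but not identical

# Lossless codec (zlib/DEFLATE as a generic stand-in): exact reconstruction.
raw = repr(samples).encode("utf-8")
assert zlib.decompress(zlib.compress(raw)) == raw  # nothing was lost
```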
  • Book cover image for: A Concise Introduction to Image Processing using C++
    CHAPTER 6
    Image Compression
    Storage and transmission are essential processes in image processing. As discussed in Chapter 1, images are generally stored in the bitmap format, and the memory in spatial dimensions could be very large if images are stored directly without preprocessing. For example, the data of an 8-bit grey-scale image with the resolution 256 × 256 requires a total memory of 65536 bytes (or 64 kilobytes). The memory required for a true colour image increases to 64 kilobytes × 3 = 192 kilobytes. Under the National Television System Committee (NTSC) standard, 30 frames of images are played in one second to ensure continuous vision effect. Suppose the images are true colour having a resolution of 720 × 576, the images played in one second would require the storage size of 720 × 576 × 3 × 30 = 37324800 bytes = 36 megabytes. Such a huge amount of data would cause enormous difficulties during storage or transmission. Therefore, compression of original images is inevitable to facilitate transmission or other processes. The essence of compression is to use a compressed file with smaller storage size requirements to replace the original one. The compressed file can be reverted to the original file through decompression. If the decompressed image is identical to the original image, the corresponding compression method is called lossless compression; otherwise, it is called lossy compression. Common lossy compression methods include predictive compression, vector quantisation, transform encoding, wavelet compression, and fractal compression. The last two methods are considered as state-of-the-art transform compression techniques. Compression rate can be used to assess the efficiency of a compression method. It is defined as the ratio of the size of the original file to the compressed file. If the size of the original file and the compressed file are
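    For example, the compression rate follows directly from the two file sizes. The compressed size below is invented for illustration:

```python
# Compression rate = original size / compressed size (higher means better reduction).
# The compressed size here is assumed purely to illustrate the formula.

original_size_bytes = 65536      # e.g. a 256 x 256 8-bit grey-scale image
compressed_size_bytes = 16384    # assumed size after compression

compression_rate = original_size_bytes / compressed_size_bytes
print(f"Compression rate: {compression_rate:.1f}:1")  # 4.0:1
```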
  • Book cover image for: Compression for Great Video and Audio
    eBook - ePub

    Compression for Great Video and Audio

    Master Tips and Common Sense

    • Ben Waggoner (Author)
    • 2013 (Publication Date)
    • Routledge
      (Publisher)
    In fact, we use randomness as a measure of compressibility. Compression is sometimes called “entropy coding,” since what you’re really saving is the entropy (randomness) in the data, while the stuff that could be predicted from that entropy is what gets compressed away to be reconstructed on decode.

    The More Efficient the Coding, the More Random the Output

    Using a codebook makes the file smaller by reducing redundancy. Because there is less redundancy, there is by definition less of a pattern to the data itself, and hence the data itself looks random. You can look at the first few dozen characters of a text file, and immediately see what language it’s in. Look at the first few dozen characters of a compressed file, and you’ll have no idea what it is.

    Data Compression

    Data compression is compression that works on arbitrary content, like computer files, without having to know much in advance about their contents. There have been many different compression algorithms used over the past few decades. Ones that are currently available use different techniques, but they share similar properties.
    The most-used data compression technique is Deflate, which originated in PKWare’s .zip format and is also used in .gz files, .msi installers, HTTP header compression, and many, many other places. Deflate was even used in writing this book: Microsoft Word’s .docx format (along with all Microsoft Office “.???x” formats) is really a directory of files that are then Deflated into a single file.
    For example, the longest chapter in my current draft (“Production, Post, and Acquisition”) is 78,811 bytes. Using Deflate, it goes down to 28,869 bytes. And if I use an advanced text-tuned compressor like PPMd (included in the popular 7-Zip tool), it can get down to 22,883 bytes. But that’s getting pretty close to the theoretical lower limit for how much this kind of content can be compressed. That’s called the Shannon limit, and data compression is all about getting as close to that as possible.
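    As a rough way to try this yourself, Python's zlib module implements Deflate. The repetitive placeholder text below is ours, not the chapter the author measured, so the exact numbers will differ:

```python
# Deflate (via Python's zlib module) applied to some repetitive placeholder text.
# The input is invented; the chapter sizes quoted above come from the author.

import zlib

text = ("Production, post, and acquisition workflows repeat many of the "
        "same words, and repetition is exactly what Deflate exploits. ") * 200

raw = text.encode("utf-8")
deflated = zlib.compress(raw, level=9)

print(len(raw), "bytes before,", len(deflated), "bytes after Deflate")
print(f"ratio: {len(raw) / len(deflated):.1f}:1")
```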
  • Book cover image for: A Primer on Compression in the Memory Hierarchy
    • Somayeh Sardashti, Angelos Arelakis, Per Stenström, David A. Wood (Authors)
    • 2022 (Publication Date)
    • Springer
      (Publisher)
    CHAPTER 2
    Compression Algorithms
    In information theory, the entropy of a source input is the amount of information contained in that data [126]. Entropy determines the number of bits needed to optimally represent the original source data. Therefore, entropy sets an upper bound on the potential for compression. Low entropy suggests that data can be represented with fewer bits. Although computer designers try to use efficient coding for different data types, the memory footprints of many applications still have low entropy. Compression algorithms compress a data message into a series of code words by exploiting the low entropy in the data. These algorithms map a data message to compressed code words by operating on the data message as a sequence of input symbols, e.g., bits, bytes, or words. In this section, we first establish the information theoretic foundations, then introduce a taxonomy of algorithms, and classify different compression algorithms. We then introduce the main metrics to evaluate the success of a given compression algorithm.
    2.1 VALUE LOCALITY
    Computers access and process data in chunks of particular sizes depending on the data types forming the values. For example, some accesses are for 64-bit floating-point numbers whereas others are for 32-bit integers. During program execution, it is possible for a previously accessed value to be accessed again either in the same memory location or in another location [112]. We refer to the property that exactly the same value or a set of similar values are replicated across multiple memory locations as value locality [113]. Conventional cache/memory hierarchies seek to exploit the principle of reference locality, predicting that the same blocks will be frequently accessed. Therefore, when programs that exhibit value locality run on systems with conventional cache/memory hierarchies, a subset of data values may be replicated and saved in several different memory/cache locations, causing value replication.
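    As a small illustration of entropy as a bound on the bits needed per symbol (our sketch, not code from the primer), a Shannon entropy calculation over an invented low-entropy message might look like this:

```python
# Shannon entropy of a symbol stream: the average number of bits per symbol
# that an optimal code needs. The message below is invented for the example.

import math
from collections import Counter

message = "AAAAAAAABBBBCCD"  # low-entropy data: 'A' dominates

counts = Counter(message)
total = len(message)
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"entropy: {entropy:.3f} bits/symbol")                 # ~1.64, below log2(4) = 2
print(f"optimal-code bound: {entropy * total:.1f} bits for {total} symbols")
```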
Index pages curate the most relevant extracts from our library of academic textbooks. Each has been created using an in-house natural language model (NLM) to add context and meaning to key research topics.