Spectrogram

A spectrogram is a visual representation of the spectrum of frequencies in a sound signal as it varies with time. It is commonly used in linguistics to analyze speech sounds and to study the acoustic properties of different languages. Spectrograms provide valuable information about the characteristics of speech sounds, such as their duration, intensity, and frequency content.

Written by Perlego with AI-assistance

5 Key excerpts on "Spectrogram"

  • A Field Manual for Acoustic Phonetics
    3 Sound Spectrograms and Spectra
    The speech wave graphs (speech waveforms, oscillograms) that were discussed in chapter 2 provide one way of looking at speech. Speech waveforms are direct representations of air pressure fluctuations; their production does not require any further analysis of the speech signal.
    Speech waveforms already provide a lot of information about speech. Inspection of waveforms can give clues about phonetic features of segments (consonants and vowels), about their relative loudness, and about their duration. However, other views of a speech signal are possible that provide additional information.
    The development of the sound spectrograph, in the early 1940s, has meant a tremendous boost for the science of phonetics. The sound spectrograms (also called sonagrams) that are produced by this machine provide a wealth of information about the acoustic properties of sounds in general and speech sounds in particular. In the early days, some researchers believed that it would be possible to teach a wide range of people to “read” spectrograms of spoken utterances (that is, to recover the original, spoken message by mere visual inspection of its spectrogram). For a short while it was hoped that this could be a solution for the communication problems of deaf people (see Potter, Kopp, and Green 1947).
    While this hope proved to be too optimistic, it is still true that the spectrograph has played an essential role in the rapid advancement of the field of phonetics after the Second World War. Spectrograms are particularly useful for studying the acoustic properties of individual vowels and consonants. They show, for example, the differences between vowel qualities such as [i], [a], and [u] much more clearly than speech wave graphs (in fact it is often very difficult, if not impossible, to recognize vowel qualities in speech wave graphs, even for experienced phoneticians).
    The spectrograph as a stand-alone piece of equipment has become obsolete. As one phonetician wrote some forty years ago: “…it will eventually be old fashioned to build speech processing devices for specific purposes. Instead, the phonetics laboratory computer will be equipped with programs for simulating any desired form of speech analysis and synthesis” (Fant 1968:175
  • Sociophonetics: An Introduction
    eBook - PDF

    An example is shown in Figure 2.10. The other kinds of visual representations, and the ones that you use for most acoustic analyses, are power spectra and spectrograms. In a power spectrum, frequency in Hz is shown on the x-axis and amplitude in dB on the y-axis, as in Figure 2.11. Time isn’t shown on a spectrum because a spectrum is a snapshot – a picture of a single moment of time. A spectrogram consists of a series of consecutive spectra turned on their sides and lined up. On a spectrogram, as in Figure 2.12, time is shown on the x-axis, frequency is shown on the y-axis and amplitude is indicated by the darkness of the items within the spectrogram.
    [Figure 2.11: A narrowband power spectrum of a point at the centre of the vowel from Figure 2.10. Axes: frequency (Hz), amplitude (dB).]
    [Figure 2.12: A wideband spectrogram of some running speech that includes the syllable from Figure 2.10. The utterance is ‘came along a big, bad wolf’. Axes: time (s), frequency (Hz).]
    [Figure 2.13: A wideband spectrogram of the same spoken syllable as in Figure 2.10. Note how the vowel formants, vocal fold vibrations and a transient are all readily discernible. Axes: time (s), frequency (Hz).]
    Back in the old days, spectra and spectrograms were generated by a machine called a spectrograph. The spectrograph was encased in a big, clumsy box and it printed spectrograms on special, expensive paper – not conducive to university budgets today – onto which the image was burned with a stylus. Around the early 1990s, spectrographs were superseded by spectrographic software. Current software is significantly faster, more user-friendly and more versatile than spectrographs were. It even makes better-looking spectrograms than the old equipment. The settings of spectrograms and spectra can be varied, depending on what you want to examine.
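The description above, a spectrogram as a series of spectra lined up along the time axis, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy, a synthetic two-tone signal in place of speech, and a hypothetical helper named `spectrogram` with an assumed 5 ms window; it is not the procedure used by any particular phonetics package.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_s=0.005, hop_s=0.0025):
    """Hypothetical helper: return (times, freqs, level_db) from short-time spectra."""
    win_len = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    window = np.hanning(win_len)
    starts = list(range(0, len(signal) - win_len + 1, hop))
    # One amplitude spectrum (a "snapshot") per window position.
    spectra = [np.abs(np.fft.rfft(signal[s:s + win_len] * window)) for s in starts]
    # Turn the spectra on their sides: rows = frequency, columns = time.
    magnitude = np.array(spectra).T
    level_db = 20 * np.log10(magnitude + 1e-12)  # small offset avoids log(0)
    times = (np.array(starts) + win_len / 2) / sample_rate
    freqs = np.fft.rfftfreq(win_len, d=1 / sample_rate)
    return times, freqs, level_db

# Synthetic test signal (an assumption, not speech): 600 Hz, then 2000 Hz.
sample_rate = 10000
t = np.arange(5000) / sample_rate
signal = np.concatenate([np.sin(2 * np.pi * 600 * t), np.sin(2 * np.pi * 2000 * t)])

times, freqs, level_db = spectrogram(signal, sample_rate)
print(level_db.shape)  # (frequency bins, time frames)
```

Each column of `level_db` is one spectrum; a plotting routine would then map the dB values to darkness, with time on the x-axis and frequency on the y-axis.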
  • Speech Acoustic Analysis
    • Philippe Martin (Author)
    • 2020 (Publication Date)
    • Wiley-ISTE (Publisher)
    6 Spectrograms

    6.1. Production of Spectrograms

    The spectrogram, together with the melody analyzer (Chapter 7), is the preferred tool of phoneticians for the acoustic analysis of speech. This graphical representation of sound is made in the same way as cinema films, by taking “snapshots” from the sound continuum, analyzed by Fourier transform or Fourier series.
    Each snapshot results in the production of a spectrum showing the distribution of the amplitudes of the different harmonic components; in other words, a two-dimensional graph with frequency on the x-axis and amplitude on the y-axis. To display the spectral evolution over time, it is therefore necessary to calculate and display the different spectra on the time axis through a representation of time on the abscissa, frequency on the ordinate, and amplitude as a third dimension coded by color or a level of gray.
    Theoretically, considering the speed of changes in the articulatory organs, the necessary number of spectra per second is in the order of 25 to 30. However, a common practice is to relate the number of temporal snapshots to the duration of the sampling window, which in turn determines the frequency resolution of the successive spectra obtained. This is done by overlapping the second temporal half of a window with the next window.
    We have seen that the frequency resolution, i.e., the interval between two values on the frequency axis, is equal to the reciprocal of the window duration. In order to be able to observe harmonics of a male voice at 100 Hz, for example, a frequency resolution of at least 25 Hz is required, that is, a window of 40 ms. A duration of 11 ms, corresponding to a frequency resolution of about 300 Hz, leads to analysis snapshots every 5.5 ms. A better frequency resolution is obtained through a window of 46 ms, which with half-window overlap yields a spectrum every 23 ms (Figure 6.1
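The arithmetic in this excerpt can be checked with a short Python sketch. It assumes the simple rule stated in the excerpt (frequency spacing equals one over the window duration) and a 50% overlap between successive windows; the function names are hypothetical and chosen only for this illustration.

```python
def frequency_resolution_hz(window_ms):
    """Spacing between frequency values: reciprocal of the window duration."""
    return 1000.0 / window_ms  # window given in ms, so 1000 / ms = 1 / s

def hop_ms(window_ms, overlap=0.5):
    """Time between successive spectra when windows overlap by `overlap`."""
    return window_ms * (1.0 - overlap)

for window in (40, 46):
    print(f"{window} ms window: spacing {frequency_resolution_hz(window):.1f} Hz, "
          f"one spectrum every {hop_ms(window):.1f} ms")

# Half-window overlap for the short window mentioned in the excerpt.
print(f"11 ms window: one spectrum every {hop_ms(11):.1f} ms")
```

Under these assumptions a 40 ms window gives the 25 Hz spacing quoted above, and the 46 ms and 11 ms windows give a spectrum every 23 ms and 5.5 ms respectively.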
  • Experimental Phonetics: An Introduction
    eBook - ePub

    • Katrina Hayward (Author)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    Chapter 3 Analysing sound: the spectrograph

    3.1 Introduction

    In Chapter 2, we adopted an abstract way of thinking about and describing sound. Sound waves are, in essence, complex patterns of variation through time, which we can represent as two-dimensional pictures, or waveforms, on pieces of paper or computer screens. In principle, any such wave is a combination of sine waves, and any two-dimensional waveform can be re-created by combining sine waveforms. Because of this, it is also possible to describe a wave by specifying its sine wave components. This is the basic idea behind the spectrum. A spectrum is a two-dimensional diagram which shows which sine wave components are present and their relative strengths.
    Unfortunately, the knowledge that every sound, in principle, has a spectrum does not in itself enable us to determine what that spectrum is. In a few cases – for example, the imitation [u] of Figure 2.11 – it is possible to arrive at a reasonably good approximation by visual analysis and a small amount of trial and error. However, such methods are not really practical for research into speech because, even when they work well, they are too slow. The difficulty is made worse by the fact that the spectrum of speech is constantly changing. Even at slow tempos, segments of very different types, such as stops, fricatives and vowels, alternate with one another in quick succession. In any case, as students of speech, we want to do more than analyse individual segments; we want to see how the spectrum changes as the speaker moves from one segment to another.
    A practical solution to the problem is provided by the technique of sound spectrography. Spectrography is the most important technique available to the experimental phonetician, and the visual images which it produces – known as spectrograms
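The idea in this excerpt, that a wave can be described by listing its sine wave components and their relative strengths, can be illustrated with a small Python sketch. The signal here is an assumption made up for the example (two sine waves at 200 Hz and 600 Hz), and NumPy's FFT stands in for whatever analysis a spectrograph or speech program would actually perform.

```python
import numpy as np

sample_rate = 8000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of sample times

# Hypothetical complex wave: two sine components with different strengths.
wave = 1.0 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 600 * t)

# Amplitude spectrum, scaled so each bin reads as a sine-wave amplitude.
amplitudes = 2 * np.abs(np.fft.rfft(wave)) / len(wave)
freqs = np.fft.rfftfreq(len(wave), d=1 / sample_rate)

# Keep only the components that are clearly present.
components = [(round(float(f), 1), round(float(a), 3))
              for f, a in zip(freqs, amplitudes) if a > 0.01]
print(components)
```

The printed list should contain two entries, one near 200 Hz with amplitude close to 1.0 and one near 600 Hz with amplitude close to 0.5, which is the spectrum of this artificial wave. Real speech changes from moment to moment, which is why a single spectrum like this is not enough.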
  • The Sounds of Language: An Introduction to Phonetics and Phonology
    eBook - PDF

    The procedure for making the spectrogram is essentially the same as that diagrammed in Figures 7.17 and 7.18. Spectra are taken at repeated intervals (in the case of Figure 7.18 about every 5 ms), and the amplitude of energy in different frequency bands is computed. When the information from sequential spectra is graphed over time, with amplitude coded as darkness, the formants (regions of high-amplitude energy) show up as dark stripes that move over time, reflecting changes in resonance frequencies as vocal tract articulators change position. The F2 stripe moves gradually from about 1500 Hz at time 50 ms, indicative of central [a], to 2400 at 200 ms, indicative of front [ ], then down again to about 1700 Hz, indicative of [ ]. The white areas at beginning and end correspond to the stop closures, when there is no high-amplitude energy in the signal. This type of spectrogram is called a wide-band or broad-band spectrogram, because the formant frequencies show up as broad bands. In a wide-band spectrogram, spectra are taken from very short “windows” of the speech signal at frequent intervals, so that changes that take place over very short time periods are evident. The burst at the release of the [b] shows up as a line of energy at about 30 ms, for example. Each vertical striation corresponds to a period of vocal fold vibration (one beat of sound energy). The irregular vertical lines between 200 and 250 ms indicate glottalization, irregular vocal fold vibration, between “buy” and “a.” If larger “windows” at less frequent intervals are used, the result is a narrow-band spectrogram, as shown in Figure 7.20. This Spectrogram is of the exact same speech sample as in Figure 7.19. You should be able to pick out the same formant patterns. By taking a larger sample for each spectrum, more accurate frequency information can be computed: in a narrow-band spectrogram, individual harmonics can be distinguished.
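The trade-off between wide-band and narrow-band analysis can be sketched in Python by analysing the same signal with a short and a long window. Everything specific here is an assumption for illustration: the signal is a synthetic sum of harmonics of 100 Hz rather than real speech, the window lengths (5 ms and 40 ms) and hops are example values, and `frame_spectra` is a hypothetical helper built on NumPy.

```python
import numpy as np

def frame_spectra(signal, sample_rate, window_s, hop_s):
    """Hypothetical helper: amplitude spectra of overlapping windows (rows = time)."""
    win_len = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    window = np.hanning(win_len)
    frames = [signal[s:s + win_len] * window
              for s in range(0, len(signal) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

sample_rate = 10000
t = np.arange(sample_rate) / sample_rate  # one second
# Stand-in for a voiced sound: 20 harmonics of a 100 Hz fundamental.
signal = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 21))

wide = frame_spectra(signal, sample_rate, window_s=0.005, hop_s=0.001)
narrow = frame_spectra(signal, sample_rate, window_s=0.040, hop_s=0.010)

for name, spec, window_s in (("wide-band", wide, 0.005),
                             ("narrow-band", narrow, 0.040)):
    print(f"{name}: {spec.shape[0]} spectra, {spec.shape[1]} frequency bins, "
          f"bin spacing {1 / window_s:.0f} Hz")

# With 25 Hz bins, the 100 Hz harmonic (bin 4) stands clear of the gap at
# 50 Hz (bin 2); the wide-band bins are 200 Hz apart and cannot separate them.
peak_to_gap = narrow[:, 4].mean() / narrow[:, 2].mean()
print(f"narrow-band peak-to-gap ratio: {peak_to_gap:.0f}")
```

The short window yields many spectra with coarse frequency bins, which suits fast events such as bursts and individual vocal fold pulses; the long window yields fewer spectra with fine bins, which is what lets individual harmonics show up.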
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.