Part 1: Basic Concepts, Representations and Feature Extraction
Outline
Introduction
Getting Familiar with Audio Signals
Signal Transforms and Filtering Essentials
Audio Features
1 Introduction
Abstract
This chapter serves as an introduction. It provides a chapter outline, along with general notes on the book's exercises and the companion software. Before we proceed, it is important to note that, although in this book the term audio does not exclude the speech signal, we are not focusing on traditional speech-related problems that have been studied by the research community for decades, e.g., speech recognition and coding.
Keywords
Audio analysis
MATLAB
During recent years we have witnessed the increasing availability of audio content via numerous distribution channels both for commercial and non-profit purposes. The resulting wealth of data has inevitably highlighted the need for systems that are capable of analyzing the audio content in order to extract useful knowledge that can be consumed by users or subsequently exploited by other processing systems.
Before we proceed, it is important to note that, although in this book the term "audio" does not exclude the speech signal, we are not focusing on traditional speech-related problems that have been studied by the research community for decades, e.g. speech recognition and coding. It is our intention to provide analysis methods that can be used to study various audio modalities and their relationships in mixed audio streams. Consider, for example, the task of segmenting a radio broadcast into homogeneous parts that contain either speech, music, or silence. Developing a solution for such a task requires familiarity with the various audio modalities and with how they affect the performance of segmentation algorithms in audio streams. In other words, we are not interested in providing solutions that are well tailored to specific audio types (e.g. the speech signal) but are not applicable to other modalities.
As with several other types of media, the automatic analysis of audio signals has attracted increasing interest during the past decade. Depending on the storage/distribution format, the respective audio content classes, the co-existence of other media types (e.g. moving image), the user requirements, the data volume, the application context, and numerous other parameters, a diversity of applications and research trends have emerged to deal with various audio analysis tasks. The following list includes both speech and non-speech tasks so as to provide a general idea of the trends in several popular areas of speech/audio processing:
• Speech recognition: this is the task of "translating" a speech signal to text using computational tools. Speech recognition is the oldest domain of audio analysis, but a detailed study of it is beyond the scope of this book. We only present generic dynamic time warping and temporal modeling techniques that can also be applied to other audio signals.
• Speaker identification, verification and diarization: these speaker-related tasks focus on designing methods that discriminate between different speakers. Speaker identification and verification can be useful in the development of secure systems, while speaker diarization, which answers the question "who spoke when?", can be used in conversation summarization systems.
• Music information retrieval (MIR): due to the huge increase in the amount of available digital music data during the past few years, there has been a growing need for the automatic analysis of this type of data. MIR focuses on automatically extracting information from the music signal for purposes such as content tagging, intelligent indexing, retrieval and browsing of music tracks; recommendation of new tracks based on music content (possibly combined with user preferences and collaborative knowledge); segmentation of music tracks; generation of summaries; extraction of automated music transcriptions, etc.
• Audio event detection: this is the tas...