1.1. What is respeaking?
In general terms, respeaking may be defined as the production of subtitles by means of speech recognition. Yet, this is too broad to give an accurate idea of what respeaking entails. The problem here is that any attempt to provide a definition of respeaking is likely to be either too simplistic (falling short of accounting for its features and variations) or too cumbersome (trying to grasp its full scope). Be that as it may, this book needs a working definition as a starting point, so at the risk of falling into either trap, here is one:
A technique in which a respeaker listens to the original sound of a live programme or event and respeaks it, including punctuation marks and some specific features for the deaf and hard of hearing audience, to a speech recognition software, which turns the recognized utterances into subtitles displayed on the screen with the shortest possible delay.
The terms that are underlined in this definition need further explanation, as they may only be telling part of the story:
• Live: respeaking is also being used nowadays for pre-recorded subtitling, due to the fast throughput it produces.
• Programme: see ‘subtitles’ below.
• Respeak: depending on the case, this verb could mean to repeat, to rephrase or even to translate from one language to another. To start with, respeaking is mostly carried out intralingually. Respeakers are often encouraged to repeat the original soundtrack so as to produce verbatim subtitles. Yet, the high speech rate of the source text often makes it impossible for respeakers to follow the original soundtrack literally. This means they have to edit it, thus rephrasing it rather than repeating it. Finally, respeakers at Red Bee Media Wales or VTM (Flanders), to name but two examples, respeak interlingually from Welsh into English and from English into Dutch respectively.
• Features for the deaf and hard of hearing audience: the amount of extra information provided for deaf and hard of hearing viewers depends on many factors such as the channel, the programme, the respeaker, the time available, etc. Most respeakers introduce information to identify the different speakers and sometimes other extra-linguistic elements such as clapping, booing or laughing.
• Speech recognition software: respeaking usually involves two types of software. Firstly, there is a speech recognition (SR) application that recognizes the respeaker’s utterances and can display them, for example, in an ordinary text application such as Microsoft Word. Then, this speech recognizer is integrated into a subtitling application that shows the recognized utterances as subtitles on the screen.
• Subtitles: as will be explained in Chapter 9, respeaking is not only used to subtitle programmes on TV but also to provide speech-to-text-based accessibility (real-time transcription) in live events held in different venues such as museums, theatres, conferences and even churches. In these cases, for example in a gallery talk, the screen may not display images, but only the respoken utterances, which are then not exactly subtitles.
• Minimum delay: the delay may vary greatly depending on a number of factors including the software, the correction method or the subtitling mode. The delay in Windows Speech Recognition (WSR) is longer than in Dragon or ViaVoice; the correction method used by the French broadcasters TF1 and France 2, involving two people, causes a longer delay than in other channels where respeakers correct their own mistakes; finally, when respeaking is used for pre-recorded subtitling and thus not intended for a live audience, a longer delay is not a problem at all.
1.2. The name game
One of the consequences of the very little research carried out so far on respeaking is the lack of established terminology to refer not only to the professionals engaged in this discipline but also to the discipline itself. As far as the English language is concerned, a quick look at some of the publications available yields several long and precise labels such as speech-based live subtitling (Lambourne et al. 2004), (real time) speech recognition-based subtitling and real-time subtitling via speech recognition (Eugeni 2008). Shorter alternatives such as speech captioning (see section 9.2) or shadow speaking (Boulianne et al. 2009) may be found in Australia and Canada, while in the USA, revoicing (Muzii 2006), voice-writing (Vincent 2007) and realtime voice writing (Keyes 2007) refer to the use of SR not only to produce live subtitles but also transcriptions in trials, classes and different types of public events. Despite all these alternatives, it seems that the term respeaking is rapidly consolidating both in the industry (Marsh 2006) and in academia (van der Veer 2007, Romero-Fresco 2008). As a matter of fact, in the same way that the term audio-visual translation has become a household name in Translation Studies and no longer seems to need a hyphen, re-speaking (Lambourne et al. 2004) has lost its hyphen as it has gained visibility.
Other languages present a different situation, the respeaking technique having consolidated much earlier than the terminology. As a result, there is a significant lack of consistency to refer to what has sometimes been branded as a ‘tâche sans nom’ (Moussadek 2008), that is, a trade without a name.
In French, for example, there are long terms such as sous-titrage en direct via le respeaking (Imhauser 2007) or sous-titrage pour sourds et malentendants en direct ou en temps réel, used by the subtitling company Subbabel in their translated website. Yet, the most common terms have so far been respeaking, used in the French-speaking Swiss channels TSR1 and TSR2, sous-titrage vocal, used in Red Bee Media France, and la technique du perroquet, used in TF1 and France 2. In the first two options, the professional is referred to as respeaker and sous-titreur vocal respectively. The third option refers to a slightly different approach, where a perroquet (parrot) or rédacteur oral does the respeaking, a souffleur (whisperer) suggests possible corrections and a correcteur implements changes and has the final say over what will be displayed on the screen.
In other European languages such as German, the calque respeaking seems to have prevailed so far, although in this case both Re-speaking and Re-speaker are written with initial upper case as per German spelling rules. In Italian, Eugeni (2006) proposed the term rispeakeraggio in an attempt to ‘adapt as much as possible to the morpho-syntactic rules of the Italian grammar’ while avoiding both ‘yet another integral loan’ and ‘ambiguous labels that may already be used to refer to similar or more generic techniques, such as repetition or reformulation’. However, the direct calque respeaking has so far proved more common in the industry than rispeakeraggio. This discrepancy between academia and industry is also applicable to the Dutch language, particularly in Flanders, the Dutch-speaking part of Belgium, where respeaken is found in academic publications (van der Veer 2007) while TV channels and subtitling companies seem to have opted for the more general live-ondertiteling.
In the case of Spanish, researchers, professionals and the official institution responsible for regulating the Spanish language (Real Academia Española) seem to have agreed on a specific term (rehablado) for respeaking following the discussion and proposals presented in Romero-Fresco (2008). It may be useful, particularly for those languages where no consistent terminology has been coined, to go over the criteria used in that article, namely brevity, flexibility, naturalness, specificity and transparency.
• Brevity: whereas long and expository labels such as real-time subtitling via speech recognition may be good by way of introduction, they are not really functional. If a similar term is chosen in a language other than English, the need to find a shorter alternative may lead users to opt for the calque respeaking, which is already available.
• Flexibility: as well as length, a key issue in making a name functional is the possibility of deriving from it an adjective, a noun for the professional doing the respeaking and a verb. This is difficult if foreign languages adopt long terms or the calque respeaking, which would require the use of respeaker, respoke, respoken, etc.
• Naturalness: the term chosen may be a foreign form (respeaking in any language other than English), an adaptation (rehablado in Spanish) or a natural form (reformulació simultània in Catalan, as proposed by Termcat, a centre for the development of terminology in the Catalan language).
• Specificity: the term chosen may be specific (in this case a new term for a new reality) or generic, such as simultaneous reformulation, which could be or has been used for something else.
• Transparency: the term chosen may be more or less self-explanatory. Speech recognition-based live subtitling is quite transparent, although also long and non-functional. Respeaking, in contrast, is more opaque.
In Spanish, some of the options considered were:
*Subtitulación (en directo) por reconocimiento de habla ([Live] subtitling by speech recognition): long / not flexible / natural / specific / transparent
*Respeaking: short / not flexible / foreign / specific / opaque
*Subtitulación interpretada (Interpreted subtitling): ±short / not flexible / natural / specific / ±opaque
*Interpretación (simultánea) subtitulada (Subtitled / Print interpreting)...