Studies in Corpus-Based Sociolinguistics
eBook - ePub

  1. 366 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Studies in Corpus-Based Sociolinguistics illustrates how sociolinguistic approaches and linguistic distributions from corpora can be effectively combined to produce meaningful studies of language use and language variation. The volume comprises three major parts, focusing on: (1) Corpora and the Study of Languages and Dialects, in particular, varieties of global Englishes; (2) Corpora and Social Demographics; and (3) Corpora and Register Characteristics. The 14 new, original, peer-reviewed chapters explore language variation related to regional dialectology, gender, sexuality, age, race, ‘nation,’ workplace discourse, diachronic change, and social media and web registers. Invited contributors made use of systematically designed general and specialized corpora, sound research questions, methodologies (e.g., keyword analysis, multi-dimensional analysis, clusters, and collocations), and logical, credible interpretive techniques. Studies in Corpus-Based Sociolinguistics is an important resource for researchers and graduate students in the fields of sociolinguistics, corpus linguistics, and applied linguistics.

1 Corpus Approaches to Sociolinguistics: Introduction and Chapter Overviews
Eric Friginal and Mackenzie Bristow

Introduction

Sociolinguistics is the study of variation in language form and use that is associated with social, situational, attitudinal, temporal, and geographic influences (Friginal & Hardy, 2014). Many research studies in sociolinguistics have investigated why and how individuals from varying backgrounds speak and write differently. The characteristic features of spoken and written discourse have been described and interpreted at both the micro (individual) and macro (group) levels, producing a wide range of diverse, overlapping, and often fascinating results. Several studies have also explored how various aspects of society, including societal expectations, cultural norms and traditions, and historical influences, affect the way language is used.
Friginal and Hardy’s (2014) Corpus-Based Sociolinguistics: A Guide for Students (Routledge) serves as a companion book for the studies collected in this volume. In this collection, “corpus-based sociolinguistics” is used as an umbrella term for empirical research studies of linguistic variation investigated using corpora and corpus tools. “Corpus-based” as a methodological approach is not differentiated from “corpus-driven” or “corpus-assisted,” two related terms that are also used in some chapters of this volume. For Friginal and Hardy, the social, situational, attitudinal and relational, temporal, and geographic factors that underlie everyday linguistic variation are broadly defined, especially within the comprehensive context of applied linguistic research. Sociolinguistics as a field of study need not focus on a single, definitive cause of variation in speech and writing. In fact, many sociolinguistic investigations of language use give particular importance to the overlap between and among variables in order to better understand the unique interplay of factors that influence explicit and implicit linguistic variation. Personal, private, and cognitive individual factors can be examined, where data are available, alongside affiliations, role-relations, power, registers, and group dynamics. In quantitative sociolinguistics, these variables are measurable, and linguistic distributions and frequency data reveal patterns and tendencies that lend themselves to qualitative interpretation. Overall, this data-driven methodological model has paved the way for the application of computational data extraction and text corpora in sociolinguistic studies. The merging of these approaches points to the important contribution of corpus linguistics to broader sociolinguistic research.
Corpus linguistics is a research approach that facilitates practical investigations of language variation and use, producing a range of reliable and generalizable linguistic data that can be extensively interpreted (Biber, Conrad, & Reppen, 1998). The corpus approach follows methodological innovations that allow scholars to ask ‘new’ research questions about existing linguistic phenomena across many social situations. The findings these questions generate may offer information and perspectives on language variation that either complement or contradict the assumptions made in traditional sociolinguistic investigations. In addition, corpora can provide a stronger argument for the view that language variation is systematic, yet fluid, and can be described using empirical, quantitative methods. This argument is important because sociolinguistic studies, while deeply rooted in ethnography and qualitative analysis, also require extensive technical, multifaceted data that help explain the interface between linguistic parameters and the social groups in which they exist.
Sociolinguistic studies investigate two primary groups of variables: (1) linguistic variables, which capture variation in language use, that is, observable shifts or changes in how linguistic forms are used in speaking and writing; and (2) societal variables, which include the social, situational, attitudinal and relational, temporal, and geographic influences, and any combination of these influences, that potentially account for these linguistic shifts or changes. Thus, it is important to know how these two groups of variables are defined or operationalized in sociolinguistic studies. Friginal and Hardy (2014) briefly define and describe these linguistic and societal variables:

Primary Linguistic Variables Investigated in Sociolinguistics

  • Sounds, words, and grammatical features of a language, including a range of differences in the pronunciation of sounds, intonation of utterances, and the use of words and phrases (and also dysfluent markers of speech), and grammatical structures of language
  • Discoursal features, including spoken and written characteristics of style, formality/informality of discourse, and textual structures (e.g., use of cohesive devices in writing; interruption, latching, or overlaps in face-to-face conversation)
  • Pragmatic features, including spoken and written expressions of politeness in language, stance and hedging, the use of respect markers or cuss words, and features of agreements and disagreements in interactions
  • Specific communicative features, including spoken and written manifestations of friendliness, affection, loyalty, or disgust; various speech acts (e.g., requests, commands, and declarations); pauses, backchannels, and greetings and leave-takings; and visual representations of attitude, political positions, and personal/group opinions and biases in print media
  • Paralanguage features, including pitch and volume in speech and non-verbal elements of language such as silences, gasps, and laughter in conversations; paralanguage may also include the use of visuals (e.g., pictures, colors, signs, and signage), emoticons, or punctuation marks in writing

Primary Societal Variables Explored in Sociolinguistics

  • Social—speaker/writer demographic information such as gender and sexuality, age, occupation, educational background, annual income, group networks (traditionally, social networks—not referring to the internet or social media applications such as Facebook, Twitter, or Instagram), social class, or social status
  • Geographic—particular locations, geographic regions, and boundaries
  • Situational—various communication contexts and registers; speech events such as conversation, interview, or broadcast
  • Attitudinal and relational—speaker/writer perceptions and attitudes (including prejudice), identity and identity construction, power, relationships and roles, and solidarity
  • Temporal—time periods (e.g., ‘real time’ and ‘apparent time’ studies), changes in societal and cultural perspectives over time, major historical events including influences from wars, natural calamities, and migration patterns over time
  • Other societal variables—more specific personality and cognitive factors, sociological distinctions; “uncommon” or new/emerging societal variables particularly influenced by the internet; human–non-human and machine-mediated communication; technology-based variables (e.g., use of telecommunication devices, gaming devices, and gaming culture)
By examining the role these societal variables play in how language is ‘formed’ and used, researchers can further illustrate, understand, and appreciate the reality that everyday language is remarkably varied and influenced by numerous factors. In sum, no one speaks the same way all the time, and individuals constantly exploit the nuances of the languages they speak and write for a wide variety of purposes. This recognition of variation in language use implies that language must be seen as more than an abstract object of study (Meyerhoff, 2011). Language is pragmatic, practical, evolving, and unique to individuals or groups of connected individuals. Generalizations about these variations and their practical implications can then be formalized, and answers can be sought to questions such as: How do these variations influence policies or attitudes? How could these patterns be taught effectively in the classroom? How can people address linguistic differences so that their reactions are valid or constructive, especially as they try to define what is proper or correct language in contrast to improper or sub-standard language (and does “sub-standard” language exist in the first place)? How have linguistic patterns changed over time? These and many related questions could be answered using multiple (traditional) research approaches in sociolinguistics, and they are, arguably, best described and interpreted within a corpus linguistic research paradigm.

Sociolinguistics and Corpus Linguistics

The exploration of sociolinguistics using corpora and corpus tools is still a relatively new area of research compared to established ethnographic methods, having emerged from variationist studies in the mid-1980s (especially the seminal works of Edward Finegan, Douglas Biber, and their contemporaries). As emphasized by Biber, Reppen, and Friginal (2010), corpus linguistics is not, in itself, a model of language (unlike sociolinguistics). This suggests that the term corpus linguistics may be something of a misnomer, given how it has been used and applied in many research studies over the years. What is clear is that corpus linguistics is primarily a methodological approach that can be defined or described according to the following considerations from Biber et al. (1998):
  • It is empirical, analyzing the actual patterns of use in natural texts.
  • It utilizes a large and principled collection of natural texts, known as a corpus (pl. corpora), as the basis for analysis.
  • It makes extensive use of computers for analysis, employing both automatic and interactive techniques.
  • It relies on the combination of quantitative and qualitative analytical techniques.
These descriptions suggest that the corpus linguistic approach can produce data and findings about variation in language with much greater generalizability and validity than would otherwise be feasible or justifiable with other study designs. Research in corpus-based sociolinguistics, in general, may offer stronger support for the view that language variation is indeed systematic, with consistent patterns, and can be described using empirical, quantitative, and frequency-based methods (Biber, 1988). It is important to remember that, although corpora offer measurable descriptions of texts and social groups, the researcher and subsequent consumers of these studies must still interpret corpus-based findings as accurately and consistently as possible. Extensive knowledge of the literature and related approaches, and awareness of the clear limitations of computational tools, must always be in the foreground. Interpretive techniques honed by ethnographers and discourse analysts over the years are certainly invaluable. For example, as highlighted by Friginal and Hardy (2014), there is little value in knowing that one gender group uses more passive voice constructions than another without being able to explore the functional reasons behind that difference in a particular context or communicative setting (e.g., in an academic or professional interaction; in telephone calls vs. face-to-face job interviews; or in narrative vs. expository texts). To summarize, corpus approaches can often be used in tandem with qualitative and discourse analytic methods, and frequency data from corpora can be statistically tested to determine whether a consistent and significant pattern exists.
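The chapter does not name a particular statistical test. As one illustration, the following minimal Python sketch assumes the log-likelihood (G²) statistic, a test commonly used in corpus linguistics to compare a feature's frequency across two (sub)corpora, and first normalizes raw counts per 1,000 words so that corpora of different sizes can be compared. The function names and the passive-voice counts are hypothetical, chosen only to echo the gender example above.

```python
import math


def normalized_rate(count: int, corpus_size: int, per: int = 1000) -> float:
    """Frequency of a feature per `per` words, so (sub)corpora of different sizes can be compared."""
    return count * per / corpus_size


def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one feature's frequency in two (sub)corpora.

    Expected counts assume the feature occurs at the same rate in both
    (sub)corpora; G2 grows as the observed counts depart from that assumption.
    """
    total_count = count_a + count_b
    total_size = size_a + size_b
    expected_a = size_a * total_count / total_size
    expected_b = size_b * total_count / total_size
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # by convention, 0 * log(0) contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Chi-square critical value for p < 0.05 with 1 degree of freedom.
CRITICAL_P05 = 3.84

if __name__ == "__main__":
    # Hypothetical counts of passive constructions in two subcorpora
    # (e.g., texts grouped by a speaker/writer variable); not real data.
    passives_a, words_a = 120, 50_000
    passives_b, words_b = 80, 60_000
    print(f"Group A: {normalized_rate(passives_a, words_a):.2f} per 1,000 words")
    print(f"Group B: {normalized_rate(passives_b, words_b):.2f} per 1,000 words")
    g2 = log_likelihood(passives_a, words_a, passives_b, words_b)
    print(f"G2 = {g2:.2f}; significant at p < .05: {g2 > CRITICAL_P05}")
```

With these made-up counts, G² comes out at roughly 17, above the 3.84 threshold. As the paragraph stresses, such a result only indicates that the difference is unlikely to be due to chance; explaining it still requires functional, contextual interpretation.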

Sociolinguistic Research Questions, Corpus Design, and Corpus Representativeness

One of the key elements in Sinclair’s (2005) definition of a corpus is that the collection of texts is used to represent a language or language variety. In other words, corpora are created for the purpose of better understanding a particular type of language. Thus, a sample of texts that together can serve as a characteristic example of the target variety or target domain is needed. This description brings to light the concept of representativeness. Biber (1993) defines representativeness as “the extent to which a sample includes the full range of variability in a population” (p. 243). In a more general sense beyond corpus linguistics, representativeness refers to the idea that a sample smaller than the population as a whole can nonetheless show as much variability as the overall population. Because a corpus should represent a particular language or variety of that language, corpus designers must be aware of the kinds of questions they would like to answer, or that others who use their corpora might ask. According to Biber, the representativeness of a corpus can be considered both contextually and linguistically. Contextually, a corpus of the target language or variety should include the full range of registers or text types in which it is used. In other words, because the different situations in which a language is used affect the way that language is actually realized, those different registers need to be included in order to fully understand the variety as a whole. Linguistically, a corpus can be said to be representative if it includes the full range of lexical and grammatical features present in that language or variety (Friginal & Hardy, 2014).
Researchers involved in the study of sociolinguistic variation have also developed models of corpus design that emphasize the representativeness and generalizability of corpora. Corpora are generally not created without particular research questions in mind; they are planned, collected, organized, and analyzed in ways that sociolinguists have thought about from the inception of the idea to create them. Although it has also been traditional for some researchers to use publicly available corpora, this practice may limit the extent to which the researcher is familiar with the data and its social contexts. It also narrows the types of subsequent questions that can be asked. For example, if someone were interested in differences between writing by men and women, the corpus being used would need to record the writer’s gender so that the texts could be separated by that variable. The same would be true for any variable commonly associated with sociolinguistic research (e.g., age, socioeconomic status, geographic location/dialect, register).
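To make the metadata requirement concrete, here is a minimal, hypothetical Python sketch: each text carries the speaker/writer attributes recorded when the corpus was built, and normalized frequencies can only be computed for a variable that was actually recorded. The `Text` class, the `rate_by_group` helper, and the toy data are illustrative assumptions, not part of any corpus or tool named in this chapter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Text:
    """A corpus text plus speaker/writer metadata recorded at collection time (hypothetical schema)."""
    words: list[str]
    metadata: dict[str, str]


def rate_by_group(corpus: list[Text], variable: str, feature: str, per: int = 1000) -> dict[str, float]:
    """Normalized frequency of `feature` for each value of a metadata `variable`.

    Texts that do not record `variable` are skipped: if the corpus never
    encoded it, no comparison along that variable is possible.
    """
    counts: dict[str, int] = defaultdict(int)
    sizes: dict[str, int] = defaultdict(int)
    for text in corpus:
        group = text.metadata.get(variable)
        if group is None:
            continue
        counts[group] += sum(1 for word in text.words if word.lower() == feature)
        sizes[group] += len(text.words)
    return {group: counts[group] * per / sizes[group] for group in sizes if sizes[group] > 0}


if __name__ == "__main__":
    # Toy, made-up texts; real corpora would hold many texts per group.
    corpus = [
        Text("so like I was like really tired".split(), {"gender": "F", "age": "16-19"}),
        Text("I really like this song".split(), {"gender": "M", "age": "16-19"}),
        Text("we went home early".split(), {"age": "70+"}),
    ]
    print(rate_by_group(corpus, "gender", "like"))
    print(rate_by_group(corpus, "region", "like"))  # no text records region, so {}
```

Querying a variable the corpus never recorded (here, `region`) returns nothing to compare. The sketch also counts every "like," whether verb or discourse marker; separating such functions is interpretive work that raw counts cannot do.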
The merging of corpus and sociolinguistic approaches in the past few years has begun to address important issues of corpus design, collection, and representativeness. For example, spoken texts from sociolinguistic interviews have been carefully developed to capture at least some of the lexico-syntactic features of speech for various demographic comparisons. A “sociolinguistics corpus” collected by Tagliamonte (2006, 2008) was obtained from oral narratives to capture vernacular language and annotated for speakers’ demographic characteristics. The Linguistic Innovators Corpus, or LIC (Kerswill, Cheshire, Fox, & Torgersen, 2008) (see Chapter 7 of this volume), also utilized sociolinguistic interviews collected from 100 working-class adolescents (who were college students) and 18 elderly speakers in two London boroughs, Hackney and Havering. The LIC has been used to test whether or not London is the center of linguistic innovation in southeastern England (i.e., a dialect study) (Gabrielatos, Torgersen, Hoffman, & Fox, 2010). Publicly available “mega-corpora” such as Google Lab’s Google Books Ngram Viewer and Mark Davies’s COCA and COHA (Corpus of Contemporary American English and Corpus of Historical American English), and many others from his BYU corpus site (www.corpus.byu.edu) (see Chapters 2, 3, and 4 of this volume), provide information suitable for temporal (both synchronic and diachronic) and register studies, contributing to the increasing amount of research that can directly describe sociolinguistic variation and change.

Corpus Approaches to Sociolinguistics

Friginal and Hardy (2014) argue that, although corpus-based sociolinguistics h...

Table of contents

  • Cover
  • Title
  • Copyright
  • CONTENTS
  • List of Illustrations
  • List of Contributors
  • 1 Corpus Approaches to Sociolinguistics: Introduction and Chapter Overviews
  • PART 1 Corpora and the Study of Languages/Dialects (Varieties of Global Englishes)
  • PART 2 Corpora and Social Demographics
  • PART 3 Corpora and Register Characteristics
  • Index
