Building a Representative Theater Corpus

A Broader View of Nineteenth-Century French

Angus Grieve-Smith


About this book

The Digital Parisian Stage Project aims to compile a corpus of plays that are representative of performances in the theaters of Paris through history. This book surveys existing corpora that cover the nineteenth century, lays out the issue of corpus representativeness in detail, and, using a random sample of plays from this period, presents two case studies of language in use in the Napoleonic era. It presents a compelling argument for the compilation and use of representative corpora in linguistic study, and will be of interest to those working in the fields of corpus linguistics, digital humanities, and history of the theater.

© The Author(s) 2019
A. Grieve-Smith, Building a Representative Theater Corpus, https://doi.org/10.1007/978-3-030-32402-5_1

1. Introduction

Angus Grieve-Smith
The New School, New York, NY, USA

Abstract

The Digital Parisian Stage project has begun with a random sample of theatrical texts from 1800 to 1815. It is intended as a consistent, representative sample of the theatrical genre for studying language change and an improvement over unrepresentative corpora like FRANTEXT, with results that can be generalized to all of Parisian theater and possibly beyond. To determine how different the Digital Parisian Stage is from FRANTEXT, Grieve-Smith compared the theatrical texts in both corpora from 1800 to 1815 on two frequency measures: declarative sentence negation and left/right dislocation. He found significant differences between the two corpora in all measures. This suggests that the results many linguists have gotten from FRANTEXT may not be accurate, and that the Digital Parisian Stage will be valuable for revising those results.
Keywords
Corpus; Representative sample; French; Theater; Negation; Left dislocation; Right dislocation
The Digital Parisian Stage project aims to compile a corpus of plays that will be representative of performances in the theaters of Paris throughout history. The first section has been completed with a random sample of theatrical texts from the period 1800 through 1815, based on the list compiled by Charles Beaumont Wicks (1950), and retrieved from archives. This sampling technique goes beyond the “Principle of Authority” used for the FRANTEXT corpus, to include playwrights and characters from a wider range of social backgrounds, giving a very different picture of the language. To confirm this broader representation I conducted studies of two groups of morphosyntactic features known to vary with social class: declarative sentence negation and dislocation constructions.
To begin with, a note on the concept of “corpus.” The word means “body” in Latin, and is typically used to refer to a group of texts that constitute a body in the sense of being a coherent whole, although sometimes this coherence can be more imagined and aspirational than real. Corpora have been created and used for hundreds of years, sometimes to study the work of a single author, sometimes of a group of authors, sometimes of a literary canon. The use of the phrases corpus iuris and corpus canonum to refer to a collection of legal texts dates back to the twelfth century (Van Hove 1908). Later corpora were used to group literary texts together for easy reference when writing criticism; Chambers (1728) notes: “We have also a corpus of the Greek poets; and another of the Latin poets.”
Linguists and lexicographers have been compiling digital corpora for analytical papers and dictionaries since the mid-twentieth century. After W. Nelson Francis and Henry Kučera created the Brown Corpus of English (1964), Houghton-Mifflin licensed it as the basis for the American Heritage Dictionary (1969). FRANTEXT was compiled in the 1960s with the goal of creating a new dictionary, the Trésor de la langue française (Imbs 1971). Since that time, other, larger corpora have been created, but FRANTEXT has been the reference corpus for historical studies of the French language.
These corpora are particularly useful for testing diachronic predictions from usage-based linguistic theories, such as the effect of type frequency on linguistic productivity (Bybee 1995). Imagine a French adolescent in the seventeenth century forgetting which negation construction typically goes with the verb cesser and picking the one that seems to go with lots of verbs. This is the type of mechanism assumed by the theory of type frequency.
In order to properly test these theories, however, we need something resembling the language of the past. We have no access to the spontaneous conversations of seventeenth-century adolescents, but instead we can imagine a playwright composing dialogue and verse, reaching into his or her memory for appropriate models. Some of these playwrights may have only looked to earlier playwrights, but others paid attention to the language they heard from their friends, from their servants and on the street.
We have many plays from the past, but we cannot analyze them all at the level we need to test these theories, and they are not all interchangeable. We need a representative sample.
FRANTEXT may have been appropriate for the construction of a dictionary marketed to “the cultivated man” (Imbs 1971: XVIII), but the Principle of Authority introduces a strong bias in favor of elite theater. The Digital Parisian Stage, based on a random sample of all plays that premiered in Paris in the nineteenth century, aims to rectify that bias, offering a broader view of the language of this period that in turn produces more reliable studies of language change.
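The idea of drawing a random sample from a comprehensive catalog can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `catalog` list, its size, the 1% rate as a default and the fixed seed are hypothetical stand-ins, not the project's actual catalog or procedure.

```python
import random

def draw_sample(catalog, rate=0.01, seed=0):
    """Return a simple random sample of `catalog`, drawn without replacement."""
    rng = random.Random(seed)                 # fixed seed so the draw can be repeated
    k = max(1, round(len(catalog) * rate))    # sample size implied by the rate
    return rng.sample(catalog, k)

# Hypothetical catalog: placeholder identifiers, not real titles.
catalog = [f"play-{i:04d}" for i in range(3000)]
sample = draw_sample(catalog)
print(len(sample), sample[:3])
```

Because every catalog entry has the same chance of selection, no editorial judgment about a play's prestige enters the sample, which is the contrast with selection by authority.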
To compare the Digital Parisian Stage against FRANTEXT, I annotated 22 plays from the Digital Parisian Stage corpus for dislocation and negation features and compared them to the four plays chosen for the FRANTEXT corpus for this period, with striking results. In the Digital Parisian Stage plays, 74% of declarative sentence negations used ne … pas, while in the FRANTEXT plays it was only 50% (p < 0.001).
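One common way to compare two proportions like these is a two-sided z-test. The sketch below is an assumption on my part about a suitable test, since this chapter does not name the test used, and the counts are placeholders chosen only to reproduce the two percentages, not the study's token counts.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)          # proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail probability
    return z, p_value

# Placeholder counts, NOT the study's data: they only yield 74% and 50%.
z, p_value = two_proportion_z(740, 1000, 200, 400)
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

A test of this kind treats every negated sentence as an independent observation, which sentences within the same play are not, so it is best read as a rough check.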
I chose negation constructions because the study I conducted for my dissertation, The Spread of Change in French Negation (Grieve-Smith 2009), investigated change in negation in the theatrical texts in FRANTEXT. In that study I found that the increase in frequency of ne … pas from the sixteenth through twentieth centuries fit the predictions of Kroch’s (1989) logistic model, and that the logistic model in turn could be explained by the theory of type frequency.
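In Kroch's (1989) model the proportion of the incoming form follows a logistic curve over time, so its logit is a straight line in the year. A minimal sketch of that relationship follows; the intercept `K` and slope `S` are made-up illustrative values, not estimates from any corpus.

```python
import math

def logistic(t, k, s):
    """Proportion of the incoming form at time t: e^(k + s*t) / (1 + e^(k + s*t))."""
    return 1.0 / (1.0 + math.exp(-(k + s * t)))

def logit(p):
    """Log-odds of a proportion; linear in time under the logistic model."""
    return math.log(p / (1.0 - p))

# Illustrative parameters only: midpoint at the year 1700, slope 0.02 per year.
K, S = -34.0, 0.02
for year in (1600, 1700, 1800):
    p = logistic(year, K, S)
    print(year, round(p, 3), round(logit(p), 2))
```

Fitting the model to corpus data amounts to regressing the logit of the observed proportions on time, which is why a biased set of proportions would bias the estimated slope.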
I also looked at left and right dislocation constructions, building on a study where I found a general increase in those constructions in the late twentieth century (Grieve-Smith 2000). For left dislocation constructions, 0.760% of non-interrogative sentences used the contrastive topic construction in the Digital Parisian Stage plays, compared to 0.238% in the FRANTEXT plays (p < 0.05, d = 1.04), and 0.113% of non-interrogative sentences (19) used the demonstrative left dislocation (CLD) construction, compared to just one token in FRANTEXT (0.00903%, d = 0.595). The difference in clitic right dislocations was extreme (0.420% for the Digital Parisian Stage plays but only 0.0918% in the FRANTEXT plays, p < 0.01, d = 1.13).
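If the d values reported here are Cohen's d with a pooled standard deviation, which is my assumption since the text does not say, the effect size for per-play rates could be computed as sketched below. The two lists of rates are hypothetical placeholders, not the annotated data.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical per-play rates (percent of sentences), for illustration only.
rates_a = [0.9, 0.4, 1.1, 0.7, 0.6]
rates_b = [0.2, 0.3, 0.1, 0.4]
print(round(cohens_d(rates_a, rates_b), 2))
```

Unlike a p-value, d does not shrink or grow with the number of plays, which makes it useful when one corpus contributes only four texts.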
At least one of these differences can be shown to be due to the bias introduced by the Principle of Authority used to compile FRANTEXT. The Digital Parisian Stage corpus contains plays from 9 genres and 11 theaters, while the 4 plays from FRANTEXT are drawn from 3 genres and only 2 theaters, plus 1 closet play. The relative token frequency of ne alone is associated with both the genre of the play and the theater where it was performed, and ne … pas is associated with genre (one-way ANOVA, p < 0.05).
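A one-way ANOVA compares the variation between group means (here, genres) with the variation inside each group. The sketch below computes only the F statistic using the standard library; the genre labels and per-play frequencies are hypothetical, and obtaining a p-value additionally requires the F distribution (for instance via `scipy.stats.f_oneway`, if SciPy is available).

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical relative frequencies of a construction per play, grouped by genre.
by_genre = {
    "genre A": [0.70, 0.78, 0.74],
    "genre B": [0.55, 0.60, 0.52],
    "genre C": [0.66, 0.71, 0.69],
}
print(one_way_f(list(by_genre.values())))
```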
Several of the other negation and dislocation constructions displayed potential associations between corpus and theater, genre or some of the characteristics of the characters (age, gender and social class). None of these were strong enough to rule out the possibility of sampling error given the size of the sample, but they suggest that once more data is collected for the Digital Parisian Stage corpus, either in the Napoleonic period or later in the nineteenth century, the possibility of sampling error for those factors may be within the typically acceptable range (α = 0.05).
These findings are specific to a short period, 1800–1815, and as such they do not have any immediate bearing on diachronic studies like my dissertation on negation (Grieve-Smith 2009) or my study on dislocation (Grieve-Smith 2000). They do suggest that when the 1% sample is complete for the entire nineteenth century, we will see patterns that are similar, but drawn from a more reliable sample that is likely to be closer to informal spontaneous conversation.
It is my hope that these findings will encourage more people to contribute to the Digital Parisian Stage project. For those whose area of focus does not include nineteenth-century Parisian French, I hope this report will encourage them to design and contribute to similarly representative projects in their areas. This will likely include the type of work contributed by Wicks (1950 et seq.), compiling records of language production into a comprehensive catalog that can serve as the sampling frame for a new corpus.
References
  1. American Heritage Dictionary. 1969. Boston: Houghton Mifflin.
  2. Bybee, Joan. 1995. Regular Morphology and the Lexicon. Language and Cognitive Processes 10: 425–455.
  3. Chambers, Ephraim. 1728. Corpus. In Cyclopædia. London: Chambers.
  4. Francis, W. Nelson, and Henry Kučera. 1964. A Standard Corpus of Present-Day Edited American English, for Use with Digital Computers. Providence: Brown.
  5. Grieve-Smith, Angus. 2000. Topicalization and Word Order in Conversational French. Southeastern Conference on Linguistics, Oxford, Mississippi.
  6. ———. 2009. ...
