Coreference
eBook - ePub

Coreference

Annotation, Resolution and Evaluation in Polish

  1. 297 pages
  2. English

About this book

'Coreference' presents the specificities of reference, anaphora and coreference in Polish, establishes an identity-of-reference annotation model, and presents the methodology used to create a corpus of Polish general nominal coreference. Various resolution approaches are presented, followed by their evaluation. By discussing the successive steps of building a coreference-related component of the natural language processing toolset and explaining the decisions taken in more depth, this volume can also serve as a reference book on state-of-the-art methods of carrying out coreference projects for new languages and as a tutorial for NLP practitioners.
Apart from serving as a description of the first complete approach to annotation and resolution of direct nominal coreference for Polish, this book is a useful starting point for further work on other types of anaphora/coreference, semantic annotation, cognitive linguistics (related to the topic of near-identity, discussed in the book) etc. With extended tutorial-like sections on important subtopics, such as evaluation metrics for coreference resolution, it can prove useful to both researchers and practitioners interested in semantic description of Balto-Slavic languages and their processing, engineers developing language resources, tools and linguistic processing chains, as well as computational linguists in general.

Coreference by Maciej Ogrodniczuk, Katarzyna Głowińska, Mateusz Kopeć, Agata Savary and Magdalena Zawisławska.

Part I: Introduction

Magdalena Zawisławska

1 Reference, anaphora, coreference

1.1 The concept of reference

Coreference is usually defined as the phenomenon of different expressions relating to the same referent. This general definition requires us first to explain the basic concept of reference. Reference is primarily a subject of logical semantics, but it is also of interest to linguistics, and it is understood in very different ways depending on the field.
Classical logical semantics adopts, after Frege (1892), a distinction between two aspects of the meaning of natural language units. Frege described them as Bedeutung and Sinn; in the writings of other scholars they are called, respectively, denotation and connotation by Mill (1843), denotation and meaning by Russell (1905), extension and intension by Carnap (1947), and reference and sense by Black (1949). In this perspective, reference is a relation to extra-linguistic entities (referents), while sense is an intra-language relation, that is, a relation to other signs of the same language system. This means that expressions can have the same reference but a different sense or, conversely, the same sense and a different reference, cf.:
  • (1.1) Zwycięzca spod Austerlitz jest przegranym spod Waterloo.
    ‘The winner from Austerlitz is the loser from Waterloo.’
  • (1.2) Jeden logik kłócił się z drugim logikiem o definicję referencji.
    ‘A logician argued with another logician about the definition of reference.’
In Example (1.1), both expressions zwycięzca spod Austerlitz ‘the winner from Austerlitz’ and przegrany spod Waterloo ‘the loser from Waterloo’ refer to the same referent, Napoleon Bonaparte, but they have a different sense: they pick out different properties of Napoleon. In sentence (1.2), the two occurrences of the word logik ‘logician’ have the same sense (they name the same attributes), but different referents.
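The independence of the two comparisons can be illustrated with a minimal Python sketch. The Expression class, its field names, and the referent identifiers are assumptions introduced here for illustration only; they are not part of the annotation model described in the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """A use of an expression in a text (hypothetical model, for illustration)."""
    form: str      # surface form as it appears in the example
    sense: str     # the attributes the expression names
    referent: str  # assumed identifier of the entity referred to


def same_reference(a: Expression, b: Expression) -> bool:
    return a.referent == b.referent


def same_sense(a: Expression, b: Expression) -> bool:
    return a.sense == b.sense


# Example (1.1): one referent, two senses.
winner = Expression("zwycięzca spod Austerlitz", "winner at Austerlitz", "napoleon")
loser = Expression("przegrany spod Waterloo", "loser at Waterloo", "napoleon")

# Example (1.2): one sense, two referents.
logician_1 = Expression("jeden logik", "logician", "logician_A")
logician_2 = Expression("drugim logikiem", "logician", "logician_B")

print(same_reference(winner, loser), same_sense(winner, loser))
print(same_reference(logician_1, logician_2), same_sense(logician_1, logician_2))
```

The first line printed is True False and the second is False True: neither comparison can be derived from the other, which is why a coreference annotation has to record referents explicitly rather than infer them from the wording.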
However, according to Padučeva (1992), such an understanding of reference is mistaken. She points out that reference is not an element of word meaning but is realised in an utterance; it is therefore not a property of lexemes, but of the uses of their forms in a text:
Reference is, generally speaking, a relation of individual and, at the same time, new objects and situations. For this reason, reference applies not to words or linguistic expressions, but to their text uses – it relates to utterances and their elements. [...] Therefore, the question “What is the reference of the word man?” would be nonsensical, just as of any other word in the dictionary or of any word connection (for example, a young man, a tall man), or a sentence constructed according to the rules of grammar. The sentence A man entered has no reference on its own – it is not related to any specific situation, or any object. (Padučeva, 1992, p. 12)
That is why, as the scholar notes, every syntactically bound element of a sentence has a meaning, but only some elements of an utterance have reference. Thus, it is necessary to distinguish between reference as an attribute of a speech act (utterance) and lexeme denotation (extension), which constitutes the set of all potential referents. This means that a dictionary entry does not have reference; it only has denotation, a potential reference that can (but does not have to) be realised in the text.
Lexeme denotation is connected with a phenomenon which Padučeva calls denotative ambiguity: one word can have semantic variants and, depending on the context, realise a different reference type, e.g.:
  • (1.3)
    1. Przeczytał całego Hemingwaya.
      ‘He read all of Hemingway.’
    2. Hemingway urodził się w 1899 r.
      ‘Hemingway was born in 1899.’
  • (1.4)
    1. Mieszkał w Berlinie.
      ‘He lived in Berlin.’
    2. Berlin naciska na bankructwo Grecji.
      ‘Berlin insists on the bankruptcy of Greece.’
  • (1.5)
    1. Zbiła nowy talerz.
      ‘She broke a new plate.’
    2. Zupa była tak dobra, że zjadł cały talerz.
      ‘The soup was so good that he ate the whole plate.’
Example (1.3.a) describes the writer’s legacy, and (1.3.b) the writer himself. Example (1.4.a) refers to a city, and (1.4.b) to the German government, whose seat is in Berlin. Sentence (1.5.a) is about a vessel, and (1.5.b) about its contents. Semantic variability of this type is highly systematic and often recorded in dictionaries; however, it makes it very difficult to annotate coreferential expressions in a corpus.
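The difference between the potential readings of a lexeme and the reading realised by a particular use can be sketched as follows. This is a hedged illustration only: the DENOTATION table, the reading labels, and the Use class are assumptions made for this sketch and do not reproduce any dictionary or the book's annotation scheme.

```python
from dataclasses import dataclass

# Hypothetical inventory of semantic variants, based on examples (1.3)-(1.5).
# The reading labels are assumptions made for this sketch.
DENOTATION = {
    "Hemingway": {"person", "works"},
    "Berlin": {"city", "government"},
    "talerz": {"vessel", "contents"},
}


@dataclass(frozen=True)
class Use:
    """One use of a lexeme in a text, with the reading realised in context."""
    lexeme: str
    reading: str

    def __post_init__(self):
        # A use can only realise a reading that the lexeme potentially has.
        if self.reading not in DENOTATION.get(self.lexeme, set()):
            raise ValueError(f"{self.lexeme!r} has no reading {self.reading!r}")


def same_reading(a: Use, b: Use) -> bool:
    """True if both uses are of the same lexeme and realise the same reading."""
    return a.lexeme == b.lexeme and a.reading == b.reading


berlin_city = Use("Berlin", "city")              # (1.4.a)
berlin_government = Use("Berlin", "government")  # (1.4.b)
print(same_reading(berlin_city, berlin_government))
```

The call prints False: the two uses share a lexeme but not a reading, so an annotator (or a resolver) that compares only word forms would link them wrongly.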
Moving reference to the utterance level and treating it independently of the lexeme meaning is essential when creating a corpus of coreferential expressions. This is because one cannot treat every recurrence of a word in the text as coreferential, e.g.:
  • (1.6) Każdy szanujący się poseł ma asystenta. Asystentami są z reguły ludzie młodzi, ale nie brakuje również szczerze zaangażowanych emerytów. Poglądy polityczne asystenta powinny być zbieżne z linią szefa. Pracują jako wolontariusze tak jak Marek Hajbos, asystent Zyty Gilowskiej. Poseł Adam Bielan (rzecznik PiS) na przykład płaci asystentom za wysyłanie korespondencji. Obecny minister sprawiedliwości Grzegorz Kurczuk zaczynał partyjną działalność jako asystent Izabelli Sierakowskiej. W ministry poszedł też były asystent Józefa Oleksego Lech Nikolski. Posłowie nie poprzestają na jednym asystencie.
    ‘Every MP with some self-respect has an assistant. Assistants are usually young people, but one can also find many sincerely committed pensioners. The political views of the assistants should mirror the line of their supervisors. They work as volunteers just like Marek Hajbos, the assistant of Zyta Gilowska. For example, MP Adam Bielan (the spokesman of Law and Justice) pays his assistants for sending correspondence. The current Minister of Justice Grzegorz Kurczuk began his party activity as an assistant of Izabella Sierakowska. The former assistant of Józef Oleksy, Lech Nikolski, also went on to become a minister. MPs do not stop at one assistant.’
In text (1.6), the author used the word asystent ‘assistant’ eight times. However, its occurrences do not share the same reference. We can see references to the class (‘Every MP with some self-respect has an assistant’; ‘The political views of the assistants should mirror the line of their supervisors’; ‘MPs do not stop at one assistant.’), to a part of this class (‘Assistants are usually young people, but one can also find many sincerely committed pensioners.’; ‘Adam Bielan’s assistants’), and to specific people (‘Marek Hajbos, Zyta Gilowska’s assistant’; ‘assistant of Józef Oleksy, Lech Nikolski’). Including all these expressions in one coreferential cluster would be a mistake.
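The consequence for clustering can be shown with a small Python sketch that groups the mentions classified above in two ways: by lemma and by referent. The Mention class and all referent labels are hypothetical. They only encode the class / part-of-class / specific-person distinction drawn in the text, and whether the three class-level mentions belong in a single cluster is itself an annotation decision; they share a label here purely for the sake of the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str      # English rendering of the mention from example (1.6)
    lemma: str     # head lemma
    referent: str  # hypothetical referent label (an assumption of this sketch)


mentions = [
    Mention("an assistant (every MP has one)", "assistant", "class:assistants"),
    Mention("the assistants (political views)", "assistant", "class:assistants"),
    Mention("one assistant (MPs do not stop at)", "assistant", "class:assistants"),
    Mention("Assistants (usually young people)", "assistant", "part:typical_assistants"),
    Mention("Adam Bielan's assistants", "assistant", "part:bielan_assistants"),
    Mention("Marek Hajbos, Zyta Gilowska's assistant", "assistant", "person:marek_hajbos"),
    Mention("assistant of Józef Oleksy, Lech Nikolski", "assistant", "person:lech_nikolski"),
]


def cluster_by(items, key):
    """Group mention texts under the value returned by key(mention)."""
    clusters = defaultdict(list)
    for mention in items:
        clusters[key(mention)].append(mention.text)
    return dict(clusters)


by_lemma = cluster_by(mentions, key=lambda m: m.lemma)
by_referent = cluster_by(mentions, key=lambda m: m.referent)

print(len(by_lemma))     # 1
print(len(by_referent))  # 5
```

Grouping by lemma collapses everything into a single cluster, which is exactly the mistake described above; grouping by referent keeps the class, the two parts of the class, and the two individuals apart.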

1.2 The typology of reference

Another exceptionally important issue in the study of coreference is the question of which elements of the utterance can be characterised by reference. In the classic view, reference can only be a property of nominal phrases that name objects. This view is expressed by Topolińska (1984, p. 302), cf.:
The names of objects of material nominal groups encountered in texts are characterised by referential properties, that is, a relation to an object that they name.
Padučeva’s standpoint is not entirely clear. At one point (Padučeva, 1992, p. 113) she writes that reference is not a property of predicates (or of NPs used in a predicative function), because they name attributes of the object instead of the objects themselves; at another point, however, she speaks of two types of reference, object and non-object, cf.:
In the tradition of logic since Frege, it has been assumed that object terms of reference [...] and propositions should be treated uniformly. Propositions – just as terms – have referents (the referent of a proposition is its truth value). In the papers dealing with pragmatics [...] reference is understood primarily as a property of object terms. For linguistics, it is more natural to treat object terms reference and propositions in the same way: assigning a language utterance to reality is performed not only through object term reference, but also through reference of components with propositional meaning, which can refer to facts, events, and situations. (Padučeva, 1992, p. 15).
According to Padučeva, reference is an attribute of a whole sentence used in an utterance, of its propositional elements and of its nominal groups. The author does not treat words as the means of non-object reference (that is, of reference to events and situations); instead, she locates those means in the grammatical categories of tense and mood.
A different solution is offered by Vater (2009, p. 104), who distinguishes four types of reference:
  1. situational reference, which is the superordinate reference type, because sentences refer to a certain event, state or action;
  2. temporal reference, which relates to all time relations between situations;
  3. spatial reference, which can refer to the location of a given object in space or to its direction of movement;
  4. object reference.
As we can see, the choice of a definition of reference significantly influences the creation of coreferential chains in a text. As a rule, NLP papers understand reference classically, as object reference exclusively. This solution seems effective, however, only in the initial phase of work on text coreference. In order to describe the phenomenon fully, one should include all the types of reference named by Vater, e.g.:
  • (1.7) Zapadał mrok. Bardzo ich to zaniepokoiło.
    ‘It was getting dark. They grew very anxious because of that.’
  • (1.8) Spotkali się w zeszłym roku. Wtedy właśnie się w sobie zakochali.
    ‘They met last year. It was then that they fell in love.’
  • (1.9) Mieszkał w samym centrum. Było tam dość głośno.
    ‘He lived in the very centre. It was quite loud there.’
In Example (1.7), the pronoun to ‘that’ refers to the whole predication Zapadał mrok. ‘It was getting dark.’ From the point of view of classical reference theory, this sentence and the pronoun to ‘that’ cannot be placed in one coreferential cluster. Analogously, the adverbials w zeszłym roku ‘last year’ and wtedy ‘then’, as well as w samym centrum ‘in the very centre’ and tam ‘there’, will not be seen as coreferential; but if we adopt Vater’s view that reference is not restricted to object reference, we will have to include these examples in the annotation as well.
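How the chosen definition of reference determines what ends up in the annotation can be sketched in Python. The ReferenceType enum follows Vater's four types as listed above; the Link class, the in_scope function, and the assignment of a type to each example are assumptions of this sketch, not the book's annotation guidelines.

```python
from dataclasses import dataclass
from enum import Enum


class ReferenceType(Enum):
    SITUATIONAL = "situational"
    TEMPORAL = "temporal"
    SPATIAL = "spatial"
    OBJECT = "object"


@dataclass(frozen=True)
class Link:
    """A candidate coreference link between two expressions (hypothetical model)."""
    example: str
    antecedent: str
    anaphor: str
    ref_type: ReferenceType


links = [
    Link("1.7", "Zapadał mrok", "to", ReferenceType.SITUATIONAL),
    Link("1.8", "w zeszłym roku", "wtedy", ReferenceType.TEMPORAL),
    Link("1.9", "w samym centrum", "tam", ReferenceType.SPATIAL),
]


def in_scope(candidates, allowed):
    """Keep only the links whose reference type the annotation scheme admits."""
    return [link for link in candidates if link.ref_type in allowed]


classical_scope = {ReferenceType.OBJECT}  # object reference only
vater_scope = set(ReferenceType)          # all four types

print([link.example for link in in_scope(links, classical_scope)])  # []
print([link.example for link in in_scope(links, vater_scope)])      # ['1.7', '1.8', '1.9']
```

Under the classical scope none of the three links is annotated, whereas under the broad scope all of them are, so the definition has to be fixed before annotation starts.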
Even if we define reference narrowly (as object reference only), we need to take its different types into account. In general, papers on logical semantics have understood reference as merely a relation to a specific, distinguished object. Linguistics, however, also sees object reference as a relation to a set of objects which the speaker does not want to, or cannot, identify.
Padučeva (1992, pp. 118–126) distinguishes three types of nominal reference groups:
  1. defined nominal groups (with a single object or a set of objects), e.g. Ernest Hemingway urodził się w 1899 r. ‘Ernest Hemingway was born ...

Table of contents

  1. Coreference
  2. Title Page
  3. Copyright Page
  4. Table of Contents
  5. Preface
  6. Part I: Introduction
  7. Part II: Coreference Annotation
  8. Part III: Coreference Resolution
  9. Part IV: Evaluation
  10. Part V: Summary
  11. Acknowledgements
  12. Bibliography