One of the challenges brought on by the digital revolution of recent decades is how to extract the information carried by texts so that their contents can be accessed.
The processing of named entities remains a very active area of research, which plays a central role in natural language processing technologies and their applications. Named entity recognition, a tool used in information extraction tasks, focuses on recognizing small pieces of information in order to extract information on a larger scale.
The authors use written text and examples in French and English to present the necessary elements for the readers to familiarize themselves with the main concepts related to named entities and to discover the problems associated with them, as well as the methods available in practice for solving these issues.
From Named Entities for Computational Linguistics by Damien Nouvel, Maud Ehrmann and Sophie Rosset.
In this chapter, we examine what gave rise to the concept of a named entity (NE) and give an overview of the extensive work on document analysis tasks. A remarkable aspect of the history of the NE concept is the set of conditions in which it appeared. As we will see, the concept emerged as part of research programs initiated, funded and/or supported by the US Department of Defense in the 1980s. The overall objective of these programs is to define a range of applications and to invite research laboratories to work on the problems those applications raise. In general, the laboratories then take part in an evaluation campaign organized by the research program. In the field of natural language processing, an evaluation campaign consists of comparing the performance of an automatic system with that of a human faced with the same task on the same data. More precisely, the responses generated by the system or systems (called hypotheses) are automatically compared with the answers provided by one or more human experts (called references). This comparison allows us to "rate" the systems and rank them. Evaluation campaigns are used to compare the performance of different systems, but they also help to stimulate research and development on a specific problem.
In this chapter, we present a historical overview of the research programs, and of their evaluation campaigns, during which the automatic NE processing tasks were defined and then refined, developing continuously over more than two decades. We will also see how, in the process, entities became pivotal for other tasks related to natural language processing and knowledge acquisition.
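The comparison of hypotheses with references can be sketched in a few lines of Python. This is a minimal illustration, not the scoring procedure actually used in the campaigns: it assumes that an entity is represented by its character offsets and its category, that only exact matches count as correct, and that systems are rated with precision, recall and F1. The `Entity` type, the `score` function and the toy data are all hypothetical.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the entity begins
    end: int     # character offset just past the end of the entity
    label: str   # entity category, e.g. "loc" or "date"


def score(references: set[Entity], hypotheses: set[Entity]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of hypotheses against references."""
    # A hypothesis is correct only if boundaries and label both match a reference.
    correct = len(references & hypotheses)
    precision = correct / len(hypotheses) if hypotheses else 0.0
    recall = correct / len(references) if references else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data invented for illustration: the system finds all three entities
# but assigns the wrong category to the second one.
references = {Entity(0, 8, "date"), Entity(54, 66, "loc"), Entity(300, 304, "time")}
hypotheses = {Entity(0, 8, "date"), Entity(54, 66, "org"), Entity(300, 304, "time")}

results = score(references, hypotheses)
print(results)
```

Ranking several systems then amounts to sorting them by one of these figures. Real campaigns typically also handle partial matches and several human references, which this sketch leaves out.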
1.1. Research program history
In the 1980s, the automatic understanding of documents became a major objective in artificial intelligence. In particular, the first Message Understanding Conference (MUC) was initiated in 1987 by the Naval Research and Development (NRAD) division of the Naval Ocean Systems Center (NOSC), with support from the Defense Advanced Research Projects Agency (DARPA). In total, seven conferences were organized within this program between 1987 and 1997. The purpose of this series of conferences was to organize evaluation campaigns on the automatic understanding of documents. However, this task quickly proved extremely complex, and proposals were made to break it down into elementary building blocks. One of these proposals resulted in the definition of a task for detecting and categorizing NEs. The concept of an NE has been further developed and expanded as the years have gone by and the systems have progressed. It is important to note that the first works were initiated as part of US research programs and thus essentially concern English; nevertheless, research programs have also been organized in other countries and for other languages.
In the following sections, we present in chronological order the different evolutions of the NE concept and other concepts related to it.
1.1.1. Understanding documents: an ambitious task
In 1987, the first MUC campaign (MUC-1) was launched (see [GRI 97] for a history of the campaigns). This first campaign was exploratory: the framework was deliberately left vague so as to allow each participant to make proposals and develop an experimental system. It was in 1989, during MUC-2, that the task of automatic document understanding was defined. It consisted of filling out a form, based on the events described in the texts, with the correct information found in the documents. A document consisted of a telegram from the US Navy describing observations and naval battles in a refined form. For each event, a set of slots in the form had to be filled: for example, the event type, the agent, the date and the location. In total, there were 10 pieces of information per event to be found in the documents. The domain covered was subsequently expanded and more types of documents were proposed. Table 1.1, taken from [GRI 97], illustrates the task for the terrorist attack event type.
Table 1.1. An example of a document and an MUC-3 form, from [GRI 97]
19 March - A bomb went off this morning near a power tower in San Salvador leaving a large part of the city without energy, but no casualties have been reported. According to unofficial sources, the bomb - allegedly detonated by urban guerrilla commandos - blew up a power tower in the northwestern part of San Salvador at 0650 (1250 GMT).
Incident type: Bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: Urban guerrilla commandos
Physical target: Power tower
Human target: -
Effect on physical target: Destroyed
Effect on human target: No injury or death
Instrument: Bomb
As we can see in this example, understanding here corresponds to identifying and extracting pieces of information perceived as relevant (power tower and San Salvador), categorizing them (San Salvador is a city), identifying what they refer to (San Salvador is a city in El Salvador) and possibly interpreting them (no casualties implies no injury or death). Over the campaigns and the years, the task was enriched; notably, MUC-5 [MUC 93] proposed structuring hierarchically the elements to be entered in the form.
The task of automatic understanding, as defined in these evaluation campaigns, turned out to be too complex given the capacities of computers and the technological knowledge available at that time. However, it highlighted how important it is to detect the key information in documents. This resulted in the definition and implementation of a set of elementary building blocks for the task of understanding, including the detection of NEs.
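As an illustration, the form of Table 1.1 can be written as a simple record whose fields are the slots to be filled. This is only a sketch: the `IncidentForm` class and its field names are hypothetical, and representing an unfilled slot by `None` is an assumption, not the format used in the MUC campaigns.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class IncidentForm:
    """One form per event; a slot left at None was not filled."""
    incident_type: str
    date: str
    location: str
    perpetrator: Optional[str] = None
    physical_target: Optional[str] = None
    human_target: Optional[str] = None
    effect_on_physical_target: Optional[str] = None
    effect_on_human_target: Optional[str] = None
    instrument: Optional[str] = None


# Slot values copied from Table 1.1.
form = IncidentForm(
    incident_type="Bombing",
    date="March 19",
    location="El Salvador: San Salvador (city)",
    perpetrator="Urban guerrilla commandos",
    physical_target="Power tower",
    human_target=None,  # the document mentions no human target
    effect_on_physical_target="Destroyed",
    effect_on_human_target="No injury or death",
    instrument="Bomb",
)

# List the slots for which the document gave no information.
empty_slots = [name for name, value in asdict(form).items() if value is None]
print(empty_slots)
```

Seen this way, a system's output for a document is a list of such records, one per event, which can then be compared slot by slot with the forms filled in by human experts.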
1.1.2. Detecting basic elements: named entities
Understanding a document implies recognizing pieces of information that are relevant to the subject discussed and that play a role in the description of the event or fact. The main purpose of the task of detecting NEs is to identify these pieces of information. Its first definition implied the recognition of NEs in the documents, i.e. their identification (determining their boundaries) and categorization (determining their type, such as organization and location). This proposal was made for the MUC-6 [GRI 95] evaluation campaign. The concept of NE covers not only all the proper names of the categories person, organization and location, grouped under the term Entity Name Expression (ENAMEX), but also the numerical expressions concerning the categories date, money and percentage, grouped under the term Numeric Expression (NUMEX).
In practical terms, this means that the important elements describing a person, place or organization are limited to proper names. Similarly, only dates or times containing a number, and expressions indicating monetary amounts or percentages, are taken into consideration. Below, we take the text given in Table 1.1 and mark the NEs it contains according to this definition:
<date> 19 March </date> - A bomb went off this morning near a power tower in <loc> San Salvador </loc> leaving a large part of the city without energy, but no casualties have been reported. According to unofficial sources, the bomb - allegedly detonated by urban guerrilla commandos - blew up a power tower in the northwestern part of <loc> San Salvador </loc> at <time> 0650 </time> (<time> 1250 </time> GMT).
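Inline annotation of this kind encodes both sub-tasks: the position of a tag pair gives the boundaries of an entity, and the tag name gives its category. The sketch below reads the annotation back into (start, end, label) spans over the untagged text. It assumes simple, non-nested tags written as in the example above; the function name `extract_entities` is hypothetical and the sample is a shortened version of the annotated text.

```python
import re

# A tag pair such as "<loc> San Salvador </loc>"; padding spaces are dropped.
TAG_PATTERN = re.compile(r"<(?P<label>\w+)>\s*(?P<text>.*?)\s*</(?P=label)>")


def extract_entities(annotated: str) -> tuple[str, list[tuple[int, int, str]]]:
    """Strip inline tags; return the plain text and (start, end, label) spans."""
    plain_parts: list[str] = []
    spans: list[tuple[int, int, str]] = []
    cursor = 0   # current position in the annotated string
    offset = 0   # length of the plain text built so far
    for match in TAG_PATTERN.finditer(annotated):
        # Copy the untagged text that precedes this entity.
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        # Record the entity boundaries in plain-text coordinates.
        mention = match.group("text")
        spans.append((offset, offset + len(mention), match.group("label")))
        plain_parts.append(mention)
        offset += len(mention)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), spans


sample = (
    "<date> 19 March </date> - A bomb went off near a power tower in "
    "<loc> San Salvador </loc> at <time> 0650 </time>."
)
plain, spans = extract_entities(sample)
mentions = [(plain[start:end], label) for start, end, label in spans]
print(mentions)
```

Storing offsets rather than tagged text keeps the original document untouched and makes it straightforward to compare the spans produced by a system with those marked by a human annotator.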
The task defined here is simple enough, and above all well enough defined, to allow a rapid improvement in the results obtained by different systems. This first definition leads to the first NE typology (see section 3.1 for a description of this typology). Research and development on this subject have experienced very significant growth and strong dynamism, which not only allows us to develop novel approaches and techniques but als...