Formalizing Natural Languages

The NooJ Approach

About this book

This book is at the very heart of linguistics. It provides the theoretical and methodological framework needed to create a successful linguistic project.

Potential applications of descriptive linguistics include spell-checkers, intelligent search engines, information extractors and annotators, automatic summary producers, automatic translators, and more. These applications have considerable economic potential, and it is therefore important for linguists to make use of these technologies and to be able to contribute to them.

The author provides linguists with tools to help them formalize natural languages and aid in the building of software able to automatically process texts written in natural language (Natural Language Processing, or NLP).

Computers are a vital tool for this, as characterizing a phenomenon using mathematical rules leads to its formalization. NooJ, a linguistic development environment developed by the author, is described and applied to practical examples of NLP.

Formalizing Natural Languages by Max Silberztein is available in PDF and/or ePUB format.

1. Introduction: the Project

The project described in this book is at the very heart of linguistics; its goal is to describe, exhaustively and with absolute precision, all the sentences of a language likely to appear in written texts. This project fulfills two needs: it provides linguists with tools to help them describe languages exhaustively (linguistics), and it aids in the building of software able to automatically process texts written in natural language (natural language processing, or NLP).
A linguistic project needs to have a theoretical and methodological framework (how to describe this or that linguistic phenomenon; how to organize the different levels of description); formal tools (how to write each description); development tools to test and manage each description; and engineering tools to be used in sharing, accumulating, and maintaining large quantities of linguistic resources.
There are many potential applications of descriptive linguistics for NLP: spell-checkers, intelligent search engines, information extractors and annotators, automatic summary producers, automatic translators, etc. These applications have the potential for considerable economic usefulness, and it is therefore important for linguists to make use of these technologies and to be able to contribute to them.
For now, we must reduce the overall linguistic project of describing all phenomena related to the use of language to a much more modest one: here, we will confine ourselves to describing the set of all the sentences that may be written or read in natural-language texts. The goal, then, is simply to design a system capable of distinguishing between the two sequences below:
  (a) Joe is eating an apple
  (b) Joe eating apple is an
Sequence (a) is a grammatical sentence, while sequence (b) is not.
This project constitutes the mandatory foundation for any more ambitious linguistic projects. Indeed it would be fruitless to attempt to formalize text styles (stylistics), the evolution of a language across the centuries (etymology), variations in a language according to social class (sociolinguistics), cognitive phenomena involved in the learning or understanding of a language (psycholinguistics), etc. without a model, even a rudimentary one, capable of characterizing sentences.
If the number of sentences were finite – that is, if there were a maximum number of sentences in a language – we would be able to list them all and arrange them in a database. To check whether an arbitrary sequence of words is a sentence, all we would have to do is consult this database: it is a sentence if it is in the database, and otherwise it is not. Unfortunately, there are an infinite number of sentences in a natural language. To convince ourselves of this, let us resort to a reductio ad absurdum: imagine for a moment that there are n sentences in English.
Based on this finite number n of initial sentences, we can construct a second set of sentences by putting the sequence Lea thinks that, for example, before each of the initial sentences:
Joe is sleeping → Lea thinks that Joe is sleeping
The party is over → Lea thinks that the party is over
Using this simple mechanism, we have just doubled the number of sentences, as shown in the figure below.
Figure 1.1. The number of any set of sentences can be doubled
This mechanism can be generalized by using verbs other than the verb to think; for example:
Lea (believes | claims | dreams | knows | realizes | thinks | …) that Sentence.
There are several hundred verbs that could be used here. Likewise, we could replace Lea with several thousand human nouns:
(The CEO | The employee | The neighbor | The teacher | …) thinks that Sentence.
Whatever the size n of an initial set of sentences, we can thus construct n × 100 × 1,000 sentences simply by inserting sequences such as Lea thinks that, Their teacher claimed that, My neighbor declared that, etc. before each of the initial sentences.
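The count above can be checked on a toy scale. The following Python sketch is only an illustration: the function name embed and the short word lists are assumptions chosen for the example, not resources from the book or from NooJ.

```python
from itertools import product

def embed(sentences, subjects, verbs):
    """Prefix every sentence with '<subject> <verb> that'."""
    return [
        f"{subject} {verb} that {sentence}"
        for subject, verb, sentence in product(subjects, verbs, sentences)
    ]

# Hypothetical miniature vocabulary (the text speaks of hundreds of verbs
# and thousands of human nouns; three of each is enough to see the product).
initial = ["Joe is sleeping", "Ida is waiting"]
subjects = ["Lea", "The teacher", "My neighbor"]
verbs = ["thinks", "claims", "knows"]

expanded = embed(initial, subjects, verbs)
print(len(expanded))   # n x subjects x verbs = 2 * 3 * 3 = 18
print(expanded[0])     # Lea thinks that Joe is sleeping
```

With n initial sentences, s subjects and v verbs, the result always has n × s × v members, which is the n × 100 × 1,000 pattern described above.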
Language has other mechanisms that can be used to expand a set of sentences exponentially. For example, based on n initial sentences, we can construct n Γ— n sentences by combining all of these sentences in pairs and inserting the word and between them. For example:
It is raining + Joe is sleeping → It is raining and Joe is sleeping
This mechanism can also be generalized by using several hundred connectors; for example:
It is raining (but | nevertheless | therefore | where | while |…) Joe is sleeping.
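The pairwise combination can be sketched in the same way. This is a minimal Python illustration; the name conjoin and the three connectors are assumptions for the example, and the sketch does not adjust capitalization inside the combined sentence.

```python
def conjoin(sentences, connectors):
    """Join every ordered pair of sentences with every connector."""
    return [
        f"{left} {connector} {right}"
        for left in sentences
        for connector in connectors
        for right in sentences
    ]

initial = ["It is raining", "Joe is sleeping"]
connectors = ["and", "but", "while"]   # hypothetical short list

combined = conjoin(initial, connectors)
print(len(combined))   # n x connectors x n = 2 * 3 * 2 = 12
```

With a single connector the count is n × n, as in the text; each additional connector multiplies it again.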
These two mechanisms (embedding a sentence after a sequence such as Lea thinks that, and linking sentences with connectors) can be used multiple times in a row, as in the following:
Lea claims that Joe hoped that Ida was sleeping. It was raining while Lea was sleeping, however Ida is now waiting, but the weather should clear up as soon as night falls.
Thus these mechanisms are said to be recursive; the number of sentences that can be constructed with recursive mechanisms is infinite. Therefore it would be impossible to define all of these sentences in extenso. Another way must be found to characterize the set of sentences.
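Why recursion rules out a finite list can be seen in a short Python sketch. It is an illustration under simple assumptions: the generator name embedded_sentences is hypothetical, and only one prefix is used, although any of the mechanisms above would do.

```python
from itertools import islice

def embedded_sentences(base, prefix="Lea thinks that"):
    """Yield base, then prefix + base, then prefix + prefix + base, without end."""
    sentence = base
    while True:
        yield sentence
        sentence = f"{prefix} {sentence}"

# The generator never terminates on its own, so take only the first four items.
first_four = list(islice(embedded_sentences("Joe is sleeping"), 4))
for sentence in first_four:
    print(sentence)
```

Each sentence produced is longer than the previous one and therefore different from all earlier ones, so whatever finite list is proposed, the generator eventually yields a sentence that is not in it.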

1.1. Characterizing a set of infinite size

Mathematicians have known for a long time how to define sets of infinite size. For example, the two rules below can be used to define the set of all natural numbers ℕ:
(a) Each of the ten elements of set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} is a natural number;
(b) any word that can be written as xy is a natural number if and only if its two constituents x and y are natural numbers.
These two rules constitute a formal definition of all natural numbers. They make it possible to distinguish natural numbers from any other object (decimal numbers or others). For example:
  – Is the word “123” a natural number? Thanks to rule (a), we know that “1” and “2” are nat...
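Rules (a) and (b) can be turned directly into a recognizer. The Python sketch below is one possible reading of the two rules, not code from the book; the function name is_natural_number is an assumption, and the recursion deliberately mirrors the rules rather than aiming for efficiency.

```python
from functools import lru_cache

DIGITS = frozenset("0123456789")

@lru_cache(maxsize=None)
def is_natural_number(word: str) -> bool:
    """Decide membership using only rules (a) and (b)."""
    if len(word) == 1:
        # Rule (a): each of the ten digits is a natural number.
        return word in DIGITS
    # Rule (b): word = xy is a natural number if it can be split into
    # two non-empty constituents x and y that are both natural numbers.
    return any(
        is_natural_number(word[:i]) and is_natural_number(word[i:])
        for i in range(1, len(word))
    )

print(is_natural_number("123"))    # True
print(is_natural_number("3.14"))   # False
```

A finite pair of rules thus decides membership for infinitely many words. Note that, read literally, the rules also accept words with leading zeros such as “007”, and the check is equivalent to testing that the word is a non-empty sequence of digits.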

Table of contents

  1. Cover
  2. Table of Contents
  3. Dedication
  4. Title
  5. Copyright
  6. Acknowledgments
  7. 1 Introduction: the Project
  8. PART 1: Linguistic Units
  9. PART 2: Languages, Grammars and Machines
  10. PART 3: Automatic Linguistic Parsing
  11. Conclusion
  12. Bibliography
  13. Index
  14. End User License Agreement