Languages & Linguistics

Natural Language Processing

Natural Language Processing (NLP) is a field of computer science and artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It involves the development of algorithms and models to process and analyze large amounts of natural language data, with applications in machine translation, sentiment analysis, chatbots, and more.

Written by Perlego with AI-assistance

11 Key excerpts on "Natural Language Processing"

  • Deep Learning

    From Big Data to Artificial Intelligence with R

    • Stephane Tuffery (Author)
    • 2022(Publication Date)
    • Wiley
      (Publisher)
    4 Natural Language Processing

    Natural Language Processing is the application of linguistics, statistical analysis, machine learning, and deep learning to create algorithms capable of analysing and interpreting text or spoken words in natural human language, in order to solve particular problems or perform specific tasks, or eventually to provide results or answers in natural language. It began in the early twentieth century with the work on lexical statistics of Jean-Baptiste Estoup and George Kingsley Zipf, which was followed by researchers such as Benoît Mandelbrot, George Udny Yule, Pierre Guiraud, Charles Muller, Gustav Herdan, Maurice Tournier, Pierre Lafon, Etienne Brunet, André Salem, and Ludovic Lebart. Text data analysis developed considerably in the 1960s and 1970s and then gave birth to text mining, at the time when data mining commenced. In recent years, text mining has become a branch of Natural Language Processing (NLP). The latter goes beyond the framework of text mining because it is interested not only in writing but also in speech. Speech, of course, poses specific and complex problems. Natural Language Processing is a vast discipline, which also includes the inverse operations of the previous ones: Natural Language Generation and Text-to-Speech. In this chapter, we will focus on the analysis of textual data, illustrated by examples treated with the software R, which will be used to manipulate texts and strings, to transform them by word embedding methods, and to analyse them by supervised and unsupervised statistical and machine learning methods. We will also discuss methods of topic modeling and sentiment analysis.

    4.1 From Lexical Statistics to Natural Language Processing

    The first analyses of word frequencies can be found in the work of the psychologist Benjamin Bourdon, but these analyses really developed with the work of Estoup and Zipf.
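    The lexical-statistics tradition this excerpt starts from reduces to counting word frequencies and inspecting their rank-frequency profile, which is what Zipf's law describes. The book works in R; the sketch below is an invented Python illustration, not taken from it, and the sample text is made up.

    ```python
    # Minimal word-frequency / rank-frequency sketch (invented example text).
    import re
    from collections import Counter

    text = (
        "Natural language processing studies language with computers. "
        "Language data grows, and processing language data needs statistics."
    )

    # Lowercase and extract word tokens.
    tokens = re.findall(r"[a-z]+", text.lower())

    # Count occurrences; Zipf's law says frequency falls off roughly as 1 / rank.
    freqs = Counter(tokens)
    for rank, (word, count) in enumerate(freqs.most_common(10), start=1):
        print(f"{rank:2d}  {word:12s} {count}")
    ```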
  • Adaptive Technologies for Training and Education
    Part IV: Natural Language Processing for Training

    Introduction

    Speech and language processing (also known as Natural Language Processing [NLP], human language technologies [HLT], computational linguistics [CL], etc.) has a fifty-year history as a scientific discipline and a much more recent history as a commercial presence. NLP-based research and applications focused on training environments have followed a similar trajectory. Initial NLP-based work in adaptive training focused on assessing student essay-length and short-answer texts, as well as on developing conversational intelligent tutoring systems. More recently, advances in speech recognition and synthesis have made it possible to include spoken language technologies in both assessments and tutorial dialogue systems. Progress in these traditional areas continues to improve with new innovations in NLP techniques, which include methods ranging from deep linguistic analysis to robust data-driven statistical methods. Commercial products are making their way to the marketplace, and NLP-based systems are being deployed in authentic settings. Continuing software and hardware advances, as well as the amount of language data available on the World Wide Web, have also resulted in the creation of new educational applications of NLP – for example, linguistically detecting and then adapting to student metacognitive and emotional states, personalizing texts and spoken artifacts to the needs and interests of particular students, applications that automatically generate test questions, and incorporating language into virtual training environments. This chapter first briefly reviews both the research area of NLP and the types of training applications that NLP has contributed to. Next, a case study is presented to illustrate several ways that NLP techniques have been used to develop a spoken-dialogue physics training system that detects and adapts to student uncertainty.
  • Redefining Libraries in Digital Era
    Chapter 9: Natural Language Processing in Web Searching

    Saikat Goswami (Assistant Librarian, Eastern Institute for Integrated Learning in Management (EIILM)) and Sumana Chakraborty (Librarian, Guru Nanak Institute of Pharmaceutical Science and Technology (GNIPST))

    INTRODUCTION

    The information era has brought us vast amounts of digitized text that is generated, propagated, exchanged, stored, and accessed through the Internet every day all over the world. Users demand useful and reliable information from the web in the shortest time possible, but there exist many obstacles to fulfilling this demand. It is becoming increasingly difficult for users to identify useful information from the thousands of results returned by search engines. The field of Natural Language Processing (NLP) is essential to helping improve the accuracy of search engine results. Almost all information on the web is provided in the form of natural language text. In order to provide better search results, we need to develop practical NLP technologies to extract the key information from web text. Also, if web content is written in a language the user doesn't speak, it isn't accessible to the user no matter how good the search results are. Developing high-quality machine translation (MT) systems to support query and webpage translations is important to improving successful information acquisition via the web.

    Definition

    Natural Language Processing (NLP) is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications. NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
  • Artificial Intelligence and Data Analytics for Energy Exploration and Production
    • Fred Aminzadeh, Cenk Temizel, Yasin Hajizadeh(Authors)
    • 2022(Publication Date)
    • Wiley-Scrivener
      (Publisher)
    6 Natural Language Processing

    6.1 Introduction

    Natural Language Processing (NLP) is a branch of artificial intelligence (AI). It provides computers the capability to interpret, manipulate, understand, and derive meaning from human languages. NLP is a multidisciplinary field, and it draws mainly from the interaction between human language and data science. While trying to become a bridge between human communication and a computer's reasoning, NLP uses the tools developed by computational linguistics and computer science. This chapter covers topics like sentence segmentation, tokenization, speech prediction, lemmatization, dependency parsing, named entity recognition, coreference resolution, and use cases and applications of NLP in the oil and gas industry. With the increase in both data availability and computational power, the use of NLP techniques is expanding as never before. The life-changing effects of its usage can already be seen in human resources, finance, media, and healthcare, among other sectors. Currently, usage of NLP tools can be seen in many aspects of our daily lives; for example, the autocorrect tool influences our writing habits when using cellphones and word processors. In particular, the oil and gas industry also utilizes NLP in many aspects of its operations. By virtue of its analog nature, transforming human speech into a form that a computer can understand, and even reply to coherently, is not an easy task. The main reason for this difficulty lies in the way humans form their language rules. Some of these rules can be high-level and abstract, like the ones observed in irony or sarcasm. Meanwhile, others might be as simple and low-level as adding an "s" at the end of a word to signify its plurality.
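    The pipeline tasks listed in this excerpt (sentence segmentation, tokenization, lemmatization, dependency parsing, named entity recognition) can be sketched in a few lines. The sketch below is not from the book: it assumes the open-source spaCy library and its small English model en_core_web_sm are installed, and the example sentence is invented.

    ```python
    # Minimal NLP pipeline sketch, assuming spaCy and its small English model
    # are installed (pip install spacy; python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The operator shut in the well after pressure dropped on Tuesday.")

    # Sentence segmentation
    for sent in doc.sents:
        print("SENTENCE:", sent.text)

    # Tokenization, lemmatization, and dependency parsing
    for token in doc:
        print(token.text, token.lemma_, token.pos_, token.dep_, token.head.text)

    # Named entity recognition
    for ent in doc.ents:
        print("ENTITY:", ent.text, ent.label_)
    ```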
  • New Language Technologies and Research in Linguistics
    Chapter 6: Natural Language Processing and Corpora

    6.1. INTRODUCTION

    Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, with programming computers to efficiently process large natural language corpora. Challenges in natural language processing frequently involve natural language understanding, natural language generation (often from formal, machine-readable logical structures), connecting language and machine perception, dialogue systems, or some combination thereof.

    6.2. ORIGIN

    The history of NLP generally started in the 1950s, although work from earlier periods can be found. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. The Georgetown experiment in 1954 involved a fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three to five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill expectations, funding for machine translation was drastically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed. Some notably successful NLP systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, a simulation of a Rogerian psychotherapist, written by Joseph Weizenbaum between 1964 and 1966.
  • The Handbook of Artificial Intelligence
    • Avron Barr, Edward A. Feigenbaum (Authors)
    • 2014(Publication Date)
    If computers could understand what people mean when people type (or speak) English sentences, the systems would be easier to use and would fit more naturally into people's lives. Furthermore, AI researchers hope that learning how to build computers that can communicate as people do will extend our understanding of the nature of language and of the mind. So far, programs have been written that are quite successful at processing somewhat constrained input: The user is limited in either the structural variation of his sentences (syntax constrained by an artificial grammar) or in the number of things he can mean (in domains with constrained semantics). Some of these systems are adequate for building English front ends for a variety of data processing tasks and are available commercially. But the fluent use of language typical of humans is still elusive, and understanding natural language (NL) is an active area of research in AI. This article presents a brief sketch of the history of research in Natural Language Processing and an idea of the state of the art in NL. The next article is a historical sketch of research on machine translation from one language to another, which was the subject of the very earliest ideas about processing language with computers. It is followed by several technical articles on some of the grammars and parsing techniques that AI researchers have used in their programs. Then, after an article on text generation, that is, the creation of sentences by a program to express what it wants to say, there are several articles describing the NL programs themselves: the early systems of the 1960s and the major research projects of the last decade, including Wilks's machine translation system, Winograd's SHRDLU, Woods's LUNAR, Schank's MARGIE, SAM, and PAM, and Hendrix's LIFER. Two other chapters of the Handbook are especially relevant to NL research.
  • The Structure of the Lexicon

    Human versus Machine

    1. Natural Language Processing

    Speaking and understanding the speech of others are things we do every day. Under normal circumstances we do these things effortlessly and, it seems, almost instantaneously. It takes almost no effort and very little, if any, conscious thought to turn our thoughts into words and sentences in order to communicate them to others; and, likewise, we ordinarily have no trouble in getting at the thoughts that others express in their words and sentences. (Matthei/Roeper 1983: 13)

    The use of natural language is one of the most complicated human cognitive activities. Despite its great complexity humans use language like breathing or walking with great ease, applying little or no conscious effort. In contrast to other mental tasks like calculating or playing games, where we are aware of sequentially going through a number of thought processes, the stages of Natural Language Processing are much more difficult to inspect. Various disciplines combine their efforts to investigate natural language and the cognitive abilities underlying language processing. This enormously difficult enterprise can be approached in three different ways. The linguistic approach tries to investigate the rules and principles according to which natural language operates. The study of natural language has a long tradition. Modern linguistics emerged from comparative philological studies in the 19th century, which have their roots more than two thousand years ago. Even though the philological approach was by and large responsible for the birth of modern linguistics, it was only in the second half of the 20th century that a move away from philologically-oriented language studies to a more structural approach could be observed. With the advent of Noam Chomsky's generative model of a theory of grammar in the 1960s, a new direction in linguistics was manifested.
  • Cognitive Computing and Big Data Analytics
    • Judith S. Hurwitz, Marcia Kaufman, Adrian Bowles(Authors)
    • 2015(Publication Date)
    • Wiley
      (Publisher)
    The desire to achieve techniques for transforming language has been around for decades. In fact, some historians believe that the first attempt to automate the translation from one language to the next occurred as early as the 17th century. From the 1940s to the late 1960s, much of the work in NLP was targeted to machine translation—translating between human languages. However, these efforts discovered a number of complexities that couldn’t yet be addressed, including syntactic and semantic processing. The primary technique for translating in those years came through using dictionaries to look up words that would then be translated to another language—a slow and tedious process. This problem led computer scientists to devise new tools and techniques focused on developing grammars and parsers with a focus on syntax. The 1980s saw the evolution of more practical tools such as parsers and grammars to allow systems to better understand not just words but the context and meaning of those words. Some of the most important topics that were developed during the 1980s were the notions of word sense disambiguation, probabilistic networks, and the use of statistical algorithms. In essence, this period saw the beginning of moving from a mechanical approach to natural language into a computational and semantic approach to the topic. The trends in NLP in the past two decades have been in language engineering. This movement has coincided with the growth of the web and the expansion of the amount of automation in text as well as spoken language tools.

    Understanding Linguistics

    NLP is an interdisciplinary field that applies statistical and rules-based modeling of natural languages to automate the capability to interpret the meaning of language. Therefore, the focus is on determining the underlying grammatical and semantic patterns that occur within a language or a sublanguage (related to a specific field or market). For instance, different expert domains such as medicine or law use common words in specialized ways. Therefore, the context of a word is determined by knowing not just its meaning within a sentence, but sometimes by understanding whether it is being used within a particular domain. For example, in the travel industry the word "fall" refers to a season of the year. In a medical context it refers to a patient falling. NLP looks not just at the domain, but also at the levels of meaning that each of the following areas provides to our understanding.
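    A toy sketch of this domain effect, invented for illustration rather than taken from the book: the same surface word resolves to different senses depending on which domain lexicon is consulted.

    ```python
    # Hypothetical domain-keyed sense lookup; domains, senses, and the lookup
    # function are invented for illustration only.
    DOMAIN_SENSES = {
        "travel": {"fall": "the autumn season"},
        "medical": {"fall": "an event in which a patient falls"},
    }

    def sense_of(word: str, domain: str) -> str:
        """Return the domain-specific sense of a word, if one is registered."""
        return DOMAIN_SENSES.get(domain, {}).get(word.lower(), "no domain-specific sense")

    print(sense_of("fall", "travel"))   # the autumn season
    print(sense_of("fall", "medical"))  # an event in which a patient falls
    ```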

    Language Identification and Tokenization

    In any analysis of incoming text, the first process is to identify which language the text is written in and then to separate the string of characters into words (tokenization).
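    A rough sketch of these two first steps, not from the book: it assumes the third-party langdetect package for language identification and uses a deliberately crude regular-expression tokenizer; a real system would use a tokenizer appropriate to the detected language.

    ```python
    # Language identification followed by tokenization, assuming the
    # third-party langdetect package (pip install langdetect).
    import re
    from langdetect import detect

    def identify_and_tokenize(text: str):
        lang = detect(text)                       # e.g. "en", "fr", "de"
        tokens = re.findall(r"\w+|[^\w\s]", text)  # crude word/punctuation split
        return lang, tokens

    lang, tokens = identify_and_tokenize("Natural language processing turns text into tokens.")
    print(lang)    # expected: "en"
    print(tokens)  # ["Natural", "language", "processing", ...]
    ```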
  • Artificial Intelligence
    • Margaret A. Boden(Author)
    • 1996(Publication Date)
    • Academic Press
      (Publisher)
    Chapter 8: Natural Language Processing

    Mark Steedman

    I. INTRODUCTION

    A. Scope of the Study

    The subject of Natural Language Processing can be considered in both broad and narrow senses. In the broad sense, it covers processing issues at all levels of natural language understanding, including speech recognition, syntactic and semantic analysis of sentences, reference to the discourse context (including anaphora, inference of referents, and more extended relations of discourse coherence and narrative structure), conversational inference and implicature, and discourse planning and generation. In the narrower sense, it covers the syntactic and semantic processing of sentences to deliver semantic objects suitable for referring, inferring, and the like. Of course, the results of inference and reference may under some circumstances play a part in processing in the narrow sense. But the processes that are characteristic of these other modules are not the primary concern. This chapter is confined mainly to the narrower interpretation of the topic, although it will become apparent that it is impossible to entirely separate it from the broader context. The reader interested in the more global problem is directed to the readings mentioned in the section on Further Reading.

    B. The Anatomy of a Processor

    All language processors can be viewed as comprising three elements. The first is a grammar, which defines the legal ways in which constituents may combine, both syntactically and semantically, to yield other constituents. The syntactic class to which the grammar belongs also determines a characteristic automaton, the minimal abstract computer capable of discriminating the sentences of the language in question from random strings of words, and assigning structural descriptions to sentences appropriate to their semantics.
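    To make the "grammar plus characteristic automaton" idea concrete, here is a small sketch, not from the chapter, that defines a toy context-free grammar and parses one sentence with a chart parser. It assumes the NLTK library is installed; the grammar and sentence are invented.

    ```python
    # Toy context-free grammar and chart parse, assuming NLTK (pip install nltk).
    import nltk

    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> Det N
        VP -> V NP
        Det -> 'the'
        N  -> 'parser' | 'sentence'
        V  -> 'analyses'
    """)

    # A chart parser is one concrete automaton for context-free grammars.
    parser = nltk.ChartParser(grammar)

    for tree in parser.parse("the parser analyses the sentence".split()):
        tree.pretty_print()  # prints the structural description (parse tree)
    ```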
  • Natural Language Processing for the Semantic Web
    • Diana Maynard, Kalina Bontcheva, Isabelle Augenstein(Authors)
    • 2022(Publication Date)
    • Springer
      (Publisher)
    Chapter 2: Linguistic Processing

    2.1 INTRODUCTION

    There are a number of low-level linguistic tasks which form the basis of more complex language processing algorithms. In this chapter, we first explain the main approaches used for NLP tasks, and the concept of an NLP processing pipeline, giving examples of some of the major open source toolkits. We then describe in more detail the various linguistic processing components that are typically used in such a pipeline, and explain the role and significance of this pre-processing for Semantic Web applications. For each component in the pipeline, we describe its function and show how it connects with and builds on the previous components. At each stage, we provide examples of tools and describe typical performance of them, along with some of the challenges and pitfalls associated with each component. Specific adaptations to these tools for non-standard text such as social media, and in particular Twitter, will be discussed in Chapter 8.

    2.2 APPROACHES TO LINGUISTIC PROCESSING

    There are two main kinds of approach to linguistic processing tasks: a knowledge-based approach and a learning approach, though the two may also be combined. There are advantages and disadvantages to each approach, summarized in Table 2.1. Knowledge-based or rule-based approaches are largely the more traditional methods, and in many cases have been superseded by machine learning approaches now that processing vast quantities of data quickly and efficiently is less of a problem than in the past. Knowledge-based approaches are based on hand-written rules typically written by NLP specialists, and require knowledge of the grammar of the language and linguistic skills, as well as some human intuition. These approaches are most useful when the task can easily be defined by rules (for example: "a proper noun always starts with a capital letter"). Typically, exceptions to such rules can be easily encoded too.
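    As a tiny illustration of the knowledge-based style, the quoted rule can be written directly as a pattern. This sketch is invented for this page, not taken from the book; real systems encode exception lists and many more rules.

    ```python
    # Hypothetical rule-based sketch of the quoted rule: "a proper noun always
    # starts with a capital letter". Sentence-initial words are excluded as a
    # crude exception; the example text is invented.
    import re

    def proper_noun_candidates(text: str):
        candidates = []
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = sentence.split()
            # Skip the first word: it is capitalized whether or not it is a name.
            for word in words[1:]:
                cleaned = word.strip(",.;:!?")
                if re.fullmatch(r"[A-Z][a-z]+", cleaned):
                    candidates.append(cleaned)
        return candidates

    print(proper_noun_candidates("Alice met Bob in London. They discussed the pipeline."))
    # expected: ['Bob', 'London']  (sentence-initial 'Alice' and 'They' are skipped)
    ```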
  • Who Climbs the Grammar-Tree

    [leaves for David Reibel]

    • Rosemarie Tracy(Author)
    • 2011(Publication Date)
    • De Gruyter
      (Publisher)
    Both interpretations are, of course, related to each other in very direct ways. But as we shall see, it is the computability aspects that make the discipline interesting and it is the question of what exactly we want to compute in linguistics that this note deals with. In an extremely interesting paper, the late David Marr (1977) gave the following characterization of scientific progress in artificial intelligence:

    I thank David Reibel for the many discussions we had when our offices were door to door on the 4th floor of the Neuphilologische Fakultät in Tübingen. This note is a summary of a talk given in the spring of 1988 at the Universität München. Bob Kowalski's book Logic for Problem Solving was an important source of inspiration for my views on the topics discussed here. 'Natural Language Processing', for example, is a typical alternative in English-speaking communities. As the original version of this paper was written in German, part of its aim was to explore possible interpretations of various German denominations (e.g. what the 'Computer' in 'Computerlinguistik' might mean).

    Strictly speaking then, a result in Artificial Intelligence consists of the isolation of a particular information processing problem, the formulation of a computational theory for it, and a practical demonstration that the algorithm is successful.

    This way of looking at how work in AI should be evaluated has always impressed me and I think that it applies equally well to work in computational linguistics. I will thus discuss - in a relatively abstract manner - how the goals of research in computational linguistics can be characterized along the lines mentioned in Marr's quote, what some of these goals are and what progress has been made to date in reaching them. From the beginning, modern linguistics has emphasized the role of 'computability'.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.