Connectionist Approaches to Natural Language Processing

About this book

Originally published in 1992, when connectionist natural language processing (CNLP) was a new and burgeoning research area, this book represented a timely assessment of the state of the art in the field. It includes contributions from some of the best known researchers in CNLP and covers a wide range of topics.

The book comprises four main sections dealing with connectionist approaches to semantics, syntax, the debate on representational adequacy, and connectionist models of psycholinguistic processes. The semantics and syntax sections deal with a variety of approaches to issues in these traditional linguistic domains, covering the spectrum from pure connectionist approaches to hybrid models employing a mixture of connectionist and classical AI techniques.

The debate on the fundamental suitability of connectionist architectures for dealing with natural language processing is the focus of the section on representational adequacy. The chapters in this section represent a range of positions on the issue, from the view that connectionist models are intrinsically unsuitable for all but the associationistic aspects of natural language, to the other extreme which holds that the classical conception of representation can be dispensed with altogether.

The final section of the book focuses on the application of connectionist models to the study of psycholinguistic processes. This section is perhaps the most varied, covering topics from speech perception and speech production, to attentional deficits in reading. An introduction is provided at the beginning of each section which highlights the main issues relating to the section topic and puts the constituent chapters into a wider context.

1
Connectionist Natural Language Processing
Noel E. Sharkey
Department of Computer Science, University of Exeter, Exeter, U.K.
Ronan G. Reilly
Department of Computer Science, University College Dublin, Dublin, Ireland
INTRODUCTION
Computational research on natural language has been going on for decades in artificial intelligence and computational linguistics. These disciplines generated enormous excitement in the ’60s and ’70s, but they have not entirely realised their promise and have now reached what seems to be a plateau. Why should connectionist natural language processing (CNLP) be any different? There are a number of reasons. For many, the connectionist approach provides a new way of looking at old issues. For these researchers, connectionism provides an expanded toolkit with which to invigorate old research projects with new ideas. For instance, connectionist systems can learn from examples so that, in the context of a rule-based system, all of the rules need not be specified a priori. Connectionist systems have very powerful generalisation capabilities. Content addressable memory or pattern completion falls naturally out of distributed connectionist systems, making them ideal for filling in missing information.
However, for many new researchers, connectionism is a whole new way of looking at language. The big promise is that the integration of learning and representation (e.g. Hanson & Burr, 1990) will be a source of new theoretical ideas. Connectionist devices are very good at constructing representations from the statistical regularities of a domain, though they do so in a form that is not directly interpretable. Nevertheless, it is the nature of these representations, uninfluenced by a priori theoretical considerations, that holds the most promise for the discipline. Currently, connectionists are seeking ways of analysing such representations as a means of developing a new understanding of the problems facing automated language processing.
A Brief History
As far as we know, the first paper that discussed language in terms of parallel distributed processing was by Hinton (1981). Although that paper was really about implementing semantic nets in parallel hardware, many of the problem areas described by Hinton have been explored further in the natural language papers of the 1980s. The Hinton system took as input a distributed representation of word triples consisting of ROLE1 RELATION ROLE2; in other words, simple propositions such as ELEPHANT COLOUR GREY. When the system had finished learning the propositions, its task was to complete the third term of an input triple given only two of the terms. For example, given the terms ELEPHANT and COLOUR the system filled in the missing term GREY. This was very similar to the notion of default reasoning in AI. But Hinton went further, to discuss how his system could generalise its experience to novel examples. If the system knew that CLYDE was an elephant (i.e. the token CLYDE contained the type ELEPHANT microfeatures), then, given the two terms CLYDE and COLOUR, the third term GREY would be filled in.
What was interesting about Hinton’s work was that he described two types of representation that have become commonplace in CNLP. The first concerns the input to a language system. In any sort of natural language system it is important to preserve the ordering of the input elements. Hinton did this by partitioning the input vector so that the first n bits represented the ROLE1 words, the second n bits represented the RELATION words, and the final n bits represented the ROLE2 words. There are a number of problems with this representational approach, such as redundancy, fixed length, and absence of semantic similarity among identical elements in different roles. Nonetheless, it has been widely used in the literature, both for input and output, and has only been superseded in the last two years, as we shall see.
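To make the partitioning scheme concrete, here is a minimal sketch of such a role-partitioned input vector. The vocabulary, the slot width n, and the helper function are illustrative inventions rather than Hinton's actual encoding; the point is simply that each word occupies a fixed slot determined by its role.

```python
# Illustrative sketch of a role-partitioned input vector (hypothetical vocabulary
# and slot width, not Hinton's actual encoding). Each of the three slots
# (ROLE1, RELATION, ROLE2) is n bits wide; a word is encoded by switching on its
# bit within the slot corresponding to its role in the triple.

import numpy as np

VOCAB = ["ELEPHANT", "CLYDE", "COLOUR", "SIZE", "GREY", "LARGE"]
n = len(VOCAB)                                  # width of each slot
SLOT = {"ROLE1": 0, "RELATION": 1, "ROLE2": 2}

def encode_triple(role1, relation, role2):
    """Return a 3n-bit vector with exactly one bit set per slot."""
    vec = np.zeros(3 * n)
    for slot, word in zip(("ROLE1", "RELATION", "ROLE2"), (role1, relation, role2)):
        vec[SLOT[slot] * n + VOCAB.index(word)] = 1.0
    return vec

v = encode_triple("ELEPHANT", "COLOUR", "GREY")
print(v.reshape(3, n))   # one row per slot
```

Displaying the vector one slot per row makes the drawbacks visible: the same word occupies unrelated bits in different slots (no semantic similarity across roles), and the vector length is fixed at 3n however short or long the input is.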
The second type of representation used by Hinton was a distributed coarse-coded or compact representation. That is, the vector of input activations was recoded into a compact representation by random weights connected to a second layer of units. The states of this second layer of units were then fed back to the input layer and the weights were adjusted until the states from the second layer accurately reproduced the input. This is how the system filled in the missing term. It was also from the distributed representation that this system gained its generalisation abilities. Although such content-addressable memory systems were already well known, no one had used them in a language-related problem before.
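The sketch below, which continues the one above (it reuses encode_triple, VOCAB, and n), is a loose modern analogue of the recode-and-reproduce idea rather than a reconstruction of Hinton's 1981 system: a small tied-weight linear auto-associator is trained to reproduce the triple vectors through a narrower second layer, and pattern completion is then demonstrated by blanking out the ROLE2 slot of a query. The hidden width, learning rate, and number of epochs are arbitrary illustrative choices.

```python
# A loose analogue of recoding plus pattern completion (not Hinton's architecture):
# triples are recoded onto a narrow second layer and reconstructed from it, with the
# (tied) weights adjusted until the reconstruction matches the input.

rng = np.random.default_rng(0)

triples = [("ELEPHANT", "COLOUR", "GREY"),
           ("ELEPHANT", "SIZE", "LARGE")]
data = np.stack([encode_triple(*t) for t in triples])

d, h = data.shape[1], 8                       # input width, compact-code width
W = rng.normal(scale=0.1, size=(d, h))        # initially random recoding weights

for _ in range(2000):
    code = data @ W                           # recode onto the second layer
    recon = code @ W.T                        # feed the second-layer states back
    err = data - recon
    W += 0.01 * (data.T @ (err @ W) + (err.T @ data) @ W) / len(data)

query = encode_triple("ELEPHANT", "COLOUR", "GREY")
query[2 * n:] = 0.0                           # blank out the ROLE2 slot
completed = (query @ W) @ W.T                 # the reconstruction fills the slot in
print(VOCAB[int(np.argmax(completed[2 * n:]))])   # prints GREY
```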
The next four years from 1981 onwards saw only a few published papers, and most of these did not employ distributed representations. Distributed representations have a number of advantages over nondistributed or "localist" representations. For example, they have a greater psychological plausibility, they are more economical in the use of memory resources, and they are more resistant to disruption. However, prior to the development of sufficiently powerful learning algorithms, researchers found localist representations to be easier to work with, since they could readily be hand-coded. Small, Cottrell, and Shastri (1982) made a first brave stab at connectionist parsing. Though not greatly successful, this localist work opened the way for other linguistic-style work and provided a basis for Cottrell’s (1985) thesis research, at Rochester, on word sense disambiguation. That year also saw a Technical Report from another Rochester student, Fanty (1985), that attempted to employ localist techniques to do context-free parsing. The same year, Selman (1985) presented a master’s thesis that utilised the Boltzmann learning algorithm (Hinton, Sejnowski & Ackley, 1984) for syntactic parsing. There were many interesting ideas in Selman’s thesis, but the use of simulated annealing proved to be too cumbersome for language (but see Sampson, 1989). Also in that year, a special issue of the Cognitive Science journal featured a language article by Waltz and Pollack (1985) who were not only concerned with parsing but also with contextual semantics. Prior to this paper, only Reilly (1984) had attempted a connectionist approach to the higher-level phenomena in his paper on anaphoric resolution.
Then, in 1986, there was a relative explosion of language-related papers. First, there were papers on the use of connectionist techniques for language work using AI-style theory (e.g. Golden, 1986; Lehnert, 1986; Sharkey, Sutcliffe, & Wobcke, 1986). These papers were followed closely by the publication of the two volumes on parallel distributed processing (PDP) edited by David Rumelhart and Jay McClelland (Rumelhart & McClelland, 1986b; McClelland & Rumelhart, 1986). The two volumes contained a number of papers relating to aspects of natural language processing such as case-role assignment (McClelland & Kawamoto, 1986); learning the past tense of verbs (Rumelhart & McClelland, 1986a); and word recognition in reading (McClelland, 1986). Furthermore, the two volumes opened up the issue of representation in natural language, which had started with Hinton (1981).
However, one paper (Rumelhart, Hinton, & Williams, 1986) in the PDP volumes significantly changed the style of much of connectionist research. This paper described a new learning algorithm employing a generalisation of a learning rule first proposed by Widrow and Hoff (1960). The new algorithm, usually referred to as the backpropagation algorithm, opened up the field of connectionist research, because networks could now be trained on input–output mappings that were not restricted to linearly separable classes (cf. Allen, 1987, for a number of language studies employing the new algorithm). In the same year, Sejnowski and Rosenberg (1986) successfully applied the backpropagation algorithm to the problem of text-to-speech translation. And Hinton (1986) applied it to the learning of family trees (inheritance relations). These papers began a line of research devoted to examining the type of internal representation learned by connectionist networks in order to compute the required input–output mapping (cf. Hanson & Burr, 1990).
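The practical significance of this is easiest to see on a mapping that no single-layer network can compute. The sketch below is a minimal illustration, not drawn from any of the papers cited here: a small two-layer sigmoid network trained by backpropagation on XOR, the textbook example of a non-linearly-separable problem. Layer sizes, learning rate, and iteration count are arbitrary, and an unlucky random seed can occasionally require re-running.

```python
# Minimal backpropagation sketch: a two-layer sigmoid network learning XOR,
# a mapping that no single-layer network (restricted to linearly separable
# classes) can compute. Sizes, learning rate and epochs are arbitrary.

import numpy as np
rng = np.random.default_rng(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    hdn = sigmoid(X @ W1 + b1)                   # forward pass
    out = sigmoid(hdn @ W2 + b2)
    d_out = (out - y) * out * (1 - out)          # output-layer delta (squared error)
    d_hdn = (d_out @ W2.T) * hdn * (1 - hdn)     # hidden-layer delta (generalised delta rule)
    W2 -= lr * hdn.T @ d_out;  b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_hdn;    b1 -= lr * d_hdn.sum(0)

print(out.round(2).ravel())                      # approaches [0, 1, 1, 0]
```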
A significant extension to the representational capacity of connectionist networks was made by Jordan (1986). He proposed an architectural variant of the standard feed-forward backpropagation network. This variant involved feedback from the output layer to the input layer (thus forming a recurrent network), which enabled the construction of powerful sequencing systems. By using the recurrent links to store a contextual history of any particular sequence, such networks overcame many of the difficulties that connectionist systems had in dealing with problems having a temporal structure. Later work by Elman (1988; 1989) utilised a similar architecture, but ran the recurrent links from the hidden units rather than from the output units. This variant enabled Elman to develop a CNLP model that appeared to have many of the properties of conventional symbol-processing models, such as sensitivity to compositional structure. The apparent absence of just this property had earlier been pinpointed by Fodor and Pylyshyn (1988), in their critique of connectionism, as a significant and irredeemable deficit of CNLP systems. Another important advantage of Elman’s approach was that words (from a sentence) could be presented to the system in sequence. This departure from Hinton’s (1981) vector partitioning approach overcame problems of redundancy, lack of semantic similarity between identical items, and fixed input length.
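As a miniature illustration of the Elman-style architecture, the sketch below runs the forward pass of a simple recurrent network over a four-word sentence, feeding the previous hidden state back in as context at each step. The toy vocabulary, dimensions, and random weights are placeholders; a real model would also include a training loop and an output target such as predicting the next word.

```python
# Sketch of an Elman-style simple recurrent network (forward pass only).
# At each time step the previous hidden state is copied back in as "context",
# so the hidden layer sees the current word plus a trace of the sequence so far.
# Weights are random placeholders; a real model would train them with backprop.

import numpy as np
rng = np.random.default_rng(2)

vocab = {"the": 0, "elephant": 1, "is": 2, "grey": 3}   # hypothetical toy vocabulary
V, H = len(vocab), 6                                    # vocabulary size, hidden units

W_in  = rng.normal(scale=0.5, size=(V, H))   # current word -> hidden
W_ctx = rng.normal(scale=0.5, size=(H, H))   # previous hidden state (context) -> hidden
W_out = rng.normal(scale=0.5, size=(H, V))   # hidden -> scores over possible next words

def one_hot(i, size):
    v = np.zeros(size); v[i] = 1.0
    return v

hidden = np.zeros(H)                              # empty context before the sentence starts
for word in "the elephant is grey".split():
    x = one_hot(vocab[word], V)
    hidden = np.tanh(x @ W_in + hidden @ W_ctx)   # Elman update: recurrence from the hidden layer
    next_guess = list(vocab)[int((hidden @ W_out).argmax())]   # meaningless until trained
    print(word, "-> hidden:", hidden.round(2), "| guess:", next_guess)
```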
Since 1986, many more papers on language issues have begun to appear which are too numerous to mention here. Among these was further work on the application of world knowledge to language understanding (e.g. Chun & Mimo, 1987; Dolan & Dyer, 1987; Miikkulainen, 1990; Sharkey, 1989a). Research on various aspects of syntax and parsing has increased sharply (e.g. Benello, Mackie, & Anderson, 1989; Hanson & Kegl, 1987; Howells, 1988; Kwasny & Faisal, 1990; Rager & Berg, 1990). Moreover, there has been an increase in research on other aspects of natural language such as speech production (Dell, 1986; Seidenberg & McClelland, 1989), sentence and phrase generation (e.g. Gasser, 1988; Kukich, 1987), question answering (Allen, 1988), prepositional attachment (e.g. Cosic & Munro, 1988), anaphora (Allen & Riecken, 1988), cognitive linguistics (Harris, 1990), discourse topic (Karen, 1990), lexical processing (Sharkey, 1989b; Sharkey & Sharkey, 1989; Kawamoto, 1989), variable binding (Smolensky, 1987), and speech processing (e.g. Kohonen, 1989; Hare, 1990; Port, 1990).
OVERVIEW OF CHAPTERS
The book is divided into four sections. The first section (Semantics) contains four chapters that deal with connectionist issues in both lexical and structural semantics. The second section (Syntax) contains two chapters dealing with connectionist parsing. The third section (Representational Adequacy) contains three chapters dealing with the controversial issue of the representational adequacy of connectionist representations. The fourth and final section (Computational Psycholinguistics) contains four chapters which focus on the cognitive modelling role of connectionism and which address a variety of topics in the area of computational psycholinguistics.
In what follows we will give a brief introduction to each of the chapters. For a more detailed discussion of some of the relevant issues, we provide an introduction at the beginning of each section.
Semantics
The four chapters in this section can be divided into two pairs. The first pair deals with what can best be characterised as lexical semantics, and the second pair with sentential or structural semantics.
In the first chapter of this section, Dyer et al. discuss a method for modifying distributed representations dynamically, by maintaining a separate, distributed connectionist network as a symbol memory, where each symbol is composed of a pattern of activation. Symbol representations start out as random patterns of activation. Over time they are "recirculated" through the symbolic tasks being demanded of them, and as a result, gradually form distributed representations that aid in the performance of these tasks. These distributed symbols enter into structured relations with other symbols, while exhibiting features of distributed representations, e.g. tolerance to noise and similarity-based generalisation to novel cases. Dyer et al. discuss in detail a method of symbol recirculation based on using entire weight matrices, formed in one network, as patterns of activation in a larger network. In the case of natural language processing, the resulting symbol memory can serve as a store for lexical entries, symbols, and relations among symbols, and thus represent semantic information.
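The following sketch illustrates only the general flavour of symbol recirculation, not the specific weight-matrix method Dyer et al. describe: symbols begin as random vectors in a lexicon, and the error signal from a toy triple-completion task is propagated back into those vectors as well as into the task network's weights, so the symbol representations are gradually reshaped by the task they are used for. All names, sizes, and the task itself are hypothetical.

```python
# A minimal sketch of the recirculation idea (illustrative only, not the method
# described in the chapter): symbol vectors start out random and are updated by
# the same error signal that trains a toy task network, so the symbols gradually
# acquire whatever structure the task demands of them.

import numpy as np
rng = np.random.default_rng(4)

SYMBOLS = ["ELEPHANT", "COLOUR", "GREY", "SIZE", "LARGE"]    # hypothetical lexicon
k = 6                                                        # symbol vector width
lexicon = {s: rng.normal(scale=0.1, size=k) for s in SYMBOLS}

# Toy task: given the ROLE1 and RELATION symbols, activate the ROLE2 symbol.
triples = [("ELEPHANT", "COLOUR", "GREY"), ("ELEPHANT", "SIZE", "LARGE")]
W = rng.normal(scale=0.1, size=(2 * k, len(SYMBOLS)))
lr = 0.1

for _ in range(3000):
    for r1, rel, r2 in triples:
        x = np.concatenate([lexicon[r1], lexicon[rel]])      # symbols used as activations
        target = np.eye(len(SYMBOLS))[SYMBOLS.index(r2)]
        err = x @ W - target
        grad_x = err @ W.T                                    # error reaching the symbols
        W -= lr * np.outer(x, err)                            # train the task network...
        lexicon[r1] -= lr * grad_x[:k]                        # ...and recirculate the error
        lexicon[rel] -= lr * grad_x[k:]                       #    into the symbols themselves

out = np.concatenate([lexicon["ELEPHANT"], lexicon["COLOUR"]]) @ W
print(out.round(2))                                           # approaches the GREY one-hot code
```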
In his chapter, Sutcliffe focuses on how the meaning of concepts is represented using microfeatures. He shows how microfeatural representations can be constructed, how they can be compared using the dot product, and why normalisation of microfeature vectors is required. He then goes on to describe the use of such representations in the construction of a lexicon for a story-paraphrasing system. Finally, he discusses the properties of the chosen representation and describes possible further developments of the work.
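As a small illustration of the two operations this chapter turns on, the sketch below compares hand-made microfeature vectors (the features and values are invented, not Sutcliffe's) using a raw dot product and a normalised one, showing why normalisation is needed when concepts differ in how many microfeatures they activate.

```python
# Illustrative microfeature comparison (the features and values are invented,
# not Sutcliffe's). Similarity is measured with the dot product; normalisation
# stops concepts with many active microfeatures from dominating the comparison.

import numpy as np

# microfeatures: [animate, large, grey, has-trunk, man-made]
elephant = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
mouse    = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
lorry    = np.array([0.0, 1.0, 1.0, 0.0, 1.0])

def dot(a, b):
    return float(a @ b)

def normalised_dot(a, b):        # i.e. cosine similarity
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(dot(elephant, mouse), dot(elephant, lorry))        # 2.0 2.0  (a tie)
print(round(normalised_dot(elephant, mouse), 2),         # 0.71
      round(normalised_dot(elephant, lorry), 2))         # 0.58 (the tie is broken)
```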
Wermter and Lehnert describe an approach combining natural language processing and connectionist learning. Concentrating on the domain of scientific language and the task of structural noun phrase disambiguation, they present NOCON, a system which shows how learning can supply a memory model as the basis for understanding noun phrases. NOCON consists of two levels: a learning level at the bottom for learning semantic relationships between nouns and an integration level at the top for integrating semantic and syntactic constraints needed for structural noun phrase disambiguation. Wermter and Lehnert argue that this architecture is potentially strong enough to provide a learning and integrating memory model for natural language systems.
In the final chapter of this section St. John and McClelland argue that the parallel constraint satisfaction mechanism of connectionist models is a useful language comprehension algorithm; it allows syntactic and semantic constraints to be combined easily so that an interpretation which satisfies the most constraints can be found. It also allows interpretations to be revised easily, knowledge from different contexts to be shared, and it makes inferences an inherent part of comprehension. They present a model of sentence comprehension that addresses a range of important language phenomena. They show that the model can be extended to story comprehension. Both the sentence and story models view their input as evidence that constrains a complete interpretation. This view facilitates difficult aspects of sentence comprehension such as assigning thematic roles. It also facilitates difficult aspects of story comprehension such as inferring missing propositions, resolving pronouns, and sharing knowledge between contexts.
Syntax
The section on syntax provides a number of differing perspectives on how to deal with syntax in a connectionist network. The chapters are similar in that the models described are predominantly localist in nature. Rager focuses on robustness in parsing and Schnelle...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Original Copyright Page
  6. Dedication
  7. Table of Contents
  8. List of Contributors
  9. Preface
  10. 1 Connectionist Natural Language Processing
  11. PART I SEMANTICS
  12. PART II SYNTAX
  13. PART III REPRESENTATIONAL ADEQUACY
  14. PART IV COMPUTATIONAL PSYCHOLINGUISTICS
  15. Author Index
  16. Subject Index