Languages & Linguistics
Parsing
Parsing is the process of analyzing a string of symbols according to the rules of a formal grammar. In linguistics, it means breaking a sentence down into its constituent parts in order to reveal its structure and meaning, a step that is essential to natural language processing and to syntax analysis more broadly.
Written by Perlego with AI-assistance
8 Key excerpts on "Parsing"
- eBook - PDF
Algorithms and Theory of Computation Handbook, Volume 2
Special Topics and Techniques
- Mikhail J. Atallah, Marina Blanton (Authors)
- 2009 (Publication Date)
- Chapman and Hall/CRC (Publisher)
Furthermore, the design of many programming languages is such that the single parse can be found deterministically, which means that every parsing step contributes a fragment of the resulting parse. As parses have a size linear in the length of the input, this explains why parsing is possible in linear time. Subsequent processing of the parse, for example in order to compile to machine code, is also commonly possible in close to linear time. Natural languages are quite different in this respect. Like programs in a programming language, sentences in a natural language can be assigned parses, but often the sentences are ambiguous and allow more than one parse. Even for a single parse, there may be ambiguity in the meanings of words or expressions. The existence of ambiguity in natural language is witnessed by frequent misunderstandings in daily life, but it is also an essential feature of poetry and puns. The field of natural language processing (NLP) studies algorithms, tools, and techniques for the automatic processing of natural languages. A related if not synonymous term is computational linguistics, which stresses that the field can be seen as a subfield of linguistics, which is the study of (natural) language. Language can be investigated from different perspectives. At the lowest level, phonetics and phonology study the sounds that spoken language consists of and the rules that govern them. Morphology is the study of the internal structure of words. Syntax is the study of how words are combined to form sentences. The meaning and the use of language are studied by semantics and pragmatics, and the structure of human communication is studied by discourse. The problem of parsing, with which we started our exposition, concerns the syntax of language.
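To make the excerpt's point about ambiguity concrete, here is a minimal sketch (not from the handbook) that counts how many parses a small context-free grammar in Chomsky normal form assigns to the classic attachment-ambiguous sentence "I saw the man with the telescope"; the grammar rules and names are invented for illustration.

```python
from collections import defaultdict

# Toy CFG in Chomsky normal form; the rules are invented for illustration.
UNARY = {                     # preterminal rules: A -> word
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}
BINARY = [                    # binary rules: A -> B C
    ("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
    ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP"),
]

def count_parses(words):
    """CYK-style chart that counts the parse trees of each span."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):              # seed one-word spans
        for a in UNARY.get(w, []):
            chart[i][i + 1][a] += 1
    for width in range(2, n + 1):              # widen spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # every split point
                for a, b, c in BINARY:
                    chart[i][j][a] += chart[i][k][b] * chart[k][j][c]
    return chart[0][n]["S"]

print(count_parses("I saw the man with the telescope".split()))  # prints 2
```

The chart fills spans of increasing width, so the two analyses (the prepositional phrase attaching to the verb phrase or to the noun phrase) are counted in a single linear-algebra-like pass rather than enumerated.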
- eBook - PDF
- Carlos Gómez-Rodríguez (Author)
- 2010 (Publication Date)
- ICP (Publisher)
The purpose and contribution of this monograph is to solve these problems, by defining a set of extensions and tools to make parsing schemata more useful in practice. In addition to this, research results obtained by using these extensions and tools are also presented.

1.2 Background
In this section, the contributions of this book are put into context by briefly introducing the problem of parsing natural language sentences and the formalism of parsing schemata.

1.2.1 Parsing natural language
In the context of computational linguistics, the process of parsing, or syntactic analysis, consists of finding the grammatical structure of natural language sentences. Given a sentence represented as a sequence of symbols, each corresponding to a word, a parsing algorithm (or simply parser) will try to find and output a representation of the syntactic structure of the sentence. The nature of this representation depends on the linguistic theory that the parser uses to describe syntax. In phrase structure parsers, or constituency parsers, sentences are analysed by dividing them into meaningful segments called constituents, which are, in turn, broken up into smaller constituents. The result of a constituency analysis can be represented with a phrase structure tree (or constituency tree), as can be seen in Figure 1.1a. On the other hand, in dependency-based parsers, the structure of a sentence is represented by a set of directed links (dependencies) between its words, which form a graph as can be seen in Figure 1.1b. Parsing is a fundamental process in any natural language processing pipeline, since obtaining the syntactic structure of sentences provides us with information that can be used to extract meaning from them: constituents correspond to units of meaning, and dependency relations describe the ways in which they interact, such as who performed the action described in a sentence or which object is receiving the action. Thus, we can find
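As a concrete companion to the two representations the excerpt contrasts, here is a minimal Python sketch (the class names and the example sentence are mine, not the book's) of a constituency tree and a dependency graph for the same sentence:

```python
from dataclasses import dataclass, field

@dataclass
class Constituent:
    """A node in a phrase structure (constituency) tree."""
    label: str                          # e.g. "S", "NP", "VP", or a word
    children: list = field(default_factory=list)

@dataclass
class Dependency:
    """A directed link between two words of the sentence."""
    head: str
    dependent: str
    relation: str                       # e.g. "subject", "object"

# "Mary saw John" as a constituency tree (cf. the excerpt's Figure 1.1a) ...
tree = Constituent("S", [
    Constituent("NP", [Constituent("Mary")]),
    Constituent("VP", [Constituent("V", [Constituent("saw")]),
                       Constituent("NP", [Constituent("John")])]),
])

# ... and as a dependency graph rooted at the verb (cf. Figure 1.1b).
deps = [Dependency("saw", "Mary", "subject"),
        Dependency("saw", "John", "object")]
```

The constituency view groups words into nested units of meaning, while the dependency view records who did what to whom directly as labeled head-to-dependent links.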
- eBook - PDF
Directions in Corpus Linguistics
Proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991
- Jan Svartvik (Author)
- 2011 (Publication Date)
- De Gruyter Mouton (Publisher)
Probabilistic Parsing
Geoffrey Sampson
We want to give computers the ability to process human languages. But computers use systems of their own which are also called languages, and which share at least some features with human languages; and we know how computers succeed in processing computer languages, since it is humans who have arranged for them to do so. Inevitably there is a temptation to see the automatic processing of computer languages as a precedent or model for the automatic processing (and perhaps even for the human processing) of human languages. In some cases the precedent may be useful, but clearly we cannot just assume that human languages are similar to computer languages in all relevant ways. In the area of grammatical parsing of human languages, which seems to be acknowledged by common consent as the central problem of natural language processing - NLP - at the present time, I believe the computer-language precedent may have misled us. One of the ideas underlying my work is that human languages, as grammatical systems, may be too different from computer languages for it to be appropriate to use the same approaches to automatic parsing. Although the average computer scientist would probably think of natural-language parsing as a somewhat esoteric task, automatic parsing of computer programming languages such as C or Pop-11 is one of the most fundamental computing operations; before a program written in a user-oriented programming language such as these can be run it must be compiled into machine code - that is, automatically translated into a very different, low-level programming language - and compilation depends on extracting the grammatical structure by virtue of which the C or Pop-11 program is well-formed. To construct a compiler capable of doing this, one begins from a production system (i.e. a set of rules) which defines the class of well-formed programs in the relevant user-oriented language.
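To illustrate what "a production system which defines the class of well-formed programs" looks like in miniature, here is a hedged sketch: a recursive-descent recognizer for a toy expression grammar. The grammar is invented for illustration and bears no resemblance to a real C or Pop-11 grammar.

```python
import re

# A toy production system, not the grammar of C or Pop-11:
#   Expr -> Term (('+' | '-') Term)*
#   Term -> NUMBER | '(' Expr ')'
TOKEN = re.compile(r"\d+|[()+\-]")

def well_formed(source: str) -> bool:
    """Return True iff `source` is derivable from the productions above."""
    toks = TOKEN.findall(source)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def term() -> bool:
        nonlocal pos
        if peek() and peek().isdigit():       # Term -> NUMBER
            pos += 1
            return True
        if peek() == "(":                     # Term -> '(' Expr ')'
            pos += 1
            if expr() and peek() == ")":
                pos += 1
                return True
        return False

    def expr() -> bool:                       # Expr -> Term (('+'|'-') Term)*
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not term():
                return False
        return True

    return expr() and pos == len(toks)

print(well_formed("(1+2)-3"))   # True: derivable from the productions
print(well_formed("(1+)"))      # False: not in the defined language
```

A compiler front end does the same job at scale: it accepts exactly the strings the production system licenses, and recovers their grammatical structure along the way.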
- eBook - PDF
- Ralph W. Fasold, Jeff Connor-Linton (Authors)
- 2014 (Publication Date)
- Cambridge University Press (Publisher)
Describing the process takes a long time, but computers can run through it in a fraction of a second! In contrast to working top-down, a parser could proceed bottom-up, somewhat as structures were built in Chapter 3. In the first steps, each input word is associated with a lexical category. For example, I would be associated with a Pronoun node. Then, the lexical category nodes are connected to their parent nodes using the rules of the grammar, so that, for example, Pronoun will be a child of NP. Bottom-up parsers may waste time building structures that don't lead to an S, while top-down parsers may, as we have seen, waste time expanding structures that don't lead to a match with the input. More advanced parsing strategies, like chart parsing, combine top-down and bottom-up approaches.

Quick comprehension check
This is a good time to pause and confirm that you have understood how a parser, given a grammar, can syntactically analyze a sentence in a language. In fact, you should be able to mimic the steps the computer takes in carrying out a top-down parse of a sentence. Exercises 14.2 and 14.3 will further consolidate your understanding.

Part-of-speech tagging
As we have seen, a word (like can and fish) can have different possible parts of speech. Instead of having the parser consider all parts of speech of an ambiguous word, it is possible to reduce the ambiguity prior to parsing, by running a program called a part-of-speech tagger before parsing. To each word that has more than one part of speech, the tagger assigns the most likely part of speech. This is done based on context-based rules derived by human intuition (for example, after the, fish is likely to be a noun), as well as rules derived by machines that learn from a collection of example sentences that have been tagged already with parts of speech.
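The two-step tagging scheme the excerpt describes (assign the most likely tag, then let context rules repair it) can be sketched in a few lines. The tiny lexicon and the single rule below are invented for illustration, not taken from the textbook.

```python
# Step-1 lexicon: each word's single most likely part of speech (invented).
LEXICON = {"they": "Pronoun", "can": "Verb", "the": "Det", "fish": "Verb"}

def tag(words):
    # Step 1: give each word its most likely tag; default unknowns to Noun.
    tags = [LEXICON.get(w, "Noun") for w in words]
    # Step 2: context rules repair likely mistakes, e.g. the excerpt's
    # "after 'the', 'fish' is likely to be a noun".
    for i in range(1, len(words)):
        if words[i - 1] == "the" and tags[i] == "Verb":
            tags[i] = "Noun"
    return list(zip(words, tags))

print(tag("they can the fish".split()))
# [('they', 'Pronoun'), ('can', 'Verb'), ('the', 'Det'), ('fish', 'Noun')]
```

Running the tagger before the parser means the parser never has to explore the reading in which "fish" is a verb, which is exactly the ambiguity reduction the excerpt is after.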
- eBook - PDF
Computers and Languages
Theory and Practice
- A. Nijholt (Author)
- 2014 (Publication Date)
- North Holland (Publisher)
This following of trends of fashion is not necessarily beneficial for the long-term development of computer science.

The Development of Parsing Theory
In the development of parsing theory as a topic in (Theoretical) Computer Science we distinguish the following four periods.

1950-1960
Some theory and ideas are available from logic and (mathematical) linguistics. Methods for checking well-formedness of Boolean expressions had already been devised in the late forties and early fifties. Compiler builders start (almost) from scratch and develop 'parsers' as parts of compilers for specific languages and computers. The first compilers were keyword-oriented rather than syntax-oriented. Ideas which were used were not always written down, partly because the parser was interwoven with the other components of the compiler. When parsing was studied separately, the interest focussed on parsing and compilation of arithmetical expressions (formulae). Later it was recognized that this was only part of the more general problem of compiling programming languages. At the end of this period we see the presentation of ideas on parsing independent of a particular programming language. During these years in machine translation research a move was made from word-for-word translation to sentence-for-sentence and syntax-oriented translation.

1960-1970
Formal language theory emerges from linguistics and the design and use of programming languages. Models of compilers and theories concerning parsing are developed. Ideas which have been used earlier by compiler builders are formalized into parsing methods. This development benefits from a change invoked by the recognition that more fundamental research in natural language analysis is necessary before the machine translation problem can be successfully attacked. In this period parsing theory starts to raise its own problems. Some of these problems can be solved by adapting results obtained in logic.
- eBook - ePub
Psychology of Language (PLE: Psycholinguistics)
An Introduction to Sentence and Discourse Processes
- Murray Singer (Author)
- 2013 (Publication Date)
- Taylor & Francis (Publisher)
3 Syntax and Parsing Processes
In order to understand a sentence, it is necessary to analyze it into its grammatical elements, called constituents. This analysis is known as parsing. Accomplishing this task requires knowledge of the grammar of one's native language. The fact that people are routinely in possession of this knowledge is reflected by their ability to discriminate between grammatical and ungrammatical sentences. In this regard, there is no doubt that (1a) is an acceptable English sentence, but (1b) is not. Likewise, (2a) and (2b) are acceptable ways of expressing Jane's thoughts about the crop, but (2c) is not.

(1) a. Wild beasts frighten little children.
    b. *Beasts children frighten wild little.
(2) a. Jane thought the crop was not healthy.
    b. Jane did not think the crop was healthy.
    c. *Jane thought not the crop was healthy.

People's grammatical skills also permit them to recognize that the sentence, Colorless green ideas sleep furiously, complies with the rules of English, even though it is anomalous (Chomsky, 1957). The ability to judge the grammaticality of a sentence reflects people's knowledge of the ideal form of their language, knowledge that is referred to as linguistic competence (Chomsky, 1957). Linguistic competence does not enable people to articulate the grammatical rules of their native tongue. Only instruction in linguistics provides this capability. Therefore, one's intuitive linguistic knowledge is procedural in nature (see chapter 1). People's utterances, their language performance (Chomsky, 1957), frequently deviate from the ideal grammatical form. Consider the following fragment from the Watergate transcripts:

President Nixon: Let me say with regard to Colson—and you can say that I'm way ahead of them on that—I've got the message on that and that he feels that Dean—but believe me I've been thinking about that all day yesterday—whether Dean should be given immunity. (Watergate Transcripts
- eBook - PDF
- Nitin Indurkhya, Fred J. Damerau (Authors)
- 2010 (Publication Date)
- Chapman and Hall/CRC (Publisher)
Higher-level analysis is then left for processing by other means. One of the earliest approaches to partial parsing was Fidditch (Hindle 1989, 1994). A key idea of this approach is to leave constituents whose roles cannot be determined unattached, thereby always providing exactly one analysis for any given sentence. Another approach is supertagging, introduced by Bangalore and Joshi (1999) for the LTAG formalism as a means to reduce ambiguity by associating lexical items with rich descriptions (supertags) that impose complex constraints in a local context, but again without itself deriving a syntactic analysis. Supertagging has also been successfully applied within the CCG formalism (Clark and Curran 2004). A second option is to sacrifice completeness with respect to covering the entire input, by parsing only fragments that are well-formed according to the grammar. This is sometimes referred to as skip parsing. Partial parsing is a means to achieve this, since leaving a fragment unattached may just as well be seen as a way of skipping that fragment. A particularly important case for skip parsing is noisy input, such as written text containing errors or output from a speech recognizer. (A word error rate around 20%–40% is by no means unusual in recognition of spontaneous speech; see Chapter 15.) For the parsing of spoken language in conversational systems, it has long been commonplace to use pattern-matching rules that trigger on domain-dependent subsets of the input (Ward 1989, Jackson et al. 1991, Boye and Wirén 2008). Other approaches have attempted to render deep parsing methods robust, usually by trying to connect the maximal subset of the original input that is covered by the grammar. For example, GLR* (Lavie 1996, Lavie and Tomita 1996), an extension of GLR (Section 4.6.3), can parse all subsets of the input that are licensed by the grammar by being able to skip over any words.
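In the spirit of the pattern-matching rules the excerpt mentions for conversational systems, here is a minimal skip-parsing sketch: it triggers on the fragments it recognizes and skips everything else. The flight-booking domain, slot names, and patterns are all hypothetical, not taken from any of the cited systems.

```python
import re

# Hypothetical domain patterns for a toy flight-booking dialogue system.
PATTERNS = {
    "destination": re.compile(r"\bto (Boston|Paris|Tokyo)\b"),
    "day":         re.compile(r"\bon (Monday|Tuesday|Friday)\b"),
}

def skip_parse(utterance: str) -> dict:
    """Extract the fragments the patterns license; skip the rest,
    including disfluencies and recognizer noise."""
    slots = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            slots[slot] = match.group(1)
    return slots

# Noisy speech-recognizer output: ungrammatical and full of fillers.
print(skip_parse("uh I wanna um go to Boston uh on Friday please"))
# {'destination': 'Boston', 'day': 'Friday'}
```

The point is robustness: no global analysis of the utterance is attempted, so a 20%-40% word error rate in the skipped material does not prevent the recognized fragments from being used.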
- eBook - PDF
- Michael Arbib (Author)
- 2012 (Publication Date)
- Academic Press (Publisher)
THE PARSER
We now turn to the parsing model itself. The reader should note that the sketch presented here contains only enough detail to enable the discussion of those aspects of the parsing model relevant to this chapter; for a full discussion of the parsing model see Marcus (1980). The parsing model to be discussed here has been embodied in a working computer program called Parsifal, and for ease of reference, I will often refer to this specific parsing model as Parsifal in the following sections. Parsifal can be viewed as being composed of a grammar of parsing rules that act upon the incoming word string to build up syntactic structures contained in internal data structures. More specifically, the grammar rules first examine the contents of Parsifal's internal data structures and the incoming word string, and then, contingent upon what structures already exist, add to the contents of those data structures. All structures built up by Parsifal are contained in two data structures in particular, the push-down stack and the buffer, as shown in Figure 6.2. The push-down stack contains constituents whose internal structure cannot yet be determined to be complete, whereas the buffer contains constituents that are complete (to a first approximation), but that have not yet been assigned a syntactic role in some larger structure. Thus, for example, in Figure 6.2, the stack contains two nodes, a sentence (S) node and a verb phrase (VP) node, with the VP a daughter of the S. (Note that the stack grows downward so that the root of the emerging parse tree will always be at the top of the stack.) The noun phrase (NP) you is also a daughter of the S node; however, it is neither in the stack nor the buffer because both its internal structure and its role in a larger structure have already been determined by the parser.
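A minimal sketch of the stack-and-buffer architecture described above may help. The single rule below is invented for illustration and is far simpler than Parsifal's actual grammar rules (Marcus 1980).

```python
# A toy model of the excerpt's two data structures: a push-down stack of
# still-open constituents and a buffer of complete but unattached ones.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # attached daughters

stack = []    # constituents whose internal structure is still open
buffer = []   # complete constituents not yet attached to anything

def attach_np_to_vp():
    """An invented grammar rule: it examines the stack and the buffer, and
    if a VP is open on the stack while a complete NP waits at the front of
    the buffer, it attaches the NP as the VP's daughter."""
    if stack and stack[-1].label == "VP" and buffer and buffer[0].label == "NP":
        stack[-1].children.append(buffer.pop(0))

# Roughly the situation of the excerpt's Figure 6.2:
s, vp = Node("S"), Node("VP")
s.children.append(vp)                  # the VP is a daughter of the S
stack = [s, vp]                        # stack grows downward: S above VP
buffer = [Node("NP")]                  # a complete but unattached NP
attach_np_to_vp()
print([c.label for c in vp.children])  # ['NP'] — the NP found its role
```

As in the excerpt, rules fire contingent on what structure already exists, and a constituent leaves the buffer only once it has been assigned a role in some larger structure on the stack.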
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.