Handbook of Latent Semantic Analysis

Thomas K. Landauer, Danielle S. McNamara, Simon Dennis, Walter Kintsch

About This Book

The Handbook of Latent Semantic Analysis is the authoritative reference for the theory behind Latent Semantic Analysis (LSA), a burgeoning mathematical method used to analyze how words make meaning, with the aim of programming machines to understand human commands via natural language rather than strict programming protocols. The first book

Introduction to LSA

Theory and Methods


LSA as a Theory of Meaning

Thomas K. Landauer
Pearson Knowledge Technologies and University of Colorado
The fundamental scientific puzzle addressed by the latent semantic analysis (LSA) theory is that there are hundreds of distinctly different human languages, every one with tens of thousands of words. The ability to understand the meanings of utterances composed of these words must be acquired by virtually every human who grows up surrounded by language. There must, therefore, be some humanly shared method—some computational system—by which any human mind can learn to do this for any language by extensive immersion, and without being explicitly taught definitions or rules for any significant number of words.
Most past and still popular discussions of the problem focus on debates concerning how much of this capability is innate and how much learned (Chomsky, 1991b) or what abstract architectures of cognition might support it—such as whether it rests on association (Skinner, 1957) or requires a theory of mind (Bloom, 2000).
The issue with which LSA is concerned is different. LSA theory addresses the problem of exactly how word and passage meaning can be constructed from experience with language, that is, by what mechanisms—instinctive, learned, or both—this can be accomplished.
Carefully describing and analyzing the phenomenon has been the center of attention for experimental psychology, linguistics, and philosophy. Other areas of interest include pinpointing what parts of the brain are most heavily involved in which functions and how they interact, or positing functional modules and system models. But, although necessary or useful, these approaches do not solve the problem of how it is possible to make the brain, or any other system, acquire the needed abilities at their natural scale and rate.
This leads us to ask the question: Suppose we have available a corpus of data approximating the mass of intrinsic and extrinsic language-relevant experience that a human encounters, a computer with power that could match that of the human brain, and a sufficiently clever learning algorithm and data storage method. Could it learn the meanings of all the words in any language it was given?
The keystone discovery for LSA was that using just a single simple constraint on the structure of verbal meaning, and a rough approximation to the same experience as humans, LSA can perform many meaning-based cognitive tasks as well as humans.
That this provides a proof that LSA creates meaning is a proposition that manifestly requires defense. Therefore, instead of starting with explication of the workings of the model itself, the chapter first presents arguments in favor of that proposition. The arguments rest on descriptions of what LSA achieves and how its main counterarguments can be discounted.

The Traditional Antilearning Argument

Many well-known thinkers—Plato, Bickerton (1995), Chomsky (1991b), Fodor (1987), Gleitman (1990), Gold (1967), Jackendoff (1992), Osherson, Stob, and Weinstein (1984), Pinker (1994), to name a few—have considered this prima facie impossible, usually on the grounds that humans learn language too easily, that they are exposed to too little evidence, correction, or instruction to make all the conceptual distinctions and generalizations that natural languages demand. This argument has been applied mainly to the learning of grammar, but has been asserted with almost equal conviction to apply to the learning of word meanings as well, most famously by Plato, Chomsky, and Pinker. Given this postulate, it follows that the mind (brain, or any equivalent computational system) must be equipped with other sources of conceptual and linguistic knowledge. This is not an entirely unreasonable hypothesis. After all, the vast majority of living things come equipped with or can develop complex and important behavioral capabilities in isolation from other living things. Given this widely accepted assumption, it would obviously be impossible for a computer using input only from a sample of natural language in the form of unmodified text to come even close to doing things with verbal meaning that humans do.

The LSA Breakthrough

It was thus a major surprise to discover that a conceptually simple algorithm applied to bodies of ordinary text could learn to match literate humans on tasks that if done by people would be assumed to imply understanding of the meaning of words and passages. The model that first accomplished this feat was LSA.
LSA is a computational model that does many humanlike things with language. The following are but a few: After autonomous learning from a large body of representative text, it scores well into the high school student range on a standardized multiple-choice vocabulary test; used alone to rate the adequacy of content of expository essays (other variables are added in full-scale grading systems; Landauer, Laham, & Foltz, 2003a, 2003b), estimated in more than one way, it shares 85%–90% as much information with expert human readers as two human readers share with each other (Landauer, 2002a); it has measured the effect on comprehension of paragraph-to-paragraph coherence better than human coding (Foltz, Kintsch, & Landauer, 1998); it has successfully modeled several laboratory findings in cognitive psychology (Howard, Addis, Jing, & Kahana, chap. 7 in this volume; Landauer, 2002a; Landauer & Dumais, 1997; Lund, Burgess, & Atchley, 1995); it detects improvements in student knowledge from before to after reading as well as human judges (Rehder et al., 1998; Wolfe et al., 1998); it can diagnose schizophrenia from what patients say as well as experienced psychiatrists (Elvevåg, Foltz, Weinberger, & Goldberg, 2005); it improves information retrieval by up to 30% by being able to match queries to documents of the same meaning when there are few or no words in common and reject those with many when irrelevant (Dumais, 1991), and can do the same for queries in one language matching documents in another where no words are alike (Dumais, Landauer, & Littman, 1996); it does its basic functions of correctly simulating human judgments of meaning similarity between paragraphs without modification by the same algorithm in every language to which it has been applied, examples of which include Arabic, Hindi, and Chinese in their native orthographic or ideographic form; and when sets of all LSA similarities among words for perceptual entities such as kinds of objects (e.g., flowers, trees, birds, chairs, or colors) are subjected to multidimensional scaling, the resulting structures match those based on human similarity judgments quite well in many cases, moderately well in others (Laham, 1997, 2000), just as we would expect (and later explain) because text lacks eyes, ears, and fingers.
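To make the retrieval claim concrete, the following is a minimal sketch, not the system used in the cited studies, of how a query can match a document with which it shares no words. It assumes the usual LSA recipe of a term-by-document count matrix reduced by a truncated singular value decomposition. The four toy documents, the choice of k = 2, and the helper names (bag_of_words, to_latent, cosine) are illustrative assumptions, and the sketch omits the term weighting and the much larger corpora and dimensionalities that realistic applications would need.

```python
import numpy as np

# Toy corpus (an assumption for illustration): two topics, two documents each.
docs = [
    "car engine wheel",
    "automobile engine wheel",
    "flower petal garden",
    "rose petal garden",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(text):
    """Raw term-count vector; words outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Term-by-document count matrix (rows: words, columns: documents).
A = np.column_stack([bag_of_words(d) for d in docs])

# Truncated SVD: keep only the k largest singular dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

def to_latent(term_vector):
    """Project a term-count vector onto the k retained dimensions."""
    return U_k.T @ term_vector

query = bag_of_words("car")
n_docs = len(docs)

# Similarity from word overlap alone versus similarity in the reduced space.
raw_sims = [cosine(query, A[:, j]) for j in range(n_docs)]
lsa_sims = [cosine(to_latent(query), to_latent(A[:, j])) for j in range(n_docs)]

for d, raw, lsa in zip(docs, raw_sims, lsa_sims):
    print(f"{d!r}: raw={raw:.2f} lsa={lsa:.2f}")
```

In this toy case the query "car" has zero word overlap with "automobile engine wheel", yet the two end up close together in the reduced space because "car" and "automobile" occur in the same contexts; the two garden documents remain unrelated to the query under both measures.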
I view these and its several other successful simulations (see Landauer, 2002a; Landauer, Foltz, & Laham, 1998) as evidence that LSA and models like it (Griffiths & Steyvers, 2003; Steyvers & Griffiths, chap. 21 in this volume) are candidate mechanisms to explain much of how verbal meaning might be learned and used by the human mind.

About LSA’s Kind of Theory

LSA offers a very different kind of account of verbal meaning from any that went before, including centuries of theories from philosophy, linguistics, and psychology. Its only real predecessor is an explanation inherent in connectionist models but unrealized yet at scale (O’Reilly & Munakata, 2000). Previous accounts had all been in the form of rules, descriptions, or variables (parts of speech, grammars, etc.) that could only be applied by human intercession, products of the very process that needs explanation. By contrast, at least in programmatic goal, the LSA account demands that the only data allowed the theory and its computational instantiations be those to which natural human language users have access. The theory must operate on the data by means that can be expressed with mathematical rigor, not through the intervention of human judgments. This disallows any linguistic rule or structure unless it can be proved that all human minds do equivalent things without explicit instruction from other speakers, the long unattained goal of the search for a universal grammar. It also rules out as explanations—as contrasted with explorations—computational linguistic systems that are trained on corpora that have been annotated by human speakers in ways that only human speakers can.
This way of explaining language and its meaning is so at odds with most traditional views and speculations that, in Piaget’s terminology, it is hard for many people, both lay and scholar, to accommodate. Thus, before introducing its history and more of its evidence and uses, I want to arm readers with a basic understanding of what LSA is and how it illuminates what verbal meaning might be.

But What is Meaning?

First, however, let us take head-on the question of what it signifies to call something a theory of meaning. For a start, I take it that meaning as carried by words and word strings is what allows modern humans to engage in verbal thought and rich interpersonal communication. But this, of course, still begs the question of what meaning itself is.
Philosophers, linguists, humanists, novelists, poets, and theologians have used the word “meaning” in a plethora of ways, ranging, for example, from the truth of matters to intrinsic properties of objects and happenings in the world, to mental constructions of the outside world, to physically irreducible mystical essences, as in Plato’s ideas, to symbols in an internal communication and reasoning system, to potentially true but too vague notions such as how words are used (Wittgenstein, 1953). Some assert that meanings are abstract concepts or properties of the world that exist prior to and independently of any language-dependent representation. This leads to assertions that by nature or definition computers cannot create meaning from data; meaning must exist first. Therefore, what a computer creates, stores, and uses cannot, ipso facto, be meaning itself.
A sort of corollary of this postulate is that what we commonly think of as the meaning of a word has to be derived from, “grounded in,” already meaningful primitives in perception or action (Barsalou, 1999; Glenberg & Robertson, 2000; Harnad, 1990; Searle, 1982). In our view (“our” meaning proponents of LSA-like theories), however, what goes on in the mind (and, by identity, the brain) in direct visual or auditory, or any other perception, is fu...
