eBook - ePub

Semantic Knowledge Representation for Information Retrieval

Name: Semantic Knowledge Representation for Information Retrieval
ISBN: 9783110395976

Winfried Gödert,

Jessica Hubrich,

Matthias Nagelschmidt,

308 pages
English

About this book

This book covers the basics of semantic web technologies and indexing languages and describes how they contribute to improving methods of formal knowledge representation and reasoning. The methodologies included combine the specifics of indexing languages, Web representation languages and intersystem relations, and explain their contribution to search functionalities in information retrieval scenarios. An example-oriented discussion is provided that considers aspects of conceptual and semantic interoperability in processes of subject querying and knowledge exploration. The book is relevant to information scientists, knowledge workers and indexers. It provides a suitable combination of theoretical foundations and practical applications.

Topic: Computer Science
Subtopic: Computer Science General

1 Introduction: Envisioning Semantic Information Spaces

Indexing languages, interoperability, information retrieval, semantic technologies – is it really worth examining the particular interaction of these rather different subjects, as we do in this book? In this preliminary chapter we try to give a first answer as to why we think it is. To this end, we return to the idea of a semantic information space, which was already mentioned in the preface, and make it more concrete by envisioning some examples. We will take a first naive look at search situations and the impact of semantic knowledge representation, without yet considering the conceptual or technical background. In this first look, information retrieval systems, indexing languages and semantic technologies are thus treated as a black box, which ideally provides a search environment that can be characterized as a semantic information space.

Examples in this book are heterogeneous and are taken, among others, from the domains of chemistry, physics and biology, particularly ornithology. Although neither the authors nor the subjects of this book are affiliated with these disciplines, we will occasionally revert to them: they lie clearly outside our own profession and can therefore be seen as a “neutral” domain, which seems to carry a lower risk of misunderstanding than examples from the less exact fields of the humanities or social sciences would. No special skills in the natural sciences are needed to read and understand the examples and to follow the argumentation; all examples are simple enough to be understood without any substantial chemical, physical or zoological knowledge.

When speaking of an “information space”, one could quite generally think of two extremes: either a collection of information resources that are largely homogeneous in form and content and centralized in one store, or a heterogeneous collection, distributed over several repositories that are organized independently of each other. The first extreme is embodied, for example, by traditional library collections, while the most prominent example of the latter is the World Wide Web. In the following, both extremes and every possible variant between them shall be understood as information spaces.

We begin our consideration with a relatively simply organized information space. Figure 1.1 shows a situation that is reminiscent of a bibliographic database. The document store contains a number of bibliographic records, which represent two monographs written by the German chemist and Nobel Prize laureate Otto Hahn and one book of correspondence from the physicist Lise Meitner to Otto Hahn. To represent the authorship of Otto Hahn and Lise Meitner consistently for each document, a name authority file is used, which contains personal name authority records for both scientists that can be linked to the stored documents. In this way, one can easily search the information space, e.g. for all documents written by Otto Hahn – this search operation is often referred to as a collocation search.

Fig. 1.1: Authority files in information spaces.

Another search operation can be described as a subject search, e.g. a search for all documents about “radioactivity”. To carry out subject searches, the information space must provide the information of what each document is “about” – in the indexing context we also speak of the aboutness of a document (cf. Ingwersen 1992, 50–54). In bibliographic databases this aboutness is traditionally represented by one or more subject headings or thesaurus descriptors. In order to provide a consistent representation, the subject headings can be organized in a subject headings authority file, so that each subject heading has its own authority record that can be linked to the appropriate document records (cf. Fig. 1.1).
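The arrangement of Figure 1.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's own model: the record identifiers (`p1`, `s1`, ...), the class and function names, the placeholder title of the second monograph, and every subject assignment except “radioactivity” for “Applied radiochemistry” are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthorityRecord:
    record_id: str       # hypothetical identifier, not from the source
    preferred_form: str  # the single, consistent form of the name or heading


@dataclass
class BibliographicRecord:
    title: str
    author_ids: list[str] = field(default_factory=list)   # links into the name authority file
    subject_ids: list[str] = field(default_factory=list)  # links into the subject authority file


# Name authority file: one record per person.
name_authority = {
    "p1": AuthorityRecord("p1", "Hahn, Otto"),
    "p2": AuthorityRecord("p2", "Meitner, Lise"),
}

# Subject headings authority file: one record per heading.
subject_authority = {
    "s1": AuthorityRecord("s1", "radioactivity"),
    "s2": AuthorityRecord("s2", "nuclear fission"),
}

# Document store. The second title and its subject are placeholders (assumed).
document_store = [
    BibliographicRecord("Applied radiochemistry", ["p1"], ["s1"]),
    BibliographicRecord("Second monograph (title assumed)", ["p1"], ["s2"]),
    BibliographicRecord("Letters of Lise Meitner to Otto Hahn", ["p2"], []),
]


def collocation_search(author_id: str) -> list[str]:
    """Return the titles of all documents linked to one name authority record."""
    return [d.title for d in document_store if author_id in d.author_ids]


def subject_search(subject_id: str) -> list[str]:
    """Return the titles of all documents linked to one subject authority record."""
    return [d.title for d in document_store if subject_id in d.subject_ids]


if __name__ == "__main__":
    print(name_authority["p1"].preferred_form, collocation_search("p1"))
    print(subject_authority["s1"].preferred_form, subject_search("s1"))
```

Because documents store only the identifier of an authority record, a variant spelling of a name or heading never has to be matched at search time; both search types reduce to a lookup on the link.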

There is nothing special about the situation described so far, and everybody who has ever used a library’s online catalog should be familiar with it, as it corresponds to the way bibliographic data has long been organized, and continues to be organized, by documentary institutions and especially libraries. Knowledge representation, however, begins beyond this situation.

In Figure 1.2 the authority files are replaced by a network-like structure. The elements of Figure 1.1, now shaded grey, become more complex, as they are embedded in a meaningful context – later on in this book, we will address these elements precisely and speak more abstractly of entities of a knowledge representation. What we are characterizing here rather vaguely as a “meaningful context” raises these entities from the keyword-based level in Figure 1.1 to a conceptual level in Figure 1.2. We will examine this important step in the following chapters and confine ourselves here to observing that these concepts can primarily be used for indexing the stored documents and thereby fulfill the same basic descriptor function as simple keywords, but that they also open up a broader context, as they are connected to other, related concepts. In the following, this situation will be referred to as a knowledge structure.

Fig. 1.2: Knowledge structures in information spaces.

Searching the information space in Figure 1.2 with the descriptor “radioactivity” leads not only to the indexed monograph by Otto Hahn, “Applied radiochemistry”, but also to the related descriptors “activity level” and “radioisotope”. It becomes apparent that an information seeker who is interested in “radioactivity” could also be interested in certain levels of radioactivity or in specific radioactive isotopes. The same seems to apply to “nuclear fission” and “nuclear reaction” – it isn’t unlikely that an information seeker with an interest in nuclear fission may also be interested in other nuclear reactions. Beyond that, the knowledge structure of Figure 1.2 also makes explicit a relationship between Otto Hahn and the rather abstract concept “person”, as well as between Otto Hahn’s research colleague Lise Meitner and “person”. For a human there is no difficulty in the cognitive interpretation of these relations – we can easily see that Otto Hahn and Lise Meitner are persons, even if we have never heard their names before. By using semantic technologies, this knowledge can be made machine-readable, so that a machine would be able to infer (Glossary C3.2) that Otto Hahn is a person from the fact that the concept “Hahn, Otto” is related to the concept “person” in a specific way. Likewise, the risk of confusing the person Otto Hahn with the research vessel of the same name, which was launched in 1964 and named after the famous scientist, could be avoided.
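A knowledge structure of this kind can be approximated as a small set of subject–relation–object triples. The sketch below is a simplification under stated assumptions: the relation names `related_to` and `instance_of`, the entity label `Otto Hahn (ship)` and the class `ship` are hypothetical choices for illustration, since the text only says that concepts are related “in a specific way”.

```python
# Triples of the form (subject, relation, object).
# Relation names and the "ship" class are assumptions for this sketch.
TRIPLES = {
    ("radioactivity", "related_to", "activity level"),
    ("radioactivity", "related_to", "radioisotope"),
    ("nuclear fission", "related_to", "nuclear reaction"),
    ("Hahn, Otto", "instance_of", "person"),
    ("Meitner, Lise", "instance_of", "person"),
    ("Otto Hahn (ship)", "instance_of", "ship"),
}

# Indexing: which descriptors have been assigned to which document.
INDEX = {"Applied radiochemistry": {"radioactivity"}}


def related(concept: str) -> list[str]:
    """Return concepts linked to `concept` by `related_to`, in either direction."""
    found = set()
    for subj, rel, obj in TRIPLES:
        if rel != "related_to":
            continue
        if subj == concept:
            found.add(obj)
        elif obj == concept:
            found.add(subj)
    return sorted(found)


def search(descriptor: str) -> tuple[list[str], list[str]]:
    """Return the indexed documents plus related descriptors worth exploring."""
    documents = sorted(title for title, descs in INDEX.items() if descriptor in descs)
    return documents, related(descriptor)


def is_instance_of(entity: str, cls: str) -> bool:
    """Check whether the structure states that `entity` belongs to class `cls`."""
    return (entity, "instance_of", cls) in TRIPLES


if __name__ == "__main__":
    print(search("radioactivity"))
    print(is_instance_of("Hahn, Otto", "person"))        # the chemist
    print(is_instance_of("Otto Hahn (ship)", "person"))  # the research vessel
```

The last two calls show how the typed relation separates the scientist from the vessel of the same name: both carry similar labels, but only one of them is related to “person”.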

At this point we have already mentioned many aspects and reached the core issues of this book. In the following, we will take a closer look at searches in information spaces and the underlying information retrieval processes, and thereby give a first impression of the usefulness of relations like those described above. We will also look at the interdependency between indexing and information retrieval processes, introduce Knowledge Organization Systems (KOSs) as types of knowledge structures that are designed to support indexing and retrieval, and finally address questions such as how it could be made explicit and recognizable for a KOS that a document “Letters of Lise Meitner to Otto Hahn” is about letters that Lise Meitner wrote to Otto Hahn and not vice versa.

Based on this, we will provide a more systematic discussion of the specific types of relations and their functionality within and between knowledge structures – later on we will speak of them as intra- and intersystem relations. Yet, before that, some preliminary considerations will be provided, in order to facilitate a better understanding of the mentioned issues.

Accordingly, we will address the functionality of intersystem relations, i.e., those relations that bridge two knowledge structures and thereby make them interoperable. In this context, we will focus on the problems of heterogeneity that may arise, e.g., from the use of different knowledge structures for indexing purposes. This is indicated in Figure 1.3, where single concepts of the example knowledge structure introduced above are linked to other, actually existing structures, namely the Library of Congress Subject Headings (LCSH), the International Nuclear Information System / Energy Technology Data Exchange (INIS/ETDE), and the YAGO project.

Fig. 1.3: Interoperability in information spaces.

These three structures, which were arbitrarily selected for this example, are quite different in their organization, coverage and purpose. The LCSH can be characterized as an authority file, INIS/ETDE is a thesaurus that has been developed and used by the International Atomic Energy Agency (IAEA)¹, and YAGO is an ontology mainly built up with vocabulary from Wikipedia². Since we haven’t introduced the thesaurus and the ontology as two essential types of knowledge representation yet, we won’t stress the differences between these structures here and now. Instead, we simply assert that concepts of one structure can also be part of another structure, as denot...

Title Page
Copyright Page
Preface
Table of Contents
Table of Figures
List of Tables
1 Introduction: Envisioning Semantic Information Spaces
Part A - Propaedeutics – Organizing, Representing, and Exploring Knowledge
Part B - Status quo – Handling Heterogeneity in Indexing and Retrieval
Part C - Vision – Ontology-based Indexing and Retrieval
Part D - Appendices
Abbreviations
References
Index
