eBook - ePub

Digital Libraries

Name: Digital Libraries
ISBN: 9780081004869

Fabrice Papy

152 pages
English

About this book

The technological interoperability of digital libraries must be rethought in order to adapt to new uses and networks. Informative digital environments aimed at responding to heritage, cultural, scientific or commercial demands have taken over the global cyberspace and have redesigned the techno-informative landscape of the Web. However, while the technological models demonstrate their effectiveness and explain to a large extent the creation of digital libraries, archives and deposits, the subjacent concept of uses continues to cause debate. The information technologies used by heterogeneous digital libraries enable a technical interoperability of content. This is not enough to allow the adhesion of a public connected to very different information profiles and techniques. This book explores the avenues of a user-orientated interoperability where the questions of consultation interfaces and content description processes are studied. - Discusses Metadata as a resource for linking - Provides a practical approach - A valuable resource for anyone involved in digital library developments and digital collections and services

Publisher

ISTE Press - Elsevier

Year

2016

Print ISBN

9781785480454

eBook ISBN

9780081004869

Topic

Languages & Linguistics

Subtopic

Computer Science General


Information Retrieval, Web and Interoperability


Keywords

Computer science

Digital use

HTTP

Information retrieval

Interoperability

IR specificity

IS specificity

Repositories

Technical documentation

WordPress data model

1.1 Information retrieval: from theory to practice

Widespread access to the Internet and its most commonly used services, such as electronic mail, the Web and, most recently, digital social networks, has made information retrieval (IR) practices commonplace [GRI 11, DIN 14, DIN 07, CIA 05, ASS 02]; until recently, these practices were reserved for information specialists (journalists, information officers, curators, archivists, librarians, etc.) [CAT 01, DUF 01, LEF 00]. Once free general-purpose search engines had introduced IR to the general public, it was for several years left to Internet users who were at the mercy of these automated indexing and retrieval systems with basic mechanisms¹, which can precisely and rapidly list a large part of the visible document production in the Web of documents [LEW 08, CHI 07, RIE 06, LEL 99].

A survey conducted in 2008² among 2,218 PhD students in Bretagne on their training needs in mastering scientific information revealed that 96% of the respondents used search engines³ (73% very often and 23% regularly) as resources, while 53% used specialist portals, which came in second position.

Thanks to their algorithmic simplicity, supported by easily scalable software architectures (clustering and cloud computing), these search tools were able to adapt to the variety of document formats and absorb the exponential increase in queries⁴. This technological performance long contributed to search engines being perceived as state-of-the-art retrieval systems. From a simplified IR perspective, the situation might have stayed the same, had it not been for the social turning point (Web 2.0) that took hold of the Web and disrupted the spectacular technological progress in the accessibility of digital documents by reintroducing variation in information and digital practices.

While the conceptual approach of IR [DEN 03, MAN 02, LEF 00] is based on a systematic and methodical model that is easy to understand (Figure 1.1), in practice it takes multiple forms depending on the primary or secondary resources used, and above all on the technical means required (indexing and classification languages, query languages, technical tools, etc.) [TAS 14, ZOU 13, PIR 10, REP 11]. Within a short time, the multiple technological vectors in the field of document retrieval applied to natively digital information have generated a new complexity, which requires retrieval strategies based on precise knowledge of how IR systems work technically, irrespective of their nature: “It is already true that technological devices propose retrieval solutions that the average person ignores. Surprisingly, it is often a convergence of such solutions that should be used for qualitative information” [MOE 98, p. 67].

Figure 1.1 The information retrieval loop (inspired by Bernard Pochet, 2008, “Knowing how to search and query” under Creative Commons License). The entry point to the figure is “the initial question”, which implies a cognitive approach to information literacy and has a personal and implicit scope. “Information problem solving” then confronts this initial question with a first detailed formulation (not yet specialized for a particular technical system) in terms of the words, terms or expressions to use (in canonical or non-canonical form), the terms to exclude and the terms that must or may be associated (Boolean operators). Above all, it allows users to define the vocabulary available to describe their information expectations in various ways. It is an important stage during which the user establishes a lexical strategy: identifying generic and specific terms, whether common or specialized, choosing synonyms or antonyms, etc.
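The "information problem solving" stage described for Figure 1.1 can be illustrated with a small sketch. It is only one possible reading of that stage: the `QueryPlan` structure, its field names and the generic `AND`/`OR`/`NOT` rendering are assumptions made for illustration, not a syntax prescribed by the book or by any particular retrieval system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def quote(term: str) -> str:
    """Wrap multi-word expressions in double quotes so they stay together."""
    return f'"{term}"' if " " in term else term


@dataclass
class QueryPlan:
    """System-independent description of an information need (hypothetical)."""

    required: list[str] = field(default_factory=list)        # mandatory terms
    optional: list[list[str]] = field(default_factory=list)  # synonym groups
    excluded: list[str] = field(default_factory=list)        # excluded terms

    def to_boolean(self) -> str:
        """Render the plan as a generic Boolean expression."""
        # Mandatory terms are combined with AND.
        parts = [quote(t) for t in self.required]
        # Each group of alternatives (e.g. synonyms) is combined with OR.
        for group in self.optional:
            if group:
                parts.append("(" + " OR ".join(quote(t) for t in group) + ")")
        expression = " AND ".join(parts)
        # Excluded terms are appended with NOT.
        for term in self.excluded:
            expression += f" NOT {quote(term)}"
        return expression.strip()


plan = QueryPlan(
    required=["digital library"],
    optional=[["interoperability", "metadata"]],
    excluded=["blog"],
)
print(plan.to_boolean())
```

Keeping the plan separate from its rendering reflects the point made in the caption: the vocabulary is chosen first, and only afterwards translated into the query language of a given system.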

Through the democratization of constantly evolving content production (articles, blog posts and comments) enriched with multimedia data [PAP 03a, PAP 99], the blogosphere⁵ has disrupted the technological order established by search engine engineering. These publishing systems have facilitated content creation driven by an essentially editorial logic [ANG 11, DES 09, SOU 03a], removing the requirement for computer skills that had restricted it to experts and specialist communication agencies that were disconnected from publishing and online writing techniques [PAP 14b, AMG 08, TAR 07].

1.2 Information sources and documentary resources

During the 2000s, the worldwide frenzy of social networks [BOU 10, REB 07] and the participative and collaborative approaches further shook up IR as orchestrated by the main Web search engines. “We know, for example, that Facebook is currently made up of over one million (third party) developers distributed in more than 180 countries. Facebook today consists of over 550,000 applications developed by third parties, and more than 70% of the users interact with these applications” [TCH 11, p. 60].

The functioning of the blogosphere and digital social networks is characterized by the permanent instability of contents and paratextual data⁶. Blog posts, virtual pages or social networks never reach a definitive form. The object may evolve at any time through content updates (insertion, modification and deletion) performed by one or more authors, depending on the assigned access rights (Figure 1.2). Instability is equally generated by social interactions, which give rise to data that accompany the primary resource and enrich it with various facets as soon as the information is available online. Comments, evaluations, ratings, as well as readers' identity, status and quality, date, etc., are all data that lend themselves to retrieval.

Figure 1.2 Partial view of the WordPress data model (version 3.8, extracted from https://codex.wordpress.org/). The relational tables wp_term_taxonomy, wp_links and wp_term_relationships host the semantics of links and posts stored in the blog's Content Management System. The posts are connected to categories and semantic markers (tags) within the wp_terms table. The semantic links (associations) are stored in the wp_term_relationships table.
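The associations described in Figure 1.2 can be made concrete with a minimal sketch. It rebuilds a heavily reduced version of the tables in an in-memory SQLite database (WordPress itself normally runs on MySQL), keeping only the columns needed for the join; the sample rows and the `terms_for_post` helper are hypothetical and serve only to show how a post's categories and tags are reached through wp_term_relationships and wp_term_taxonomy.

```python
import sqlite3

# Reduced, illustrative schema: only the columns needed for the join.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
    CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wp_term_taxonomy (
        term_taxonomy_id INTEGER PRIMARY KEY,
        term_id INTEGER,
        taxonomy TEXT              -- 'category' or 'post_tag'
    );
    CREATE TABLE wp_term_relationships (
        object_id INTEGER,         -- the post's ID
        term_taxonomy_id INTEGER
    );

    -- Made-up sample data.
    INSERT INTO wp_posts VALUES (1, 'First post');
    INSERT INTO wp_terms VALUES (1, 'Libraries'), (2, 'metadata'),
                                (3, 'interoperability');
    INSERT INTO wp_term_taxonomy VALUES (1, 1, 'category'),
                                        (2, 2, 'post_tag'),
                                        (3, 3, 'post_tag');
    INSERT INTO wp_term_relationships VALUES (1, 1), (1, 2), (1, 3);
    """
)


def terms_for_post(connection, post_id):
    """Return (taxonomy, term name) pairs attached to one post."""
    return connection.execute(
        """
        SELECT tt.taxonomy, t.name
        FROM wp_term_relationships AS tr
        JOIN wp_term_taxonomy AS tt
          ON tt.term_taxonomy_id = tr.term_taxonomy_id
        JOIN wp_terms AS t
          ON t.term_id = tt.term_id
        WHERE tr.object_id = ?
        ORDER BY tt.taxonomy, t.name
        """,
        (post_id,),
    ).fetchall()


for taxonomy, name in terms_for_post(conn, 1):
    print(taxonomy, name)
```

A general search engine that only sees the rendered page has no access to this structure, which is why retrieval that exploits categories and tags tends to be implemented inside the blog platform itself.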

For example, the Facebook Graph Application Programming Interface (API)⁷ is the primary mechanism for extracting data from the social platform. It is a low-level API based on hypertext transfer protocol (HTTP) transactions that can be used to query data, add data to a user account, download all sorts of resources, etc. The Facebook data framework relies on an architecture based on nodes (things such as user, image, page, comments, etc.), edges (connections between nodes) and fields (user's date of birth, name of the page, most recent viewing date, etc.)⁸. Fields are extremely diverse and they significantly enrich nodes and edges with additional semantic data. Knowledge of these fields is, therefore, critical to reducing noise during IR (and enhancing query relevance), in particular when applied to huge sets of data⁹.
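The node/edge/field architecture can be sketched as the construction of an HTTP request URL. This is a minimal illustration that only builds the URL and sends nothing: the `graph_url` helper is hypothetical, the `ACCESS_TOKEN` value is a placeholder, and a real call would additionally require a valid token, the appropriate permissions and, usually, an API version prefix.

```python
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com"


def graph_url(node_id, edge=None, fields=(), access_token="ACCESS_TOKEN"):
    """Build a Graph API request URL (hypothetical helper, no network call).

    node_id: the node to read, e.g. "me" or a numeric object ID
    edge:    optional connection leaving that node, e.g. "photos"
    fields:  the fields to return, which narrows the response
    """
    path = f"/{node_id}"
    if edge:
        path += f"/{edge}"
    params = {}
    if fields:
        # Asking only for the needed fields is what limits noise.
        params["fields"] = ",".join(fields)
    params["access_token"] = access_token  # placeholder, not a real token
    return f"{GRAPH_ROOT}{path}?{urlencode(params)}"


print(graph_url("me", edge="photos", fields=["id", "name"]))
```

Selecting fields explicitly mirrors the remark above: the more precisely the requester knows which fields exist, the less irrelevant data comes back with each node or edge.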

The permanent instability of these objects and the copresence of content data and data generated by social relations make it difficult for general search engines to index these publishing areas. On the one hand, indexing would have to be performed at a pace that indexing robots cannot sustain; on the other hand, the availability of blogs and social networks would be affected by the constant requests sent by robots, to the detriment of human users¹⁰.

On top of these questions of automated indexing, the very generic mechanisms by which blogs function are poorly compatible with the basic methods of search engines. Collaborative categorization, node (and connection) semantization and co-construction of fields and themes through cross-references between posts on the same site or platform (blogs being extremely simplified versions of the latter) are handled very badly (if at all) by search engine software engineering. This contributes to the trend towards customized development in the field of IR. Through a variety of additional scripts (plugins, add-ons and specific developments), each blog can indefinitely enrich its possibilities for retrieving final information (primary data and resources), refining them by means of the posts' structural data¹¹.

Websites, blogs and social networks thus determine contents that carry specificities translated as additional specific data that cannot be subject to similar query methods. The importance of the visibility of information systems (IS) structuring has long been known. In 1982, Y. Corson, rese...

Cover image
Title page
Table of Contents
Copyright page
1: Information Retrieval, Web and Interoperability
2: Interoperability, Interface and Hypertext
3: Usage-Oriented Interoperability Instruments
4: Usage Prospects
Bibliography
Index
