The technological interoperability of digital libraries must be rethought in order to adapt to new uses and networks. Informative digital environments aimed at responding to heritage, cultural, scientific or commercial demands have taken over the global cyberspace and have redesigned the techno-informative landscape of the Web. However, while the technological models demonstrate their effectiveness and explain to a large extent the creation of digital libraries, archives and deposits, the underlying concept of uses continues to cause debate.

The information technologies used by heterogeneous digital libraries enable a technical interoperability of content. This is not enough to win over a public with very different information profiles and techniques. This book explores the avenues of a user-orientated interoperability in which consultation interfaces and content description processes are studied.

- Discusses metadata as a resource for linking
- Provides a practical approach
- A valuable resource for anyone involved in digital library developments and digital collections and services
Yes, you can access Digital Libraries by Fabrice Papy in PDF and/or ePUB format, as well as other popular books in Sprachen & Linguistik & Informatik Allgemein. We have over one million books available in our catalogue for you to explore.
Keywords
Computer science
Digital use
HTTP
Information retrieval
Interoperability
IR specificity
IS specificity
Repositories
Technical documentation
WordPress data model
1.1 Information retrieval: from theory to practice
Widespread access to the Internet and its most commonly used services, such as electronic mail, the Web and, most recently, digital social networks, has made information retrieval (IR) practices commonplace [GRI 11, DIN 14, DIN 07, CIA 05, ASS 02]; until recently, these practices were reserved for information specialists (journalists, information officers, guards, archivists, librarians, etc.) [CAT 01, DUF 01, LEF 00]. Having been introduced to the general public by the free-access general search engines, IR was for several years left to Internet surfers at the mercy of these automated indexing and retrieval systems with basic mechanisms¹, which can precisely and rapidly list a large part of the visible document production in the Web of documents [LEW 08, CHI 07, RIE 06, LEL 99].
A survey conducted in 2008² among 2,218 PhD students in Bretagne on their training needs in mastering scientific information revealed that 96% of the respondents used search engines³ as resources (73% very often and 23% regularly), while specialist portals came second, used by 53%.
Due to their algorithmic simplicity, sustained by readily adaptable software architectures (clustering and cloud computing), these search tools were able to adapt to the versatility of document formats and absorb the exponential increase in queries⁴. This technological performance long contributed to search engines being perceived as state-of-the-art retrieval systems. From a simplified IR perspective, the situation might have stayed the same had it not been for the social turning point (Web 2.0) that took hold of the Web and disrupted the spectacular technological progression in the accessibility of digital documents by reintroducing variations in information and digital practices.
While the conceptual approach of IR [DEN 03, MAN 02, LEF 00] is based on a systematic and methodical model that is easy to understand (Figure 1.1), in practice it takes multiple forms depending on the primary or secondary resources used and, above all, on the technical means required (indexing and classification languages, query languages, technical tools, etc.) [TAS 14, ZOU 13, PIR 10, REP 11]. Within a short time, the multiple technological vectors of document retrieval applied to natively digital information have generated a novel complexity, one that requires retrieval strategies based on precise knowledge of the technical functioning of IR systems, irrespective of their nature: “It is already true that technological devices propose retrieval solutions that the average person ignores. Surprisingly, it is often a convergence of such solutions that should be used for qualitative information” [MOE 98, p. 67].
Figure 1.1 The information retrieval loop (inspired by Bernard Pochet, 2008, “Knowing how to search and query”, under Creative Commons License). The entry point to the figure is “the initial question”, which implies a cognitive approach to information literacy with a personal and implicit scope. In return, “information problem solving” aims to confront this initial question with a first detailed formulation (not yet specialized for a particular technical system) in terms of the words, terms or expressions to use (in canonical or non-canonical form), the terms to exclude and the terms that must or may be associated (Boolean operators). Above all, it allows users to define, in various ways, the vocabulary available to describe their information expectations. It is an important stage during which the user establishes a lexical strategy: identifying generic or specific terms, whether common or specialized, choosing synonyms or antonyms, etc.
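The “information problem solving” stage described in the caption of Figure 1.1 can be illustrated with a minimal sketch that turns a lexical strategy into a system-independent Boolean expression. The function names (`quote`, `build_boolean_query`) and the AND/OR/NOT surface syntax are assumptions made for illustration only; real IR systems each have their own query language.

```python
from typing import Iterable, Sequence


def quote(term: str) -> str:
    """Wrap multi-word expressions in double quotes so they stay together."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(
    required: Sequence[str],
    synonym_groups: Iterable[Sequence[str]] = (),
    excluded: Sequence[str] = (),
) -> str:
    """Assemble a generic Boolean query (hypothetical syntax).

    required       -- terms that must all appear (joined with AND)
    synonym_groups -- each group lists interchangeable terms (joined with OR)
    excluded       -- terms that must not appear (prefixed with NOT)
    """
    clauses = [quote(term) for term in required]
    for group in synonym_groups:
        alternatives = " OR ".join(quote(term) for term in group)
        clauses.append(f"({alternatives})")

    query = " AND ".join(clauses)
    for term in excluded:
        query += f" NOT {quote(term)}"
    return query.strip()


if __name__ == "__main__":
    print(
        build_boolean_query(
            required=["digital library"],
            synonym_groups=[["interoperability", "compatibility"]],
            excluded=["commercial"],
        )
    )
    # "digital library" AND (interoperability OR compatibility) NOT commercial
```

Keeping the vocabulary (required terms, synonyms, exclusions) separate from the final string mirrors the idea that the lexical strategy is defined before it is specialized for a particular retrieval system.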
Through the democratization of constantly evolving content production (articles, blog posts and comments) enriched with multimedia data [PAP 03a, PAP 99], the blogosphere⁵ has disrupted the technological order established by search engine engineering. These publishing systems have facilitated content creation driven by an essentially editorial logic [ANG 11, DES 09, SOU 03a], removing the requirement for the indispensable computer skills that had restricted it to experts and specialist communication agencies disconnected from publishing and online writing techniques [PAP 14b, AMG 08, TAR 07].
1.2 Information sources and documentary resources
During the 2000s, the worldwide frenzy of social networks [BOU 10, REB 07] and the participative and collaborative approaches further shook up IR as orchestrated by the main Web search engines. “We know, for example, that Facebook is currently made up of over one million (third party) developers distributed in more than 180 countries. Facebook today consists of over 550,000 applications developed by third parties, and more than 70% of the users interact with these applications” [TCH 11, p. 60].
The functioning of the blogosphere and digital social networks is characterized by permanent instability of contents and paratextual data⁶. Blog posts, virtual pages or social networks never reach a definitive form. The object may evolve at any time through content updates (insertion, modification and deletion) performed by one or more authors, depending on the assigned access rights (Figure 1.2). Instability is equally generated by social interactions, which give rise to data that accompany the primary resource and enrich it with various facets as soon as the information is available online. Comments, evaluations, ratings, as well as readers' identity, status and quality, date, etc., are all data that lend themselves to retrieval.
Figure 1.2 Partial view of the WordPress data model (version 3.8, extracted from https://codex.wordpress.org/). The relational tables wp_term_taxonomy, wp_links and wp_term_relationships host the semantics of links and posts stored in the blog's Content Management System. The posts are connected to categories and semantic markers (tags) within the wp_terms table. The semantic links (associations) are stored in the wp_term_relationships table.
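To make the role of these tables concrete, the following sketch rebuilds a heavily simplified subset of the model in an in-memory SQLite database and retrieves the categories and tags attached to one post. It is an illustration under stated assumptions: WordPress itself runs on MySQL, only a few columns of each table are reproduced, and the helper names (`build_demo_db`, `terms_for_post`) and sample rows are hypothetical.

```python
import sqlite3

# Simplified subset of the WordPress tables named in Figure 1.2
# (assumption: reduced column set, SQLite instead of MySQL).
SCHEMA = """
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE wp_term_taxonomy (
    term_taxonomy_id INTEGER PRIMARY KEY,
    term_id INTEGER REFERENCES wp_terms(term_id),
    taxonomy TEXT              -- e.g. 'category' or 'post_tag'
);
CREATE TABLE wp_term_relationships (
    object_id INTEGER,         -- ID of the post carrying the term
    term_taxonomy_id INTEGER REFERENCES wp_term_taxonomy(term_taxonomy_id),
    PRIMARY KEY (object_id, term_taxonomy_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema and load a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO wp_posts VALUES (1, 'Interoperability notes')")
    conn.executemany(
        "INSERT INTO wp_terms VALUES (?, ?, ?)",
        [(1, "Libraries", "libraries"), (2, "metadata", "metadata")],
    )
    conn.executemany(
        "INSERT INTO wp_term_taxonomy VALUES (?, ?, ?)",
        [(10, 1, "category"), (11, 2, "post_tag")],
    )
    conn.executemany(
        "INSERT INTO wp_term_relationships VALUES (?, ?)",
        [(1, 10), (1, 11)],
    )
    return conn


def terms_for_post(conn: sqlite3.Connection, post_id: int) -> list[tuple[str, str]]:
    """Return (taxonomy, term name) pairs attached to a post."""
    return conn.execute(
        """
        SELECT tt.taxonomy, t.name
        FROM wp_term_relationships AS tr
        JOIN wp_term_taxonomy AS tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
        JOIN wp_terms AS t ON t.term_id = tt.term_id
        WHERE tr.object_id = ?
        ORDER BY tt.taxonomy, t.name
        """,
        (post_id,),
    ).fetchall()


if __name__ == "__main__":
    print(terms_for_post(build_demo_db(), 1))
    # [('category', 'Libraries'), ('post_tag', 'metadata')]
```

The two joins show why a query that knows the data model can filter on categories or tags directly, whereas a crawler that only sees the rendered page has to infer this structure.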
For example, the Facebook Graph Application Program Interface (API)⁷ is the primary mechanism for extracting data from the social platform. It is a low-level hypertext transfer protocol (HTTP) transaction-based API that can be used for querying data, adding data to a user account, downloading all sorts of resources, etc. The Facebook data framework relies on an architecture based on nodes (things such as users, images, pages, comments, etc.), edges (connections between nodes) and fields (a user's date of birth, the name of a page, the most recent viewing date, etc.)⁸. Fields are extremely diverse and they significantly enrich nodes and edges with additional semantic data. Knowledge of these fields is, therefore, critical to reducing noise during IR (and enhancing query relevance), in particular when applied to huge sets of data⁹.
The permanent instability of these objects and the copresence of content data and data generated by social relations make it difficult for general search engines to index these publishing areas. On the one hand, indexing operations would have to be performed at a pace that indexing robots cannot sustain; on the other hand, the availability of blogs and social networks would be affected by the constant requests sent by robots, to the detriment of human users¹⁰.
On top of these questions of automated indexing, there is reduced compatibility between these very generic mechanisms of blog functioning and the basic methods of search engines. Collaborative categorization, node (and connection) semantization and co-construction of fields and themes through cross-references between posts on the same site or platform (blogs being extremely simplified versions of the latter) are handled very badly, if at all, by search engine software engineering. This contributes to the trend toward customized development in the field of IR. Thanks to a variety of additional scripts (plugins, add-ons and specific developments), each blog can indefinitely enrich the retrieval possibilities for its final information (primary data and resources), refining them using the posts' structural data¹¹.
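As a hedged illustration of the node/edge/fields vocabulary, the sketch below only assembles the URL of an HTTP GET request in the general shape used by the Graph API (`/{node-id}/{edge}?fields=...`); it sends nothing over the network. The helper name `graph_url`, the placeholder token and the chosen field names are assumptions for the example, and the versions, permissions and fields actually available should be checked against Facebook's current documentation.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"


def graph_url(
    node_id: str,
    edge: Optional[str] = None,
    fields: Sequence[str] = (),
    access_token: str = "ACCESS_TOKEN_PLACEHOLDER",  # hypothetical placeholder
) -> str:
    """Build a Graph API style request URL (no request is sent).

    node_id -- the node to read, e.g. a user or page identifier
    edge    -- optional connection to follow from that node
    fields  -- names of the fields to return, which narrows the response
    """
    path = [GRAPH_HOST, node_id]
    if edge:
        path.append(edge)

    params = {}
    if fields:
        params["fields"] = ",".join(fields)
    params["access_token"] = access_token

    # safe="," keeps the comma-separated field list readable in the URL
    return "/".join(path) + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(graph_url("me", edge="photos", fields=["id", "name"], access_token="TOKEN"))
    # https://graph.facebook.com/me/photos?fields=id,name&access_token=TOKEN
```

Requesting only named fields is the practical counterpart of the point made above: the more precisely the fields are known, the less irrelevant data the query returns.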
Websites, blogs and social networks thus produce contents whose specificities are expressed as additional, platform-specific data that cannot all be queried by the same methods. The importance of making the structure of information systems (IS) visible has long been known. In 1982, Y. Corson, rese...
Table of contents
Cover image
Title page
Table of Contents
Copyright page
1: Information Retrieval, Web and Interoperability