eBook - ePub

Digital Libraries

Name: Digital Libraries
ISBN: 9781134730759

Philosophies, Technical Design Considerations, and Example Scenarios

David Stern,

246 pages
English

About this book

An unparalleled overview of current design considerations for your digital library! Digital Libraries: Philosophies, Technical Design Considerations, and Example Scenarios is a balanced overview of public services, collection development, administration, and systems support for digital libraries, with advice on adopting the latest technologies as they appear on the scene. As a professional in the library and information science field, you will benefit from this special issue, which serves as an overview of selected directions, trends, possibilities, limitations, enhancements, design principles, and ongoing projects for integrated library and information systems. In Digital Libraries, you will discover the latest ideas and research on digitizing and distributing online library material, including information on:

organization and administration of new digital library facilities
collection development in digital libraries
technological infrastructures for seamlessly integrated computer databases over the Internet
XML and other new standards for displaying data on the web
interface design issues in the search environment
object oriented interfaces and improved searching possibilities
a brief history of patents on the Internet

Digital Libraries is a working reference for your digital library-specific problems. Split into three related sections (Philosophies, Technical Design Considerations, and Example Scenarios), Digital Libraries addresses the many complexities and new issues that have evolved with the development of digital libraries and their future technologies. You will gain a thorough understanding of the public service and design considerations that are necessary to take your digital library into the 21st century.

Information

Publisher: Routledge
Year: 2014
eBook ISBN: 9781134730759
Topic: Languages & Linguistics
Subtopic: Library & Information Science

Technical Design Considerations

University of Illinois the Federation of Digital Libraries: Interoperability Among Heterogeneous Information Systems

Robert Ferrer

Robert Ferrer is a Research Programmer with the Digital Library Initiative, University of Illinois, Champaign-Urbana, IL.

SUMMARY. This paper briefly reviews some of the trends and issues now challenging today’s digital library that need to be addressed in successfully federating multiple heterogeneous databases. Trends include the development of loosely-coupled federations that combine object technology with client-server architecture. Issues include those dealing with heterogeneity, especially the problems associated with schema translation and integration. The paper also addresses the benefits of SGML in facilitating the searching of full-text documents with a high degree of precision. [Article copies available for a fee from The Haworth Document Delivery Service: 1-800-342-9678. E-mail address: getinfo@haworthpressinc.com]

KEYWORDS: Federation of digital libraries, interoperability, trends and issues, multiple heterogeneous databases, loosely-coupled federations, object technology, client-server architecture, schema translation and integration, SGML, full-text documents

Introduction

The information needs of today’s library patron are complex. Users often require a multitude of information from many different sources. Nowhere is this more evident than in today’s science and technology libraries. These libraries simply cannot keep up with the plethora of publications that emerge each year from new sources as diverse as the disciplines that comprise science and technology. The cost of subscribing to and housing even current publications becomes prohibitive. Libraries are faced with the daunting task of meeting the ever-increasing information needs of the patron while remaining fiscally responsible, especially under diminishing budgets.

Downsizing has become a popular business trend. For many companies this has resulted in the decentralization of business units and a leaner organizational structure. The same can be said of today’s libraries: collection development is shifting away from a philosophy of acquisition to one of access. In business, decentralization has led to a growing number of remotely located information warehouses, each independently organized in a dissimilar environment. Companies are faced with the challenge of connecting and coordinating the “islands of the archipelago,” not only to share information but also to maintain control over data integrity and security. Libraries, too, are faced with the challenge of interconnecting different computers and software in order to access the information needed by patrons. This paper briefly reviews some of the trends and issues now challenging today’s digital library that need to be addressed in successfully federating multiple heterogeneous databases.

Mainframe Architecture

Much has changed since the mainframe architecture in which the original information retrieval services, such as Dialog and BRS, maintained their data. Dumb terminals connected to a large high-end computer allowed users to enter commands and display output one line at a time (Figure 1). The mainframe handled all other aspects of a search session. A centralized server managed communications, user query interaction, database management and data presentation. The data from different sources had to be converted into a single homogeneous structure and organization.

FIGURE 1. The mainframe computer manages all aspects of a session including communications, user query interaction, database management and data presentation.

Client-Server Architecture

The development of client-server architecture is the major enabling technology behind distributed computing and the driving force towards federating heterogeneous networked databases.⁵,⁸ Unlike mainframe architecture, the workload is balanced between the client and the server (Figure 2). The client handles the user interface: query formulation and results presentation. User-friendly graphics-based client software offers a consistent interface regardless of the underlying support structure. Today’s PC-based client can also manage multiple tasks, such as maintaining simultaneous connections to a variety of sources. The result is transparent access to information sources regardless of location. The server handles the database management tasks and the processing of requests.

FIGURE 2. Workload is more balanced in client-server architecture. Middleware software standardizes the communications protocol, allowing clients and servers to interoperate regardless of the different hardware and DBMS software involved.

Middleware

Connecting the client to the server is a class of software known as middleware. The software components that comprise middleware physically reside on both the client and server. Middleware ensures that the client can communicate or interoperate with the server regardless of the different hardware and software involved. Clients and servers generally communicate by using a standardized sequence of messages known as a protocol. A message can be a request from the client for the server to perform an operation, such as searching a remote database. A message from the server can be a response to the client, such as the results of a search. The Application Programming Interface (API) is the middleware component that facilitates the transfer of messages between the client and server based on a protocol.

The API protocol defines a set of messages that both the client and server understand (Figure 3). The client’s API translates a message into a form independent of either the client or server that permits travel across the network. The server’s API receives the message and translates it into a form that the server understands. The message is processed, and a response is sent to the client via the server’s API. The client’s API transforms the response into a form that the client understands, and thus completes the transaction.³

FIGURE 3. The Application Programming Interface (API) is the middleware component that facilitates the transfer of messages across the network. The client query is converted into a standardized form that the server’s API can translate and submit to the DBMS. Results are likewise converted into a form that the client’s API can present to the client program.
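
The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: JSON is chosen here as the platform-neutral wire form, the function calls stand in for the network, and every name (`Message`, `encode`, `decode`, `server_handle`, `client_search`, `RECORDS`) is hypothetical rather than taken from any real middleware product.

```python
import json
from dataclasses import dataclass


@dataclass
class Message:
    """A protocol message: an operation name plus its arguments."""
    operation: str
    payload: dict


def encode(message: Message) -> bytes:
    """API layer: translate a local message into a neutral wire form."""
    wire = {"op": message.operation, "data": message.payload}
    return json.dumps(wire).encode("utf-8")


def decode(wire: bytes) -> Message:
    """API layer: translate the neutral wire form back into a local message."""
    raw = json.loads(wire.decode("utf-8"))
    return Message(operation=raw["op"], payload=raw["data"])


# Hypothetical in-memory list standing in for the server's DBMS.
RECORDS = ["Digital libraries", "Client-server computing", "Library automation"]


def server_handle(wire_request: bytes) -> bytes:
    """Server side: decode the request, process it, encode the response."""
    request = decode(wire_request)
    if request.operation == "search":
        term = request.payload["term"].lower()
        hits = [r for r in RECORDS if term in r.lower()]
        return encode(Message("search_response", {"hits": hits}))
    return encode(Message("error", {"reason": "unknown operation"}))


def client_search(term: str) -> list:
    """Client side: encode the request, 'send' it, decode the response."""
    wire_request = encode(Message("search", {"term": term}))
    wire_response = server_handle(wire_request)  # stands in for the network
    return decode(wire_response).payload["hits"]
```

Because the client and the server share only the wire form, either side could be rewritten on different hardware or against a different DBMS without the other noticing, provided `encode` and `decode` keep agreeing on the message layout.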

Depending on how the API is implemented, messages can be sent either synchronously or asynchronously (Figure 4). Synchronous models are generally session-oriented. A connection is established and maintained for the duration of a session. A telephone call is based on a synchronous communications model where a connection is maintained for the duration of a conversation. For client-server, a session can be the exchange of a sequence of requests and responses in order to accomplish a specific task. The advantage of this is that the session is stateful. The client and server can use what has previously transpired in the messages that are exchanged and operations that are performed. This is particularly useful in an iterative activity such as searching.

The problem is that the connection is maintained even while the server is waiting for a request from the client, or vice versa. A connection ties up the resources of both the client and server. Neither the client nor the server can do other activities while in a session, even during periods of inactivity. This may be viewed as an advantage since it ensures that a sequence of messages is exchanged in proper order. Nevertheless it is inefficient, and it limits the scalability of the system. The Remote Procedure Call (RPC) is a synchronous communications implementation common to client-server architecture where operations are executed on remote servers but appear to the client as local functions. Typical of synchronous models, the client is blocked from doing other work until the operation has completed execution and a response is received.

FIGURE 4. In synchronous communications a connection is maintained for the duration of the session. The client is busy until the connection is terminated. In asynchronous communications the client is free to do other tasks as soon as the current task is initiated. However, there is no guarantee that the tasks will complete in the order initiated.
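
The contrast in Figure 4 can be illustrated with a small Python sketch. It is an analogy, not an RPC implementation: `remote_search` is a hypothetical stand-in whose `time.sleep` represents server work, the server names and delays are invented, and a thread pool from the standard library is used here to imitate sending requests without waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def remote_search(name: str, delay: float) -> str:
    """Hypothetical remote operation; the sleep stands in for server work."""
    time.sleep(delay)
    return name


def run_synchronously(tasks):
    """RPC style: each call blocks the client until its response arrives."""
    return [remote_search(name, delay) for name, delay in tasks]


def run_asynchronously(tasks):
    """Send every request at once, then collect responses as they arrive."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(remote_search, name, delay) for name, delay in tasks]
        return [future.result() for future in as_completed(futures)]


tasks = [("slow-server", 0.3), ("fast-server", 0.05)]
sync_order = run_synchronously(tasks)    # responses in the order requested
async_order = run_asynchronously(tasks)  # responses in the order completed
```

In the synchronous run the client waits out the slow server before it can even contact the fast one. In the asynchronous run both requests are outstanding together, so the fast server's response will normally arrive first, which is the ordering caveat noted in the caption.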

Alternatively, asynchronous models are not session-oriented. A message is sent but the sender does not wait for a response before doing other work. The recipient is not sitting idle waiting for a message. This is similar to sending messages via the postal system, or electronically via E-mail. Multitasking in a modern operating system involves the execution of a task without waiting for the task to complete before initiating the execution of another task. The HTTP protocol used to access information on the World Wide Web is based on an asynchronous communications model. The client establishes a connection with a server in sending a message, such as a request for information. Once the message is sent, the client is free to send messages to other servers. Once the server receives the message and sends the requested information, it closes the connection. If the client wants to send additional messages to the same server, it will have to reestablish a connection with the server. For each message a connection is established and terminated.

Resources are tied up only when needed. During periods of inactivity while waiting for messages, the client and server are free to do other activities. The server, for example, is free to service other requests. This contributes to the system’s scalability. However, since a new connection must be established with each message, state information is not maintained. The server does not remember anything about the previous message. The stateless nature of the asynchronous communications model limits the level of sophistication possible in iterative activities, such as searching. Current research is focused on how to implement stateful asynchronous models.
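
Why statelessness constrains iterative searching can be seen in a minimal Python sketch. Both server classes and the sample titles are hypothetical; the point is only that a stateful server can refine a remembered result set, while a stateless one must be handed the whole query again on every message.

```python
class StatefulServer:
    """Session-oriented: remembers each session's most recent result set."""

    def __init__(self, records):
        self.records = records
        self.sessions = {}  # session id -> current result set

    def search(self, session_id, term):
        # Refine the previous result set if this session already has one.
        pool = self.sessions.get(session_id, self.records)
        hits = [r for r in pool if term in r.lower()]
        self.sessions[session_id] = hits
        return hits


class StatelessServer:
    """One connection per message: every request is answered from scratch."""

    def __init__(self, records):
        self.records = records

    def search(self, terms):
        # Nothing is remembered, so the client must resend every term.
        return [r for r in self.records if all(t in r.lower() for t in terms)]


records = ["Digital library design", "Digital signal processing", "Library budgets"]

stateful = StatefulServer(records)
stateful.search("s1", "digital")            # first step of the session
refined = stateful.search("s1", "library")  # narrows the remembered set

stateless = StatelessServer(records)
resent = stateless.search(["digital", "library"])  # full query every time
```

Both approaches reach the same record, but the stateless client carries the burden of reconstructing the search history itself, and the stateful server pays for its convenience by holding a result set per session.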

Regardless of implementation, because the API is well defined for both the client and server, it provides unparalleled flexibility in choosing hardware and software components for both the client and server. As long as the client and server use the same API, their components can be independently upgraded or changed without critically impacting each other. Furthermore, the role of the API as represented in middleware software plays an increasingly strategic role in the evolution of network computing that enables interoperability among heterogeneous systems.

Z39.50

The Z39.50 protocol is an example of a synchronous session-oriented implementation of the client-server architecture specifically designed for database searching and information retrieval on systems that run on different hardware and software. Developed in conjunction with the Library of Congress and bibliographic service providers, the Z39.50 protocol allows databases from different information providers to interoperate. Z39.50 currently supports the search and retrieval of bibliographic records primarily in the MARC format.⁷,²⁰

The Z39.50 standard specifies the messages that are exchanged in a session, the structure and semantics of a search query, and how results are returned to the user. The search query is hierarchically structured into sets permitting the inclusion of Boolean operators to connect the sets (Figure 5). Each set includes a search term or phrase, and any parameters that specify which attributes of the record to search, such as author, title or descriptor fields.

Currently the standard assumes that the client and server communicate over a stateful connection on the Internet. The dialogue that is created by the exchange of messages between the client and server is called a Z39.50 Association. Messages are a sequence of commands and responses. The protocol establishes a set pattern for dialoguing. For each command sent by the client, the server responds with an acknowledgment or the requested information (Figure 6).

FIGURE 5. The Z39.50 query structure is hierarchically arranged into sets. Sets are connected by Boolean operators.
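
A hierarchical query of this kind can be modeled as a small tree. The Python sketch below is an assumption-laden illustration, not the encoding defined by the Z39.50 standard: `Term`, `BooleanNode`, `matches`, the attribute names, and the sample records are all hypothetical, and NOT is treated here as "and-not" between two sets.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    """A leaf set: a search term restricted to one record attribute."""
    attribute: str
    value: str


@dataclass
class BooleanNode:
    """An internal node joining two sets with AND, OR, or NOT (and-not)."""
    operator: str
    left: "Query"
    right: "Query"


Query = Union[Term, BooleanNode]


def matches(query: Query, record: dict) -> bool:
    """Evaluate the query tree against a single record."""
    if isinstance(query, Term):
        return query.value.lower() in record.get(query.attribute, "").lower()
    left = matches(query.left, record)
    right = matches(query.right, record)
    if query.operator == "AND":
        return left and right
    if query.operator == "OR":
        return left or right
    if query.operator == "NOT":
        return left and not right
    raise ValueError(f"unsupported operator: {query.operator}")


# Hypothetical records and query: author=smith AND (title=digital OR title=federation)
records = [
    {"author": "Jones", "title": "Federation of databases"},
    {"author": "Smith", "title": "Digital collections"},
    {"author": "Smith", "title": "Patent searching"},
]
query = BooleanNode(
    "AND",
    Term("author", "smith"),
    BooleanNode("OR", Term("title", "digital"), Term("title", "federation")),
)
hits = [r["title"] for r in records if matches(query, r)]
```

Nesting one `BooleanNode` inside another is what makes the structure hierarchical: each subtree is itself a set that can be combined with further sets.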

A searcher submits a query using the client interface. The middleware software module on the client is called the Origin. It translates the query into a standardized form specified by the Z39.50 protocol. The Origin initiates a session with the INIT command. The software module on the intended server is called the Target. The Target sends an acknowledgment that the session has started. The Origin sends the query via the SEARCH command. The Target interfaces with the desired database on a remote system. It responds to the query and sends the results of the query back to the Origin. The Origin interfaces with the searcher. The searcher informs the Origin which records to view. The Origin sends a PRESENT command with the specific records to retrieve. Finally, a TERMINATION command is sent to end the session.
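
The command sequence just described can be traced with a toy model. This Python sketch borrows only the command names from the text (INIT, SEARCH, PRESENT, TERMINATION); the `Origin` and `Target` classes, their methods, the return values, and the sample records are hypothetical, and nothing here reflects the actual message encoding of the protocol.

```python
class Target:
    """Server-side module: wraps a database and answers Origin commands."""

    def __init__(self, records):
        self.records = records
        self.active = False
        self.result_set = []

    def handle(self, command, argument=None):
        if command == "INIT":
            self.active = True
            return "INIT acknowledged"
        if not self.active:
            raise RuntimeError("no session: send INIT first")
        if command == "SEARCH":
            # Keep the result set on the server for later PRESENT commands.
            self.result_set = [r for r in self.records if argument in r.lower()]
            return len(self.result_set)
        if command == "PRESENT":
            return [self.result_set[i] for i in argument]
        if command == "TERMINATION":
            self.active = False
            self.result_set = []
            return "session closed"
        raise ValueError(f"unknown command: {command}")


class Origin:
    """Client-side module: turns a searcher's request into commands."""

    def __init__(self, target):
        self.target = target

    def run_session(self, term, wanted):
        log = [self.target.handle("INIT")]
        log.append(self.target.handle("SEARCH", term))     # number of hits
        log.append(self.target.handle("PRESENT", wanted))  # chosen records
        log.append(self.target.handle("TERMINATION"))
        return log


target = Target(["Digital libraries", "Library automation", "Signal processing"])
log = Origin(target).run_session("librar", [0])
```

The Target holds the result set between SEARCH and PRESENT, which is exactly the kind of session state that the stateful connection makes possible. In a real exchange the searcher would choose which records to view after seeing the search response; here that choice is passed in up front for brevity.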

Newer versions of the protocol are expected to better support the searching of databases with more diverse record structures. The EXPLAIN command will allow the client to ask the server to describe the contents, the record structure and supported attributes of the database it serves. A full-text document delivery service will also be provided.

Nevertheless, there are limitations to Z39.50. It cannot adequately handle searching full-text documents. Full-text documents often provide a rich and complex set of access point...

Cover
Half Title
Copyright
Title
Contents
Introduction
PHILOSOPHIES
TECHNICAL DESIGN CONSIDERATIONS
EXAMPLE SCENARIOS
Index