
- 114 pages
- English
- ePUB (mobile friendly)
About this book
The WWW era made billions of people dramatically dependent on the progress of data technologies, of which Internet search and Big Data are arguably the most notable. The Structured Search paradigm connects them through the fundamental concept of key-objects, which evolve out of keywords as the units of search. The key-object data model and KeySQL revamp the data independence principle, making it applicable to Big Data, and complement NoSQL with full-blown structured querying functionality. The ultimate goal is extracting Big Information from Big Data.
As a Big Data Consultant, Mikhail Gilula combines an academic background with 20 years of industry experience in database and data warehousing technologies, having worked as a Sr. Data Architect for Teradata, Alcatel-Lucent, and PayPal, among others. He has authored three books, including The Set Model for Database and Information Systems, and holds four US patents in Structured Search and Data Integration.
- Conceptualizes structured search as a technology for querying multiple data sources in an independent and scalable manner.
- Explains how NoSQL and KeySQL complement each other and serve different needs with respect to Big Data.
- Shows the place of structured search in the evolution of the Internet and describes its implementations, including real-time structured Internet search.
Yes, you can access Structured Search for Big Data by Mikhail Gilula in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Processing. We have over one million books available in our catalogue for you to explore.
Chapter 1
Introduction to Structured Search
Abstract
This chapter compares, side by side, the features of keyword search (information retrieval) and database search. Structured search is conceptualized as a technology for querying multiple data sources in an independent and scalable manner. It occupies the middle ground between keyword search and database search. As in the keyword search paradigm, query originators do not need to know the structure or the number of the data sources being queried. As in the database paradigm, users can pose precise queries, control the output order, access data in real time, and manage data security.
Keywords
keyword search
information retrieval
e-commerce
data security
query independence
query scalability
It is contrary to reason to say that there is a vacuum or space in which there is absolutely nothing.
Rene Descartes (Principia Philosophiae, 1644)
1.1. Limitations of Keyword Search
Contemporary search engines operate within the information retrieval (IR) paradigm, where the search criteria consist of keywords and the search results are lists of web pages or, more generally, lists of documents (texts) that contain the specified combinations of keywords.
IR existed in different forms long before the introduction of computers, and its limitations motivated research into the concept of a query, which resulted in database languages like SQL. For example, in the 1960s and 1970s it was popular to talk about "factographic" systems, which would enable searching for information or facts per se, as opposed to searching for documents, such as books, patents, or articles, that may or may not contain the relevant information. IR's new incarnation came with the Internet and was advanced by Internet search providers.
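To make the IR paradigm concrete, here is a minimal sketch of keyword search over an inverted index with boolean AND semantics. It is an illustrative assumption, not how any particular search engine works: the documents, ids, and the helpers `build_index` and `keyword_search` are hypothetical. Note that the result is a list of documents, not facts, and that a charger listing matches "camera" just as well as the camera itself.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index: dict[str, set[str]], keywords: list[str]) -> list[str]:
    """Return the ids of documents containing every keyword (boolean AND)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical document collection.
docs = {
    "p1": "Acme Z100 digital camera 12 megapixels",
    "p2": "Battery charger for Acme Z100 camera",
    "p3": "Review of the Acme Z100 zoom lens",
}
index = build_index(docs)
print(keyword_search(index, ["acme", "camera"]))                # ['p1', 'p2']
print(keyword_search(index, ["acme", "camera", "megapixels"]))  # ['p1']
```

Adding a keyword narrows the result, but only because the exact word happens to appear in the text; a relevant document phrased differently would silently drop out.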
The main limitations of the keyword search are as follows.
Intrinsic search imprecision. Using only keywords, it is generally difficult to determine the actual question in the mind of the query originator, because the same keywords may be used to pose different questions. Also, narrowing down the search by adding more keywords increases the risk of missing the relevant information.
Search results only for humans. Since the results of keyword search are typically documents conveying information in natural languages, it is not easy to process them programmatically, without involving a human recipient. Of course, web pages are always somewhat structured and sometimes consist of quite structured information, but the structure of each individual page is not known a priori, and the difficulties of processing natural languages programmatically always remain.
No user control over output order. The ordering of search results is controlled by search engines and is a valuable trade secret. Some e-commerce websites allow users to sort search results by the price of merchandise. However, since the results are produced using keywords, users often need to look through most of the returned items anyway. For example, currently, when a user of a big Internet marketplace specifies a digital camera model to search for and chooses the "Price: lowest first" option, the first couple of hundred items in the output are not listings of the camera but camera accessories, because accessories tend to be cheaper.
No security control. To index a document or a web page, search engines need full access to the source. In this context, the security of information, or of parts of information, has no place or meaning.
No real-time access. Processing web pages and updating indexes takes time. It could be days or weeks before updated web pages appear in search results. Information can become stale or disappear completely during this period.
Search engines are not green. Due to keyword search imprecision, most information returned by search engines is never viewed or consumed by users. This means that CPU and I/O cycles, network traffic, and energy are wasted in data centers.
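The camera example above can be reproduced with a small sketch. The listings, prices, and the `category` attribute are hypothetical; the point is only that sorting keyword matches by price ranks cheap accessories first, while a condition on a structured attribute removes them before sorting.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    category: str  # hypothetical structured attribute that keyword matching ignores
    price: float


listings = [
    Listing("Acme Z100 digital camera", "camera", 449.00),
    Listing("Acme Z100 camera strap", "accessory", 9.99),
    Listing("Screen protector for Acme Z100 camera", "accessory", 6.49),
    Listing("Acme Z100 camera, refurbished", "camera", 379.00),
]


def keyword_match(listing: Listing, keywords: list[str]) -> bool:
    """True if every keyword appears as a word in the listing title."""
    words = listing.title.lower().replace(",", " ").split()
    return all(k.lower() in words for k in keywords)


# Keyword search followed by "Price: lowest first": accessories rise to the top.
by_price = sorted(
    (item for item in listings if keyword_match(item, ["acme", "z100", "camera"])),
    key=lambda item: item.price,
)
print([item.title for item in by_price])

# Filtering on the structured attribute before sorting returns only cameras.
cameras_only = sorted(
    (item for item in listings if item.category == "camera"),
    key=lambda item: item.price,
)
print([item.title for item in cameras_only])
```

All four titles contain the three keywords, so the first list starts with the screen protector and the strap; the second contains only the two camera listings, cheapest first.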
1.2. Keyword Search in E-Commerce
One of the areas underserved by keyword search is e-commerce. For example, there is no general way to search for all digital cameras with more than 10x optical zoom, more than 10 megapixels, weighing less than 10 oz, and so on. The basic problems of locating merchandise using keyword search are as follows.
Inability to find merchandise directly by specifications rather than by keywords, such as the brand or model, that are needed to retrieve product specifications. Researching complex items may take hours and still does not guarantee the best deals. It would be vastly more efficient to search by multiple item characteristics at once instead of going back and forth through dozens or hundreds of descriptions in order to compare them by several parameters.
Search output rankings are generally unrelated to the qualities of the merchandise (i.e., its specifications) or to the deals offered. Since search results tend to be voluminous, high search ranks are critical for merchants. Keyword search puts buyers at a disadvantage because they are only able to look through the first few pages of the output, while a better deal may be on the next page that they did not get to.
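For contrast, the camera query above is straightforward once specifications are available as attributes. The sketch below is plain Python, not KeySQL; the attribute names, the data, and the `structured_search` helper are hypothetical. Each condition is an (attribute, operator, value) triple, and items lacking a queried attribute are simply excluded rather than causing an error.

```python
import operator

# Comparison operators a condition may use.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}

# Hypothetical product specifications.
cameras = [
    {"model": "A", "optical_zoom": 12, "megapixels": 16, "weight_oz": 9.5, "price": 299.0},
    {"model": "B", "optical_zoom": 30, "megapixels": 20, "weight_oz": 14.0, "price": 399.0},
    {"model": "C", "optical_zoom": 8, "megapixels": 24, "weight_oz": 7.0, "price": 249.0},
    {"model": "D", "optical_zoom": 15, "megapixels": 18, "weight_oz": 8.8, "price": 279.0},
]


def structured_search(items, conditions, order_by):
    """Return items satisfying every (attribute, op, value) condition, sorted by order_by."""
    def matches(item):
        return all(attr in item and OPS[op](item[attr], value)
                   for attr, op, value in conditions)
    return sorted((i for i in items if matches(i)), key=lambda i: i[order_by])


query = [("optical_zoom", ">", 10), ("megapixels", ">", 10), ("weight_oz", "<", 10)]
print([c["model"] for c in structured_search(cameras, query, order_by="price")])  # ['D', 'A']
```

The user, not the search engine, chooses the output order here (`order_by`), and every returned item satisfies all stated specifications.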
To alleviate these problems, e-merchants use the following main techniques.
• Impr...
Table of contents
- Cover
- Title page
- Table of Contents
- Copyright
- Dedication
- Quotation
- Preface
- Acknowledgments
- Chapter 1: Introduction to Structured Search
- Chapter 2: Key-Objects vs. Keywords
- Chapter 3: Key-Object Data Model
- Chapter 4: Structured Search Framework
- Chapter 5: Introduction to KeySQL
- Chapter 6: Structured Search on Database Landscape
- Chapter 7: Structured Search Solutions