eBook - ePub

Scaling Apache Solr

Name: Scaling Apache Solr
ISBN: 9781783981748

Hrishikesh Vijay Karambelkar

298 pages
English
ePUB (mobile friendly)

About this book

In Detail

This book is for individuals who want to build high-performance, scalable, enterprise-ready search engines for their customers or organizations. The book starts with the basics of Apache Solr, covering different ways to analyze enterprise information and to design enterprise-ready search engines using Solr. It also discusses taking Solr-based enterprise search to the next level through scaling.

Each chapter takes you through progressively more advanced features of Apache Solr, with practical, real-world details such as installing, setting up, and configuring instances. The book explains both the basic and the advanced features of Apache Solr in detail.

By sequentially working through the steps in each chapter and with the help of real-life industry examples, you will quickly master the features of Apache Solr to build search solutions for enterprises.

Approach

This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies.

Who this book is for

If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.

Publisher

Packt Publishing

Year

2014

Edition

eBook ISBN

9781783981748

Topic

Computer Science

Subtopic

Data Warehousing

Table of contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Understanding Apache Solr

Challenges in enterprise search

Apache Solr – an overview

Features of Apache Solr

Solr for end users

Powerful full text search

Search through rich information

Results ranking, pagination, and sorting

Facets for better browsing experience

Advanced search capabilities

Administration

Apache Solr architecture

Storage

Solr application

Integration

Client APIs and SolrJ client

Other interfaces

Practical use cases for Apache Solr

Enterprise search for a job search agency

Problem statement

Approach

Enterprise search for the energy industry

Problem statement

Approach

Summary

2. Getting Started with Apache Solr

Setting up Apache Solr

Prerequisites

Running Solr on Jetty

Running Solr on Tomcat

Solr administration

What's next?

Common problems and solutions

Understanding the Solr structure

The Solr home directory structure

Solr navigation

Configuring the Apache Solr for enterprise

Defining a Solr schema

Solr fields

Dynamic Fields in Solr

Copying the fields

Field types

Other important elements in the Solr schema

Configuring Solr parameters

solr.xml and Solr core

solrconfig.xml

The Solr plugin

Other configurations

Understanding SolrJ

Summary

3. Analyzing Data with Apache Solr

Understanding enterprise data

Categorizing by characteristics

Categorizing by access pattern

Categorizing by data formats

Loading data using native handlers

Quick and simple data loading – post tool

Working with JSON, XML, and CSV

Handling JSON data

Working with CSV data

Working with XML data

Working with rich documents

Understanding Apache Tika

Using Solr Cell (ExtractingRequestHandler)

Adding metadata to your rich documents

Importing structured data from the database

Configuring the data source

Importing data in Solr

Full import

Delta import

Loading RDBMS tables in Solr

Advanced topics with Solr

Deduplication

Extracting information from scanned documents

Searching through images using LIRE

Summary

4. Designing Enterprise Search

Designing aspects for enterprise search

Identifying requirements

Matching user expectations through relevance

Access to searched entities and user interface

Improving search performance and ensuring instance scalability

Working with applications through federated search

Other differentiators – mobiles, linguistic search, and security

Enterprise search data-processing patterns

Standalone search engine server

Distributed enterprise search pattern

The replicated enterprise search pattern

Distributed and replicated

Data integrating pattern for search

Data import by enterprise search

Applications pushing data

Middleware-based integration

Case study – designing an enterprise knowledge repository search for software IT services

Gathering requirements

Designing the solution

Designing the schema

Integrating subsystems with Apache Solr

Working on end user interface

Summary

5. Integrating Apache Solr

Empowering the Java Enterprise application with Solr search

Embedding Apache Solr as a module (web application) in an enterprise application

How to do it?

Apache Solr in your web application

How to do it?

Integration with client technologies

Integrating Apache Solr with PHP for web portals

Interacting directly with Solr

Using the Solr PHP client

How to do it?

Advanced integration with Solarium

How to do it?

Integrating Apache Solr with JavaScript

Using simple XMLHttpRequest

Integrating Apache Solr using AJAX Solr

Parsing Solr XML with the help of XSLT

Case study – Apache Solr and Drupal

How to do it?

Summary

6. Distributed Search Using Apache Solr

Need for distributed search

Distributed search architecture

Apache Solr and distributed search

Understanding SolrCloud

Why ZooKeeper?

SolrCloud architecture

Building enterprise distributed search using SolrCloud

Setting up a SolrCloud for development

Setting up a SolrCloud for production

Adding a document to SolrCloud

Creating shards, collections, and replicas in SolrCloud

Common problems and resolutions

Case study – distributed enterprise search server for the software industry

Summary

7. Scaling Solr through Sharding, Fault Tolerance, and Integration

Enabling search result clustering with Carrot2

Why Carrot2?

Enabling Carrot2-based document clustering

Understanding Carrot2 result clustering

Viewing Solr results in the Carrot2 workbench

FAQs and problems

Sharding and fault tolerance

Document routing and sharding

Shard splitting

Load balancing and fault tolerance in SolrCloud

Searching Solr documents in near real time

Strategies for near real-time search in Apache Solr

Explicit call to commit from a client

solrconfig.xml – autocommit

CommitWithin – delegating the responsibility to Solr

Real-time search in Apache Solr

Solr with MongoDB

Understanding MongoDB

Installing MongoDB

Creating Solr indexes from MongoDB

Scaling Solr through Storm

Getting along with Apache Storm

Solr and Apache Storm

Summary

8. Scaling Solr through High Performance

Monitoring performance of Apache Solr

What should be monitored?

Hardware and operating system

Java virtual machine

Apache Solr search runtime

Apache Solr indexing time

SolrCloud

Tools for monitoring Solr performance

Solr administration user interface

JConsole

SolrMeter

Tuning Solr JVM and container

Deciding heap size

How can we optimize JVM?

Optimizing JVM container

Optimizing Solr schema and indexing

Stored fields

Indexed fields and field lengths

Copy fields and dynamic fields

Fields for ra...