Scaling Apache Solr
eBook - ePub

  1. 298 pages
  2. English

About this book

In Detail

This book is for individuals who want to build high-performance, scalable, enterprise-ready search engines for their customers or organizations. The book starts with the basics of Apache Solr, covering different ways to analyze enterprise information and design enterprise-ready search engines using Solr. It then discusses how to scale Solr-based enterprise search to the next level.

Each chapter takes you through progressively more advanced levels of Apache Solr, with real-world practical details such as installing, setting up, and configuring instances. This book contains detailed explanations of the basic and advanced features of Apache Solr.

By sequentially working through the steps in each chapter and with the help of real-life industry examples, you will quickly master the features of Apache Solr to build search solutions for enterprises.

Approach

This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies.

Who this book is for

If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.

Scaling Apache Solr by Hrishikesh Vijay Karambelkar is available in PDF and/or ePUB format, in the Computer Science & Data Warehousing category.

Table of Contents

Scaling Apache Solr
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding Apache Solr
Challenges in enterprise search
Apache Solr – an overview
Features of Apache Solr
Solr for end users
Powerful full text search
Search through rich information
Results ranking, pagination, and sorting
Facets for better browsing experience
Advanced search capabilities
Administration
Apache Solr architecture
Storage
Solr application
Integration
Client APIs and SolrJ client
Other interfaces
Practical use cases for Apache Solr
Enterprise search for a job search agency
Problem statement
Approach
Enterprise search for energy industry
Problem statement
Approach
Summary
2. Getting Started with Apache Solr
Setting up Apache Solr
Prerequisites
Running Solr on Jetty
Running Solr on Tomcat
Solr administration
What's next?
Common problems and solutions
Understanding the Solr structure
The Solr home directory structure
Solr navigation
Configuring the Apache Solr for enterprise
Defining a Solr schema
Solr fields
Dynamic Fields in Solr
Copying the fields
Field types
Other important elements in the Solr schema
Configuring Solr parameters
solr.xml and Solr core
solrconfig.xml
The Solr plugin
Other configurations
Understanding SolrJ
Summary
3. Analyzing Data with Apache Solr
Understanding enterprise data
Categorizing by characteristics
Categorizing by access pattern
Categorizing by data formats
Loading data using native handlers
Quick and simple data loading – post tool
Working with JSON, XML, and CSV
Handling JSON data
Working with CSV data
Working with XML data
Working with rich documents
Understanding Apache Tika
Using Solr Cell (ExtractingRequestHandler)
Adding metadata to your rich documents
Importing structured data from the database
Configuring the data source
Importing data in Solr
Full import
Delta import
Loading RDBMS tables in Solr
Advanced topics with Solr
Deduplication
Extracting information from scanned documents
Searching through images using LIRE
Summary
4. Designing Enterprise Search
Designing aspects for enterprise search
Identifying requirements
Matching user expectations through relevance
Access to searched entities and user interface
Improving search performance and ensuring instance scalability
Working with applications through federated search
Other differentiators – mobiles, linguistic search, and security
Enterprise search data-processing patterns
Standalone search engine server
Distributed enterprise search pattern
The replicated enterprise search pattern
Distributed and replicated
Data integrating pattern for search
Data import by enterprise search
Applications pushing data
Middleware-based integration
Case study – designing an enterprise knowledge repository search for software IT services
Gathering requirements
Designing the solution
Designing the schema
Integrating subsystems with Apache Solr
Working on end user interface
Summary
5. Integrating Apache Solr
Empowering the Java Enterprise application with Solr search
Embedding Apache Solr as a module (web application) in an enterprise application
How to do it?
Apache Solr in your web application
How to do it?
Integration with client technologies
Integrating Apache Solr with PHP for web portals
Interacting directly with Solr
Using the Solr PHP client
How to do it?
Advanced integration with Solarium
How to do it?
Integrating Apache Solr with JavaScript
Using simple XMLHTTPRequest
Integrating Apache Solr using AJAX Solr
Parsing Solr XML with the help of XSLT
Case study – Apache Solr and Drupal
How to do it?
Summary
6. Distributed Search Using Apache Solr
Need for distributed search
Distributed search architecture
Apache Solr and distributed search
Understanding SolrCloud
Why Zookeeper?
SolrCloud architecture
Building enterprise distributed search using SolrCloud
Setting up a SolrCloud for development
Setting up a SolrCloud for production
Adding a document to SolrCloud
Creating shards, collections, and replicas in SolrCloud
Common problems and resolutions
Case study – distributed enterprise search server for the software industry
Summary
7. Scaling Solr through Sharding, Fault Tolerance, and Integration
Enabling search result clustering with Carrot2
Why Carrot2?
Enabling Carrot2-based document clustering
Understanding Carrot2 result clustering
Viewing Solr results in the Carrot2 workbench
FAQs and problems
Sharding and fault tolerance
Document routing and sharding
Shard splitting
Load balancing and fault tolerance in SolrCloud
Searching Solr documents in near real time
Strategies for near real-time search in Apache Solr
Explicit call to commit from a client
solrconfig.xml – autocommit
CommitWithin – delegating the responsibility to Solr
Real-time search in Apache Solr
Solr with MongoDB
Understanding MongoDB
Installing MongoDB
Creating Solr indexes from MongoDB
Scaling Solr through Storm
Getting along with Apache Storm
Solr and Apache Storm
Summary
8. Scaling Solr through High Performance
Monitoring performance of Apache Solr
What should be monitored?
Hardware and operating system
Java virtual machine
Apache Solr search runtime
Apache Solr indexing time
SolrCloud
Tools for monitoring Solr performance
Solr administration user interface
JConsole
SolrMeter
Tuning Solr JVM and container
Deciding heap size
How can we optimize JVM?
Optimizing JVM container
Optimizing Solr schema and indexing
Stored fields
Indexed fields and field lengths
Copy fields and dynamic fields
Fields for ra...
