Scaling Big Data with Hadoop and Solr - Second Edition
eBook - ePub

Hrishikesh Vijay Karambelkar

  1. 166 pages
  2. English
  3. ePUB (mobile-friendly)
  4. Available on iOS and Android

About This Book

  • Explore different approaches to making Solr work on big data ecosystems besides Apache Hadoop
  • Improve search performance while working with big data
  • A practical guide that covers interesting, real-life use cases for big data search along with sample code

Who This Book Is For

This book is aimed at developers, designers, and architects who would like to build big data enterprise search solutions for their customers or organizations. No prior knowledge of Apache Hadoop and Apache Solr/Lucene technologies is required.

Frequently asked questions

Is Scaling Big Data with Hadoop and Solr - Second Edition available online as a PDF/ePUB?
Yes, you can access Scaling Big Data with Hadoop and Solr - Second Edition by Hrishikesh Vijay Karambelkar in PDF and/or ePUB format, along with other popular books in Computer Science and Programming in Java. More than one million titles are available in our catalogue.

Information

Year
2015
ISBN
9781783553402

Scaling Big Data with Hadoop and Solr Second Edition


Table of Contents

Scaling Big Data with Hadoop and Solr Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Processing Big Data Using Hadoop and MapReduce
Apache Hadoop's ecosystem
Core components
Understanding Hadoop's ecosystem
Configuring Apache Hadoop
Prerequisites
Setting up ssh without passphrase
Configuring Hadoop
Running Hadoop
Setting up a Hadoop cluster
Common problems and their solutions
Summary
2. Understanding Apache Solr
Setting up Apache Solr
Prerequisites for setting up Apache Solr
Running Apache Solr on jetty
Running Solr on other J2EE containers
Hello World with Apache Solr!
Understanding Solr administration
Solr navigation
Common problems and solutions
The Apache Solr architecture
Configuring Solr
Understanding the Solr structure
Defining the Solr schema
Solr fields
Dynamic fields in Solr
Copying the fields
Dealing with field types
Additional metadata configuration
Other important elements of the Solr schema
Configuration files of Apache Solr
Working with solr.xml and Solr core
Instance configuration with solrconfig.xml
Understanding the Solr plugin
Other configuration
Loading data in Apache Solr
Extracting request handler – Solr Cell
Understanding data import handlers
Interacting with Solr through SolrJ
Working with rich documents (Apache Tika)
Querying for information in Solr
Summary
3. Enabling Distributed Search using Apache Solr
Understanding a distributed search
Distributed search patterns
Apache Solr and distributed search
Working with SolrCloud
Why ZooKeeper?
The SolrCloud architecture
Building an enterprise distributed search using SolrCloud
Setting up SolrCloud for development
Setting up SolrCloud for production
Adding a document to SolrCloud
Creating shards, collections, and replicas in SolrCloud
Common problems and resolutions
Sharding algorithm and fault tolerance
Document Routing and Sharding
Shard splitting
Load balancing and fault tolerance in SolrCloud
Apache Solr and Big Data – integration with MongoDB
What is NoSQL and how is it related to Big Data?
MongoDB at glance
Installing MongoDB
Creating Solr indexes from MongoDB
Summary
4. Big Data Search Using Hadoop and Its Ecosystem
Understanding NoSQL
Working with the Solr HDFS connector
Big data search using Katta
How Katta works?
Setting up the Katta cluster
Creating Katta indexes
Using Solr 1045 Patch – map-side indexing
Using Solr 1301 Patch – reduce-side indexing
Distributed search using Apache Blur
Setting up Apache Blur with Hadoop
Apache Solr and Cassandra
Working with Cassandra and Solr
Single node configuration
Integrating with multinode Cassandra
Scaling Solr through Storm
Getting along with Apache Storm
Advanced analytics with Solr
Integrating Solr and R
Summary
5. Scaling Search Performance
Understanding the limits
Optimizing search schema
Specifying default search field
Configuring search schema fields
Stop words
Stemming
Index optimization
Limiting indexing buffer size
When to commit changes?
Optimizing index merge
Optimize option for index merging
Optimizing the container
Optimizing concurrent clients
Optimizing Java virtual memory
Optimizing search runtime
Optimizing through search query
Filter queries
Optimizing the Solr cache
The filter cache
The query result cache
The document cache
The field value cache
The lazy field loading
Optimizing Hadoop
Monitoring Solr instance
Using SolrMeter
Summary
A. Use Cases for Big Data Search
E-Commerce websites
Log management for banking
The problem
How can it be tackled?
High-level design
Index

Scaling Big Data with Hadoop and Solr Second Edition

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2013
Second edition: April 2015
Production reference: 1230415
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78355-339-6
www.packtpub.com

Credits

Author
Hrishikesh Vijay Karambelkar
Reviewers
Ramzi Alqrainy
Walt Stoneburner
Ning Sun
Ruben Teijeiro
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Nikhil Chinnari
Reshma Raman
Content Development Editor
Susmita Sabat
Technical Editor
Aman Preet Singh
Copy Editors
Sonia Cheema
Tani Kothari
Project Coordinator
Milton Dsouza
Proofreader
Simran Bhogal
Safis Editing
Indexer
Mariammal Chettiyar
Production Coordinator
Arvindkumar Gupta
Co...
