Instant Parallel Processing with Gearman
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2013
Production Reference: 1230713
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-407-8
www.packtpub.com
Author
John Ewart
Reviewer
Josh Black
Acquisition Editor
Vinay Argekar
Commissioning Editor
Priyanka Shah
Technical Editor
Mausam Kothari
Project Coordinator
Michelle Quadros
Proofreader
Bernadette Watkins
Graphics
Ronak Dhruv
Production Coordinator
Prachali Bhiwandkar
Cover Work
Prachali Bhiwandkar
Cover Image
Prachali Bhiwandkar
John Ewart is a systems architect, software developer, and lecturer. He has designed and taught courses at a variety of institutions including the University of California, California State University and local community colleges covering a wide range of computer science topics including Java, data structures and algorithms, operating systems fundamentals, UNIX and Linux systems administration, and web application development. In addition to working and teaching, he maintains and contributes to a number of open source projects. He currently resides in Redmond, Washington with his wife, Mary, and their two children.
Josh Black has been working with computers professionally for 20 years. He has a broad range of experience and expertise including systems and network administration, mobile app development, and production web applications. Josh earned a BS in computer science with a minor in math from California State University, Chico, in 2005. He currently resides in Chico, California, with his wife Rachel, and their four children.
Chapter 1. Instant Parallel Processing with Gearman
Welcome to Instant Parallel Processing with Gearman. This book shows you what you need to know to get started with Gearman. You will learn the history of Gearman, run your own servers, write scripts to process data, and explore some of Gearman's more advanced features.
This book contains the following sections:
So, what is Gearman? explains what Gearman is, why it exists, and what you can do with it.
Quick start – building your first components shows you how to set up your own server and interact with it through some quick examples in Ruby. This section covers the core concepts of Gearman so that you can start processing jobs as quickly as possible.
Top 5 features you need to know about helps you learn how to use Gearman beyond simple job submissions. By the end of this section, you will be able to: use Gearman and MapReduce methodologies to process large amounts of data; build a pipeline of complex, loosely coupled processes that work together to process data using different languages and libraries; offload long-running, complex data analysis while providing real-time feedback to a frontend application; and use job coalescing to distribute results to multiple clients while processing the data only once.
People and places you should get to know points you to the wealth of online resources that Gearman, like many other open source projects, has available. This section gets you started with links to these resources, including code examples, libraries, server implementations, mailing lists, and more.
So, what is Gearman?
Gearman is a network-based job-queuing system that was initially developed by Danga Interactive to process large volumes of jobs. Its primary design goals were low-latency remote function execution, the ability to run code remotely and in parallel, load-balanced job distribution, and support for components written in multiple languages.
Although it was originally written in Perl, Gearman is, at its core, a network protocol designed to let the various components communicate the lifecycle of a unit of work. Because of this design, there are both servers and client libraries written in multiple languages, including Ruby, Perl, PHP, Python, C, C++, and Java.
In practice, this means you can design and develop each component of your architecture in whatever language makes the most sense and still have those components communicate easily with one another.
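To make the idea of a language-neutral protocol more concrete, the following Ruby sketch builds and parses one request packet. It assumes the framing described in the public Gearman protocol document as we understand it: a four-byte magic code, a four-byte big-endian packet type, a four-byte big-endian payload size, and NUL-separated arguments. The helper names, the function name "reverse", and the unique ID "job-1" are hypothetical and chosen only for illustration; consult the protocol document before relying on any of these details.

```ruby
# Sketch only: frames a job submission the way we understand the Gearman
# binary protocol to do it. Helper names are hypothetical, not a library API.

REQ_MAGIC  = "\0REQ".b # 4-byte magic code that prefixes requests
SUBMIT_JOB = 7         # packet type assumed for a job submission

# Build a request: magic, type, payload size, then the payload itself.
def build_request(type, *args)
  payload = args.map(&:b).join("\0".b) # arguments are separated by NUL bytes
  REQ_MAGIC + [type, payload.bytesize].pack("N2") + payload
end

# Split a packet back into its magic code, type, and arguments.
# Simplification: this assumes no argument contains a NUL byte itself.
def parse_packet(packet)
  magic = packet.byteslice(0, 4)
  type, size = packet.byteslice(4, 8).unpack("N2")
  args = packet.byteslice(12, size).split("\0".b, -1)
  { magic: magic, type: type, args: args }
end

# "reverse" and "job-1" are made-up values for the function name and unique ID.
packet = build_request(SUBMIT_JOB, "reverse", "job-1", "hello")
p parse_packet(packet)
```

Because the packet is nothing more than a handful of bytes in a fixed layout, any language that can open a socket and pack integers can take part, which is what makes the mixed-language architectures described above practical.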
Gearman goes one step further than simply defining a message bus; it formalizes its architecture to focus on units of work. This means that everything in a system using Gearman operates in terms of submitting or working on jobs. To follow this paradigm, Gearman has three main actors: clients, which request that work be completed; managers (servers), which accept jobs from clients and hand them out to workers; and workers, which ultimately complete the tasks.
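The relationship between the three actors can be sketched in plain Ruby without any networking. The following is an in-process model only, not the Gearman API: the Manager class, its submit and grab methods, and the "reverse" function name are all hypothetical, and threads with queues stand in for separate processes talking over sockets.

```ruby
# In-process model of the three Gearman roles. Nothing here talks to a real
# Gearman server; every class and method name is hypothetical.

class Manager
  def initialize
    @lock = Mutex.new
    @queues = {} # one job queue per function name
  end

  # Called by clients: queue a job and return a Queue that will carry the result.
  def submit(function, data)
    reply = Queue.new
    queue_for(function) << [data, reply]
    reply
  end

  # Called by workers: block until a job for this function is available.
  def grab(function)
    queue_for(function).pop
  end

  private

  def queue_for(function)
    @lock.synchronize { @queues[function] ||= Queue.new }
  end
end

manager = Manager.new

# Worker: repeatedly asks the manager for "reverse" jobs and completes them.
worker = Thread.new do
  loop do
    data, reply = manager.grab("reverse")
    reply << data.reverse
  end
end

# Client: submits a job and waits for the manager to relay the result.
result = manager.submit("reverse", "gearman").pop
puts result

worker.kill # stop the worker now that the demonstration is over
```

Note that the client never refers to the worker directly; it only knows the function name and the manager. That indirection is what allows workers to be added, removed, or rewritten in another language without changing any client code.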