Learning Apache Spark 2
eBook - ePub

Learning Apache Spark 2

Muhammad Asif Abbasi

  1. 356 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

Book information

Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analytics

About This Book

  • Exclusive guide that covers how to get up and running with fast data processing using Apache Spark
  • Explore and exploit various possibilities with Apache Spark using real-world use cases in this book
  • Want to perform efficient data processing in real time? This book will be your one-stop solution.

Who This Book Is For

This guide appeals to big data engineers, analysts, architects, software engineers, and even technical managers who need to perform efficient data processing on Hadoop in real time. Basic familiarity with Java or Scala will be helpful.

The assumption is that readers will come from a mixed background, but will typically be people with a background in engineering or data science who have no prior Spark experience and want to understand how Spark can help them on their analytics journey.

What You Will Learn

  • Get an overview of big data analytics and its importance for organizations and data professionals
  • Delve into Spark to see how it is different from existing processing platforms
  • Understand the intricacies of various file formats, and how to process them with Apache Spark
  • Learn how to deploy Spark with YARN, Mesos, or a standalone cluster manager
  • Learn the concepts of Spark SQL, SchemaRDD, and caching, and how to work with Hive and Parquet file formats
  • Understand the architecture of Spark MLlib while discussing some of the off-the-shelf algorithms that come with Spark
  • Introduce yourself to the deployment and usage of SparkR
  • Walk through the importance of graph computation and the graph processing systems available in the market
  • Work through a real-world example of Spark by building a recommendation engine using ALS
  • Use a telco data set to predict customer churn using Random Forests

In Detail

The Spark juggernaut keeps on rolling and gains more momentum each day. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML, and GraphX, all accessible via Java, Scala, Python, and R. Deploying these capabilities is crucial, whether on a standalone framework or as part of an existing Hadoop installation configured with YARN and Mesos.

The next part of the journey after installation is using the key components, APIs, clustering, machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component is key, how widely it is being used, its stability, and its pertinent use cases.

Once we understand the individual components, we will take a couple of real-life advanced analytics examples, such as 'Building a Recommendation system', 'Predicting customer churn', and so on. The objective of these real-life examples is to give the reader confidence in using Spark for real-world problems.

Style and approach

With the help of practical examples and real-world use cases, this guide will take you from scratch to building efficient data applications using Apache Spark. You will learn all about this excellent data processing engine in a step-by-step manner, taking one aspect of it at a time.

This highly practical guide covers how to work with data pipelines, dataframes, clustering, Spark SQL, parallel programming, and similar topics with the help of real-world use cases.


Information

Year
2017
ISBN
9781785889585
Edition
1
Category
Computer Science

Learning Apache Spark 2

Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: March 2017
Production reference: 1240317
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78588-513-6
www.packtpub.com

Credits

Authors
Muhammad Asif Abbasi
Copy Editor
Safis Editing
Reviewers
Prashant Verma
Project Coordinator
Nidhi Joshi
Commissioning Editor
Veena Pagare
Proofreader
Safis Editing
Acquisition Editor
Tushar Gupta
Indexer
Tejal Daruwale Soni
Content Development Editor
Mayur Pawanikar
Graphics
Tania Dutta
Technical Editor
Karan Thakkar
Production Coordinator
Nilesh Mohite

About the Author

Muhammad Asif Abbasi has worked in the industry for over 15 years in a variety of roles, from engineering solutions to selling solutions and everything in between. Asif is currently working with SAS, a market leader in analytic solutions, as a Principal Business Solutions Manager for the Global Technologies Practice. Based in London, Asif has vast experience in consulting for major organizations across the globe and in running proofs of concept across various industries, including but not limited to telecommunications, manufacturing, retail, finance, services, utilities, and government. Asif is an Oracle Certified Java EE 5 Enterprise Architect, Teradata Certified Master, PMP, and Hortonworks Hadoop Certified developer and administrator. Asif also holds a Master's degree in Computer Science and Business Administration.

About the Reviewers

Prashant Verma started his IT career in 2011 as a Java developer at Ericsson, working in the telecom domain. After a couple of years of Java EE experience, he moved into the big data domain and has worked on almost all the popular big data technologies, such as Hadoop, Spark, Flume, Mongo, Cassandra, and so on. He has also played with Scala. Currently, he works with QA Infotech as a Lead Data Engineer, solving e-learning problems using analytics and machine learning.
Prashant has also worked as a technical reviewer on Apache Spark for Java Developers, published by Packt.
I want to thank Packt Publishing for giving me the chance to review the book as well as my employer and my family for their patience while I was busy working on this book.

www.packtpub.com

For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review at the website where you acquired this product.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

This book covers the technical aspects of Apache Spark 2.0, one of the fastest-growing open source projects. To understand what Apache Spark is, we will quickly recap the history of Big Data and what has made Apache Spark popular. Irrespective of your level of expertise, we suggest going through this introduction, as it will help set the context for the book.

The Past

Before going into present-day Spark, it is worth understanding what problems Spark intends to solve, especially those around data movement. Without knowing the background, we will not be able to predict the future.
"You have to learn the past to predict the future."
Late 1990s: The world was a much simpler place, with proprietary databases being the sole choice for consumers. Data was growing at an amazing pace, and some of the biggest databases boasted datasets in excess of a terabyte.
Early 2000s: The dotcom bubble happened, meant companies...
