Deep Learning with Hadoop
Deep Learning with Hadoop
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2017
Production reference: 1130217
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78712-476-9
www.packtpub.com
Dipayan Dev has completed his M.Tech from National Institute of Technology, Silchar with a first class first and is currently working as a software professional in Bengaluru, India. He has extensive knowledge and experience in non-relational database technologies, having primarily worked with large-scale data over the last few years. His core expertise lies in Hadoop Framework. During his postgraduation, Dipayan had built an infinite scalable framework for Hadoop, called Dr. Hadoop, which got published in top-tier SCI-E indexed journal of Springer (http://link.springer.com/article/10.1631/FITEE.1500015). Dr. Hadoop has recently been cited by Goo Wikipedia in their Apache Hadoop article. Apart from that, he registers interest in a wide range of distributed system technologies, such as Redis, Apache Spark, Elasticsearch, Hive, Pig, Riak, and other NoSQL databases. Dipayan has also authored various research papers and book chapters, which are published by IEEE and top-tier Springer Journals. To know more about him, you can also visit his LinkedIn profile https://www.linkedin.com/in/dipayandev.
Shashwat Shriparv has more than 7 years of IT experience. He has worked with various technologies on his career path, such as Hadoop and subprojects, Java, .NET, and so on. He has experience in technologies such as Hadoop, HBase, Hive, Pig, Flume, Sqoop, Mongo, Cassandra, Java, C#, Linux, Scripting, PHP, C++, C, Web technologies, and various real-life use cases in BigData technologies as a developer and administrator. He likes to ride bikes, has interest in photography, and writes blogs when not working.
He has worked with companies such as CDAC, Genilok, HCL, UIDAI(Aadhaar), Pointcross; he is currently working with CenturyLink Cognilytics.
He is the author of Learning HBase, Packt Publishing, the reviewer of Pig Design Pattern book, Packt Publishing, and the reviewer of Hadoop Real-World Solution cookbook, 2nd edition.
I would like to take this opportunity to thank everyone who have somehow made my life better and appreciated me at my best and bared with me and supported me during my bad times.
Wissem El Khlifi is the first Oracle ACE in Spain and an Oracle Certified Professional DBA with over 12 years of IT experience. He earned the Computer Science Engineer degree from FST Tunisia, Masters in Computer Science from the UPC Barcelona, and Masters in Big Data Science from the UPC Barcelona. His area of interest include Cloud Architecture, Big Data Architecture, and Big Data Management & Analysis.
His career has included the roles of: Java analyst / programmer, Oracle Senior DBA, and big data scientist. He currently works as Senior Big Data and Cloud Architect for Schneider Electric / APC. He writes numerous articles on his website http://www.oracle-class.com and his twitter handle is @orawiss
.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at
www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
[email protected]
for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762.
If you'd like to join our team of regular reviewers, you can e-mail us at
[email protected]
. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
To my mother, Dipti Deb and father, Tarun Kumar Deb.
And also my elder brother, Tapojit Deb.
This book will teach you how to deploy large-scale datasets in deep neural networks with Hadoop for optimal performance.
Starting with understanding what deep learning is, and what the various models associated with deep neural networks are, this book will then show you how to set up the Hadoop environment for deep learning.
Chapter 1, Introduction to Deep Learning, covers how deep learning has gained its popularity over the last decade and is now growing even faster than machine learning due to its enhanced functionalities. This chapter starts with an introduction of the real-life applications of Artificial Intelligence, the associated challenges, and how effectively Deep learning is able to address all of these. The chapter provides an in-depth explanation of deep learning by addressing some of the major machine learning problems such as, The curse of dimensionality, Vanishing gradient problem, and the likes. To get started with deep learning for the subsequent chapters, the classification of various deep learning networks is discussed in the latter part of this chapter. This chapter is primarily suitable for readers, who are interested to know the basics of deep learning without getting much into the details of individual deep neural networks.
Chapter 2, Distributed Deep Learning for Large - Scale Data, explains that big data and deep learning are undoubtedly the two hottest technical trends in recent days. Both of them are critically interconnected and have shown tremendous growth in the past few years. This chapter starts with how deep learning technologies can be furnished with massive amount of unstructured data to facilitate extraction of valuable hidden information out of them. Famous technological companies such as Google, Facebook, Apple, and the like are using this large-scale data in their deep learning projects to train some aggressively deep neural networks in a smarter way. Deep neural networks, however, show certain challenges while dealing with Big data. This chapter provides a detailed explanation of all these challenges. The latter part of the chapter introduces Hadoop, to discuss how deep learning models can be implemented using Hadoop's YARN and its iterative Map-reduce paradigm. The chapter further introduces Deeplearning4j, a popular open source distributed framework for deep learning and explains its various components.
Chapt...