Big Data Architect's Handbook

A guide to building proficiency in tools and systems used by leading big data experts

Syed Muhammad Fahad Akhtar

  1. 486 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

A comprehensive end-to-end guide that gives hands-on practice in big data and Artificial Intelligence

Key Features

  • Learn to build and run a big data application with sample code
  • Explore examples to implement activities that a big data architect performs
  • Use Machine Learning and AI for structured and unstructured data

Book Description

Big data architects are the "masters" of data and hold high value in today's market. Handling big data, be it of good or bad quality, is not an easy task. The prime job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights.

Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline, giving you the foundation and knowledge required to be a big data architect. From understanding design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can bring together big data tools such as Apache Hadoop and Elasticsearch to build an efficient big data solution.

By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.

What you will learn

  • Learn the Hadoop ecosystem and Apache projects
  • Understand and compare NoSQL databases and essential software architecture
  • Learn cloud infrastructure design considerations for big data
  • Explore application scenarios of big data tools for daily activities
  • Learn to analyze and visualize results to uncover valuable insights
  • Build and run a big data application with sample code from end to end
  • Apply Machine Learning and AI to perform big data intelligence
  • Practice the daily activities performed by big data architects

Who this book is for

Big Data Architect's Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round big data architect. This book is your one-stop solution to enhance your knowledge and carry out the activities, from easy to complex, required to become a big data architect.


Information

Year: 2018
ISBN: 9781788836388
Edition: 1

NoSQL Database

Nowadays, there is a lot of hype about NoSQL databases, especially in the big data world. People discuss different aspects of NoSQL and how to get the most out of it. Different questions come to mind, such as: What is it? How is it different from an RDBMS? How do I select an appropriate framework and tool while architecting my project?
In this chapter, we will go through NoSQL and answer all of these questions to build a strong foundation. We will then cover several NoSQL databases from a practical perspective, including their installation, basic configuration, and most of the operations that we normally perform in a database. We will mainly discuss the following topics:
  • What is NoSQL?
  • Benefits of NoSQL
  • Comparison of NoSQL and RDBMS
  • CAP theorem and ACID properties
  • Different data models in NoSQL
  • Apache Cassandra
  • MongoDB
  • Neo4j
Let's start exploring the NoSQL world with a question: what is NoSQL?

What is NoSQL?

So far, when we say the word database, the most common definition that comes to mind is well-structured, formatted data stored in tabular form. But what happens with a large amount of data that is unstructured and has no proper formatting or schema? This is where NoSQL databases come into the picture. NoSQL is a mechanism for storing data that doesn't have a fixed schema. Most people assume the name means No SQL, whereas it actually stands for Not Only SQL. It means that the database doesn't rely only on SQL for manipulating and storing data, but can be used in conjunction with other programming languages. Now, we will discuss some of the benefits of NoSQL databases.
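Before moving on, the idea of storing data without a fixed schema can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `collection` list and the `insert` and `find` helpers are hypothetical names made up for this sketch, and they are not the API of any particular NoSQL product.

```python
import json

# Hypothetical in-memory "collection": a list of documents with no fixed schema.
collection = []

def insert(document):
    """Store a document as-is; no table definition or column list is required."""
    # Round-trip through JSON to mimic how document stores serialize records.
    collection.append(json.loads(json.dumps(document)))

def find(field, value):
    """Return every document whose `field` equals `value`."""
    return [doc for doc in collection if doc.get(field) == value]

# Three records with three different shapes live side by side.
insert({"name": "Alice", "email": "alice@example.com"})
insert({"name": "Bob", "phones": ["555-0100", "555-0101"], "city": "Karachi"})
insert({"sensor_id": 42, "readings": {"temp_c": 21.5, "humidity": 0.4}})

print(find("name", "Bob"))
print(find("sensor_id", 42))
```

Documents that lack the queried field are simply skipped, which is the trade-off of a dynamic schema: the store accepts anything, so the application has to cope with missing or differently shaped fields.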

Benefits of NoSQL databases

NoSQL helps us to deal with data that we were not able to store or maintain using traditional system approaches. The following are the key benefits of NoSQL databases:
  • NoSQL provides schema-less data storage, which is one of its main advantages. It allows data of all types to be stored in different formats and schemas, thereby enabling more robust and agile development.
  • NoSQL servers scale horizontally, which means it is very easy to scale capacity up or down. Simply add or remove servers to increase or decrease capacity, storage, and computation power.
  • NoSQL works in a clustered environment that is mainly built on commodity hardware, which is much less expensive than a single highly reliable server, without compromising performance or reliability.
  • NoSQL databases are spread across multiple nodes, with data replicated to multiple servers. Some NoSQL database frameworks even work without the master-slave concept, which makes them highly available with no single point of failure.
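The scaling and replication points above can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not how any specific NoSQL database is implemented: the `Cluster` class, its method names, and the node names are all hypothetical, and key placement uses a naive hash-modulo scheme, whereas production systems typically rely on more refined partitioning such as consistent hashing.

```python
import hashlib

class Cluster:
    """Toy cluster: each key is hashed to a node and copied to the next node(s)."""

    def __init__(self, node_names, replication_factor=2):
        self.replication_factor = replication_factor
        self.nodes = {name: {} for name in node_names}  # node name -> local data
        self.down = set()                               # nodes currently offline

    def _replicas(self, key):
        """Pick the nodes that hold a key, using a stable hash of the key."""
        names = sorted(self.nodes)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(names)
        count = min(self.replication_factor, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        """Write the value to every replica of the key."""
        for name in self._replicas(key):
            self.nodes[name][key] = value

    def get(self, key):
        """Read from the first replica that is still online."""
        for name in self._replicas(key):
            if name not in self.down:
                return self.nodes[name].get(key)
        raise RuntimeError("all replicas for this key are offline")

    def add_node(self, name):
        """Scale out: add a server, then redistribute the existing keys."""
        items = {}
        for data in self.nodes.values():
            items.update(data)
            data.clear()
        self.nodes[name] = {}
        for key, value in items.items():
            self.put(key, value)

cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})

# Take one node offline: every key is still readable from its other replica.
cluster.down.add("node-a")
readable_with_node_down = all(
    cluster.get(f"user:{i}") == {"id": i} for i in range(100)
)
cluster.down.clear()

# Add a fourth server and show how many keys each node now holds.
cluster.add_node("node-d")
print(readable_with_node_down)
print({name: len(data) for name, data in cluster.nodes.items()})
```

Because every key lives on two different nodes, losing any single node leaves the data readable, and adding a node spreads the same data over more machines.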
These are some of the key advantages of using NoSQL databases over the traditional approach. Next, we will compare NoSQL with RDBMS to give you a clear understanding of the differences between the two database types.

NoSQL versus RDBMS

We will now compare the characteristics of NoSQL and relational databases point by point:
| RDBMS | NoSQL |
| --- | --- |
| RDBMSs are relational databases. | NoSQL databases are normally non-relational or distributed databases. |
| Stores data in tabular form, that is, in rows and columns. | Stores data in key-value, document, column-based, or graph-based datastores. |
| Has a predefined schema. | Has a dynamic schema. |
| Vertically scalable: you can only add hardware to the same server to increase computation or other hardware resources. | Horizontally scalable: you can add more servers to the cluster to increase computation power and other hardware resources, such as storage and memory. |
| Relies on Structured Query Language (SQL) for all related operations. Different RDBMS tools extend SQL with their own advanced functions for handling databases. | Uses a basic SQL-like query format that differs from framework to framework, depending on which one you select. |
| Supports complex and nested queries to extract the desired output. | As far as SQL support is concerned, normally handles only basic CRUD operations; there is usually no interface for complex and nested queries. |
| More reliable for heavily transactional workloads. | Generally not suited to heavily transactional operations. |
| Relies on the ACID (atomicity, consistency, isolation, durability) properties, discussed in detail later in this chapter. | Mainly follows the CAP (consistency, availability, partition tolerance) theorem, discussed in detail later in this chapter. |
| Classified as open source or commercial applications. | Classified by the data model type supported, such as key-value data stores, column stores, and so on. |
| Examples: MySQL, Microsoft SQL Server, Oracle, PostgreSQL. | Examples: Apache Cassandra, MongoDB, HBase, Redis, Neo4j. |
SQL stands for Structured Query Language. It is a programming language used in an RDBMS (relational database management system) to store and retrieve data from structured databases. It is standardized by ANSI.
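To make the contrast between a predefined schema with joins and a dynamic document model more tangible, here is a minimal sketch using Python's built-in `sqlite3` module for the relational side and a plain dictionary standing in for a document. The table names, column names, and sample values are assumptions chosen for illustration, and the dictionary only imitates the shape of a document; it is not a real NoSQL client.

```python
import sqlite3

# Relational model: the schema is declared up front and data is split across tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 30.0), (1, 12.5)],
)

# A join plus aggregation reassembles the data at query time.
row = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchone()
print(row)  # ('Alice', 2, 42.5)

# A column that was never declared is rejected by the predefined schema.
schema_error = None
try:
    conn.execute(
        "INSERT INTO customers (id, name, nickname) VALUES (2, 'Bob', 'Bobby')"
    )
except sqlite3.OperationalError as exc:
    schema_error = str(exc)
    print("rejected:", schema_error)

# Document model: related data is nested in one record, and an extra field
# such as "nickname" needs no schema change.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "nickname": "Al",
    "orders": [{"amount": 30.0}, {"amount": 12.5}],
}
total = sum(order["amount"] for order in customer_doc["orders"])
print(customer_doc["name"], len(customer_doc["orders"]), total)
```

The relational version pays for its flexibility in querying with an up-front schema, while the document version trades join support for records that can change shape freely.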
Now we will discuss the CAP theorem, which relates to NoSQL databases, and the ACID properties, which relate to SQL databases, in order to understand some of the differences between NoSQL and relational databases mentioned earlier.

The C...
