Big Data Architect's Handbook
eBook - ePub

A guide to building proficiency in tools and systems used by leading big data experts

Syed Muhammad Fahad Akhtar

  1. 486 pages
  2. English
About this book

A comprehensive end-to-end guide that gives hands-on practice in big data and Artificial Intelligence

Key Features

  • Learn to build and run a big data application with sample code
  • Explore examples to implement activities that a big data architect performs
  • Use Machine Learning and AI for structured and unstructured data

Book Description

Big data architects are the "masters" of data and hold high value in today's market. Handling big data, be it of good or bad quality, is not an easy task. The prime job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights.

Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline, laying the foundation and providing the knowledge you need to become a big data architect. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can bring together big data tools such as Apache Hadoop and Elasticsearch to build an efficient big data solution.

By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.

What you will learn

  • Learn about the Hadoop ecosystem and Apache projects
  • Understand and compare NoSQL databases and essential software architecture
  • Understand cloud infrastructure design considerations for big data
  • Explore application scenarios of big data tools for daily activities
  • Learn to analyze and visualize results to uncover valuable insights
  • Build and run a big data application with sample code from end to end
  • Apply Machine Learning and AI to perform big data intelligence
  • Practice the daily activities performed by big data architects

Who this book is for

Big Data Architect's Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round architect in big data. This book is your one-stop solution to enhance your knowledge and carry out easy to complex activities required to become a big data architect.

Information

Year
2018
ISBN
9781788836388

NoSQL Database

Nowadays, there is a lot of hype about NoSQL databases, especially in the big data world. People discuss different aspects of NoSQL and how to get the most out of it. Different questions come to mind, such as: What is it? How is it different from an RDBMS? How do I select an appropriate framework and tool while architecting my project?
In this chapter, we will go through NoSQL and answer all of these questions to build a strong foundation. We will then cover several NoSQL databases from a practical perspective, including their installation, basic configuration, and most of the operations that we normally perform in a database. We will mainly discuss the following topics:
  • What is NoSQL?
  • Benefits of NoSQL
  • Comparison of NoSQL and RDBMS
  • CAP theorem and ACID properties
  • Different data models in NoSQL
  • Apache Cassandra
  • MongoDB
  • Neo4j
Let's start exploring the NoSQL world with a question: what is NoSQL?

What is NoSQL?

So far, when we say the word database, the most common definition that comes to mind is well-structured, formatted data stored in tabular form. The question is, what happens with a large amount of data that is unstructured and has no proper formatting or schema? This is where NoSQL databases come into the picture. NoSQL is a mechanism for storing data that doesn't have a fixed schema. Many people assume the name means No SQL, whereas it actually stands for Not Only SQL. It means that it doesn't rely only on the SQL language for storing and manipulating data, but can be used in conjunction with other programming languages. Now, we will discuss some of the benefits of NoSQL databases.
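To make the idea of storage without a fixed schema concrete, here is a minimal sketch in plain Python. It is not the API of any particular NoSQL product: the in-memory `collection` list and the `insert` and `find` helpers are hypothetical names introduced only for illustration.

```python
import json

# Hypothetical in-memory "collection": each record is a free-form document.
collection = []


def insert(document):
    """Store a document as-is; no schema is checked or enforced."""
    # Round-trip through JSON to mimic how document stores serialize records.
    collection.append(json.loads(json.dumps(document)))


def find(**criteria):
    """Return documents whose fields match every given key/value pair."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Three records with three different shapes live side by side.
insert({"name": "Alice", "email": "alice@example.com"})
insert({"name": "Bob", "phones": ["555-0100", "555-0101"], "city": "Karachi"})
insert({"sensor_id": 42, "readings": {"temp_c": 21.5}})

matches = find(name="Bob")
field_sets = [sorted(doc) for doc in collection]
```

Each document carries its own set of fields, so adding a new kind of record needs no migration step. The trade-off is that the application, not the database, has to cope with fields that may be missing.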

Benefits of NoSQL databases

NoSQL helps us to deal with data that we were not able to store or maintain using traditional system approaches. The following are the key benefits of NoSQL databases:
  • NoSQL provides schema-less data storage, which is one of its main advantages. It allows all types of data to be stored in different formats and in different schemas, thereby enabling more robust and agile development.
  • NoSQL scales horizontally, which means it is very easy to scale capacity up or down. Simply add or remove servers to increase or decrease capacity, storage, and computation power.
  • NoSQL works in a clustered environment that is mainly built on commodity hardware, which is much less expensive than a single highly reliable server, without affecting performance or reliability.
  • NoSQL databases are spread across multiple nodes, with data replicated to multiple servers. Some NoSQL database frameworks even work without the master-slave concept, which makes them highly available with no single point of failure.
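The scaling and replication points above can be sketched with a toy cluster. This is an illustrative model only, assuming simple hash-modulo partitioning and a replication factor of 2. The `Cluster` class, the node names, and the helper functions are hypothetical, and production systems typically use more refined schemes (consistent hashing, for example) so that adding a node moves fewer keys.

```python
import hashlib


def node_index(key, node_count):
    """Map a key to a node position by hashing it (modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def replicas_for(key, nodes, replication_factor=2):
    """Return the nodes holding a key: the primary plus its next neighbours."""
    start = node_index(key, len(nodes))
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


class Cluster:
    """Toy cluster: every node is just a dict, and no node is a master."""

    def __init__(self, node_names, replication_factor=2):
        self.replication_factor = replication_factor
        self.storage = {name: {} for name in node_names}

    def put(self, key, value):
        # Write the value to every replica responsible for this key.
        for name in replicas_for(key, list(self.storage), self.replication_factor):
            self.storage[name][key] = value

    def get(self, key, failed=()):
        """Read from the first responsible replica that has not failed."""
        for name in replicas_for(key, list(self.storage), self.replication_factor):
            if name not in failed and key in self.storage[name]:
                return self.storage[name][key]
        raise KeyError(key)

    def add_node(self, name):
        """Scale out: add a node, then redistribute every stored key."""
        items = {}
        for data in self.storage.values():
            items.update(data)
        self.storage = {n: {} for n in list(self.storage) + [name]}
        for key, value in items.items():
            self.put(key, value)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(10):
    cluster.put(f"user:{i}", {"id": i})

# Simulate losing the primary replica of one key: the read still succeeds.
primary = replicas_for("user:3", list(cluster.storage))[0]
value_despite_failure = cluster.get("user:3", failed={primary})

# Scale out by one server; all keys remain readable afterwards.
cluster.add_node("node-d")
```

Because every key lives on two nodes, losing any single node does not lose data, and capacity grows by adding a server rather than by upgrading one.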
These are some of the key advantages of using NoSQL databases over the traditional approach. Now, moving forward, we will compare NoSQL with RDBMS databases to give you a clear understanding of the differences between both database types.

NoSQL versus RDBMS

We will now compare the characteristics of NoSQL and relational databases point by point:

| RDBMS | NoSQL |
| --- | --- |
| RDBMS are relational databases. | NoSQL databases are normally non-relational or distributed databases. |
| RDBMS databases store data in tabular form, which means the data is organized in rows and columns. | NoSQL databases are key-value, document, column-based, or graph-based data stores. |
| RDBMS has a predefined schema. | NoSQL has a dynamic schema. |
| RDBMS is vertically scalable. This means capacity is increased by adding computation or other hardware resources to the same server. | NoSQL is horizontally scalable. This means you can add more servers to the cluster to increase computation power and other hardware resources, such as storage and memory. |
| RDBMS relies on Structured Query Language (SQL) for all related operations. Different RDBMS tools extend SQL with their own advanced functions for handling databases. | Some NoSQL frameworks offer a query language modeled on the basic format of SQL. The syntax differs from framework to framework, depending on which one you select. |
| SQL supports complex and nested queries to extract the desired output. | NoSQL frameworks normally handle the basic CRUD operations as far as SQL support is concerned. They generally lack an interface for complex and nested queries. |
| SQL databases are more reliable for workloads with high transaction volumes. | NoSQL is generally less suited to highly transactional operations. |
| SQL relies on ACID (atomicity, consistency, isolation, durability) properties. We will discuss ACID properties in detail as we proceed in this chapter. | NoSQL mainly follows the CAP (consistency, availability, partition tolerance) theorem. We will discuss the CAP theorem in detail as we proceed in this chapter. |
| SQL databases are classified as open source or commercial applications. | NoSQL database frameworks are classified by the data model they support, such as key-value data stores, column stores, and so on. |
| Examples of RDBMS are MySQL, Microsoft SQL Server, Oracle, and PostgreSQL. | Examples of NoSQL frameworks are Apache Cassandra, MongoDB, HBase, Redis, and Neo4j. |
SQL stands for Structured Query Language. It is a programming language used in an RDBMS (relational database management system) to store and retrieve data from structured databases. It is standardized by ANSI.
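The schema row of the comparison can be illustrated with a short sketch that uses Python's built-in `sqlite3` module as a stand-in for an RDBMS and a plain list of dictionaries as a stand-in for a dynamic-schema store. The `users` table and its columns are assumptions made for the example, not taken from the book.

```python
import sqlite3

# RDBMS side: the table's columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Alice"))

# Writing to a column outside the predefined schema is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
        (2, "Bob", "Karachi"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The schema must be changed explicitly before the new field can be stored.
conn.execute("ALTER TABLE users ADD COLUMN city TEXT")
conn.execute(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    (2, "Bob", "Karachi"),
)
rows = conn.execute("SELECT id, name, city FROM users ORDER BY id").fetchall()

# Dynamic-schema side: a new field simply appears on the records that have it.
documents = [{"id": 1, "name": "Alice"}]
documents.append({"id": 2, "name": "Bob", "city": "Karachi"})
```

In the relational case, existing rows gain the new column with a `NULL` value. In the document-style case, older records are left untouched and never mention the field.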
Next, we will discuss the CAP theorem, which relates to NoSQL databases, and the ACID properties, which relate to SQL databases, to understand some of the differences between NoSQL and relational databases mentioned earlier.

The C...
