Big Data Architect's Handbook
A guide to building proficiency in tools and systems used by leading big data experts
Syed Muhammad Fahad Akhtar
- 486 pages
- English
About the book
A comprehensive end-to-end guide that gives hands-on practice in big data and Artificial Intelligence
Key Features
- Learn to build and run a big data application with sample code
- Explore examples to implement activities that a big data architect performs
- Use Machine Learning and AI for structured and unstructured data
Book Description
Big data architects are the "masters" of data and are highly valued in today's market. Handling big data, whether of good or bad quality, is not an easy task. The primary job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to uncover useful, hidden insights.
Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline. It lays the foundation and provides the knowledge you need to become a big data architect. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how to bring together big data tools such as Apache Hadoop and Elasticsearch to build an efficient big data solution.
By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.
What you will learn
- Learn the Hadoop ecosystem and Apache projects
- Understand and compare NoSQL databases and essential software architecture
- Understand cloud infrastructure design considerations for big data
- Explore application scenarios of big data tools for daily activities
- Learn to analyze and visualize results to uncover valuable insights
- Build and run an end-to-end big data application with sample code
- Apply Machine Learning and AI to perform big data intelligence
- Practice the daily activities performed by big data architects
Who this book is for
Big Data Architect's Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round architect in big data. This book is your one-stop solution to enhance your knowledge and carry out the activities, from easy to complex, required to become a big data architect.
NoSQL Database
- What is NoSQL?
- Benefits of NoSQL
- Comparison of NoSQL and RDBMS
- CAP theorem and ACID properties
- Different data models in NoSQL
- Apache Cassandra
- MongoDB
- Neo4j
What is NoSQL?
Benefits of NoSQL databases
- One of the main advantages of NoSQL is schema-less data storage. It allows data of all types to be stored in different formats and with different schemas. This enables more robust and agile development.
- NoSQL servers scale horizontally, which makes it easy to scale capacity up or down. Simply add or remove servers to increase or decrease capacity, storage, and computation power.
- NoSQL databases run in clustered environments built mainly on commodity hardware. This hardware is much less expensive than highly reliable high-end servers, and performance and reliability are not compromised.
- NoSQL databases spread data across multiple nodes and replicate it to multiple servers. Some NoSQL databases, such as Apache Cassandra, even work without a master-slave architecture, which makes them highly available with no single point of failure.
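One common way to get the horizontal scaling and masterless replication described in the list above is consistent hashing. Apache Cassandra, for example, uses it to place data on its token ring. The Python sketch below is a simplified, hypothetical illustration. `HashRing`, `vnodes`, and `replicas` are names chosen for this example, not any framework's API, and real systems add token management, failure detection, and data streaming on top.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 32-bit position on the hash ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical): virtual nodes plus N-way replication."""

    def __init__(self, nodes, vnodes=100, replicas=2):
        self.vnodes = vnodes        # ring positions per physical node
        self.replicas = replicas    # copies kept of every key
        self._positions = []        # sorted ring positions
        self._owners = {}           # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Scale out: the new node claims slices of the ring; no global reshuffle."""
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            if pos in self._owners:     # skip the (very unlikely) hash collision
                continue
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        """Scale in: drop the node's positions; its neighbours absorb its keys."""
        self._owners = {p: n for p, n in self._owners.items() if n != node}
        self._positions = sorted(self._owners)

    def nodes_for(self, key):
        """Replica set for a key: the next distinct nodes clockwise from its hash."""
        wanted = min(self.replicas, len(set(self._owners.values())))
        start = bisect.bisect(self._positions, ring_hash(key))
        replicas = []
        for step in range(len(self._positions)):
            pos = self._positions[(start + step) % len(self._positions)]
            node = self._owners[pos]
            if node not in replicas:
                replicas.append(node)
                if len(replicas) == wanted:
                    break
        return replicas


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.nodes_for(k)[0] for k in keys}   # primary replica per key

ring.add_node("node-d")                              # scale out by one server
after = {k: ring.nodes_for(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed primary node")
```

When a node is added, a key's primary can only move to that new node. Scaling out therefore relocates roughly the new node's share of the keys rather than reshuffling everything. Every node can also compute a key's replica set from the ring alone, so no master is needed to route requests.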
NoSQL versus RDBMS
| RDBMS | NoSQL |
| --- | --- |
| RDBMSs are relational databases. | NoSQL databases are non-relational and are often distributed. |
| RDBMSs store data in tabular form, that is, in rows and columns. | NoSQL databases are key-value, document, column-based, or graph-based datastores. |
| An RDBMS has a predefined schema. | NoSQL databases have a dynamic schema. |
| An RDBMS is typically scaled vertically: you add hardware resources, such as CPU and memory, to the same server to increase its computation power. | NoSQL databases scale horizontally: you add more servers to the cluster to increase computation power and other resources, such as storage and memory. |
| An RDBMS relies on SQL (Structured Query Language) for all operations. Different RDBMS products extend SQL with their own advanced functions for handling databases. | NoSQL has no single standard query language, and the interface varies from framework to framework. Some, such as Cassandra's CQL, use SQL-like syntax, while others provide their own query APIs or languages. |
| SQL supports complex and nested queries to extract the desired output. | NoSQL frameworks normally focus on basic CRUD operations and offer limited or no support for complex joins and nested queries. |
| An RDBMS is better suited to transaction-heavy workloads that need strong guarantees. | NoSQL databases generally offer weaker or more limited transaction support. |
| An RDBMS relies on ACID (atomicity, consistency, isolation, durability) properties. We will discuss ACID properties in detail as we proceed in this chapter. | NoSQL databases are mainly designed around the trade-offs of the CAP (consistency, availability, partition tolerance) theorem. We will discuss the CAP theorem in detail as we proceed in this chapter. |
| RDBMS products are commonly classified as open source or commercial. | NoSQL frameworks are classified by the data model they support, such as key-value stores, document stores, column stores, and graph stores. |
| Examples of RDBMSs are MySQL, Microsoft SQL Server, Oracle, and PostgreSQL. | Examples of NoSQL frameworks are Apache Cassandra, MongoDB, HBase, Redis, and Neo4j. |
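Two rows of the table, schema and ACID, can be illustrated with Python's standard-library `sqlite3` module, which embeds a real relational engine. Next to it, a list of dictionaries stands in for a schema-less document collection. This is a minimal sketch under stated assumptions. The `accounts` table, the `transfer` helper, and the in-memory `users` collection are hypothetical. The dictionary list mimics only the flexible-schema idea, not the API, distribution, or consistency behaviour of MongoDB or any other NoSQL framework.

```python
import sqlite3

# --- RDBMS side: SQLite, a relational engine shipped with Python ---
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:  # commit the seed rows as one transaction
    conn.executemany(
        "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )

# Predefined schema: a column that was never declared is rejected.
try:
    conn.execute(
        "INSERT INTO accounts (id, owner, balance, email) "
        "VALUES (3, 'carol', 10, 'c@example.com')"
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True


def transfer(conn, src, dst, amount):
    """Hypothetical helper: credit dst, then debit src; both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) failed on the debit
        return False


ok_small = transfer(conn, src=1, dst=2, amount=30)       # alice -> bob
ok_overdraft = transfer(conn, src=2, dst=1, amount=500)  # bob cannot cover this
balances = dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))
print(ok_small, ok_overdraft, balances)

# --- Document side: hypothetical in-memory stand-in, not a real NoSQL client ---
users = [
    {"_id": 1, "owner": "alice", "balance": 70},
    {"_id": 3, "owner": "carol", "email": "c@example.com", "tags": ["new"]},  # new fields, no migration
]
with_email = [doc["owner"] for doc in users if "email" in doc]
print(with_email)
```

The overdraft transfer fails on the `CHECK` constraint after the credit has already been applied. The rollback undoes that credit, so both balances stay unchanged. The documents in `users`, by contrast, accept new fields such as `email` without any schema change.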