
Seven NoSQL Databases in a Week

Name: Seven NoSQL Databases in a Week
Author: Xun (Brian) Wu, Sudarshan Kadambi, Devram Kandhare, Aaron Ploetz

308 pages
English

About This Book

A beginner's guide to get you up and running with Cassandra, DynamoDB, HBase, InfluxDB, MongoDB, Neo4j, and Redis

Key Features

Covers the basics of 7 NoSQL databases and how they are used in enterprises
Quick introduction to MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase
Includes effective techniques for database querying and management

Book Description

This is the golden age of open source NoSQL databases. With enterprises having to work with large amounts of unstructured data and moving away from expensive monolithic architecture, the adoption of NoSQL databases is rapidly increasing. Being familiar with the popular NoSQL databases and knowing how to use them is a must for budding DBAs and developers.

This book introduces you to the different types of NoSQL databases and gets you started with seven of the most popular NoSQL databases used by enterprises today. We start off with a brief overview of what NoSQL databases are, followed by an explanation of why and when to use them. The book then covers seven of the most popular databases across these categories: MongoDB, Amazon DynamoDB, Redis, HBase, Cassandra, InfluxDB, and Neo4j. The book doesn't go into much detail about each database, but teaches you enough to get started with them.

By the end of this book, you will have a thorough understanding of the different NoSQL databases and their functionalities, empowering you to select and use the right database according to your needs.

What you will learn

Understand how MongoDB provides high performance, high availability, and automatic scaling
Interact with your Neo4j instances via database queries, Python scripts, and Java application code
Get familiar with common querying and programming methods to interact with Redis
Study the different types of problems Cassandra can solve
Work with HBase components to support common operations such as creating tables and reading/writing data
Discover data models and work with CRUD operations using DynamoDB
Discover what makes InfluxDB a great choice for working with time-series data

Who this book is for

If you are a budding DBA or a developer who wants to get started with the fundamentals of NoSQL databases, this book is for you. Relational DBAs who want to get insights into the various offerings of popular NoSQL databases will also find this book to be very useful.

Information

Publisher: Packt Publishing
Year: 2018
ISBN: 9781787127142
Topic: Computer Science
Subtopic: Databases

InfluxDB

The term big data is everywhere these days: it has entered the mainstream and is merging with traditional analytics. More electronic devices than ever before are connected to the internet: phones, watches, sensors, cars, TVs, and so on. These devices generate enormous amounts of new, unstructured, real-time data every minute. Analyzing time-structured data has become one of the most important problems across many industries. Many companies are looking for new ways to solve their time-series data problems and to make use of the data flowing in from these sources. As a result, the popularity of time-series databases has increased rapidly over the past few years, and InfluxDB is one of the most popular of them.

In this chapter, we will cover the following topics:

What is InfluxDB?
Installation and configuration
Query language and API
InfluxDB ecosystem
InfluxDB operations

Introduction to InfluxDB

InfluxDB is developed by InfluxData. It is an open source, big data, NoSQL database that offers massive scalability, high availability, and fast writes and reads. As a NoSQL database, InfluxDB stores time-series data, that is, a series of data points over time. These data points can be regular or irregular, depending on the data source. Regular measurements are taken at a fixed time interval, for example, system heartbeat monitoring data. Other measurements are driven by discrete events, for example, trading transactions, sensor data, and so on.
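To make the regular/irregular distinction concrete, the minimal Python sketch below treats a series as regular when every gap between consecutive timestamps is identical. The helper `is_regular` and the sample timestamps are hypothetical and serve only as illustration. InfluxDB itself does not require you to classify series this way.

```python
from typing import Sequence


def is_regular(timestamps: Sequence[int]) -> bool:
    """Return True if sorted timestamps (in seconds) are evenly spaced.

    Hypothetical helper for illustration. A series with one or two points
    has at most one interval, so it is treated as regular.
    """
    intervals = {b - a for a, b in zip(timestamps, timestamps[1:])}
    return len(intervals) <= 1


# Heartbeat-style data: one sample every 10 seconds (regular).
heartbeat = [0, 10, 20, 30, 40]
# Event-driven data such as trades: points arrive whenever an event occurs (irregular).
trades = [0, 3, 4, 17, 42]

print(is_regular(heartbeat))  # True
print(is_regular(trades))     # False
```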

InfluxDB is written in Go, which makes it easy to compile and deploy without external dependencies. It offers an SQL-like query language, and its plugin architecture makes it easy to integrate with third-party products.

Like other NoSQL databases, it provides client libraries for languages such as Go, Java, Python, and Node.js to interact with the database. Its native HTTP API makes it easy to integrate with web-based tools, such as DevOps dashboards that monitor real-time data.
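As a rough illustration of the HTTP API, the sketch below builds a write request using only the Python standard library but does not send it. It assumes an InfluxDB 1.x server on the default port 8086, whose `/write` endpoint accepts points in line protocol (`measurement,tag=value field=value timestamp`). The database name `mydb`, the measurement `cpu_load`, and the tag value are hypothetical. InfluxDB 2.x uses a different write endpoint and token-based authentication, so check the documentation for your version.

```python
import urllib.parse
import urllib.request


def build_write_request(base_url: str, db: str, line: str) -> urllib.request.Request:
    """Build a POST request for the InfluxDB 1.x /write endpoint.

    Assumes `line` is already valid line protocol. No escaping of spaces,
    commas, or equals signs in tag values is done here.
    """
    query = urllib.parse.urlencode({"db": db, "precision": "s"})
    return urllib.request.Request(
        url=f"{base_url}/write?{query}",
        data=line.encode("utf-8"),
        method="POST",
    )


# One point: measurement "cpu_load", tag host=server01, field value=0.64,
# timestamp in seconds (matching precision=s above).
req = build_write_request("http://localhost:8086", "mydb",
                          "cpu_load,host=server01 value=0.64 1434055562")
print(req.get_method(), req.full_url)
# POST http://localhost:8086/write?db=mydb&precision=s

# To actually send it against a running 1.x server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # 204 (No Content) indicates success
```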

Because it is designed specifically for time-series data, it has become increasingly popular for use cases such as DevOps monitoring, Internet of Things (IoT) monitoring, and time-series analytics applications.

Classic use cases of time-series data include the following:

System and monitoring logs
Financial/stock tickers over time in financial markets
Tracking product inventory in the retail system
Sensor data generation in IoT and the Industrial Internet of Things (IIoT)
Geo positioning and tracking in the transportation industry

The data for each of these use cases is different, but they frequently have a similar pattern.

In the system and monitoring logs case, we take regular measurements to track different production services such as Apache, Tomcat, MySQL, Hadoop, Kafka, Spark, Hive, web applications, and so on. Series usually carry metadata such as the server name, the service name, and the metric being measured.
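One way to picture a series is as the combination of its metadata. The minimal Python sketch below uses hypothetical names (`SeriesKey` and the server, service, and metric values). It treats each unique (metric, server, service) combination as one series, so the number of distinct series grows multiplicatively with the metadata values.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SeriesKey:
    """Hypothetical identity of one series: what is measured, and where."""
    metric: str
    server: str
    service: str


servers = ["web-01", "web-02"]
services = ["apache", "mysql"]
metrics = ["cpu_usage", "memory_used", "requests_per_sec"]

# Every combination of metadata values identifies a separate series.
all_series = {SeriesKey(m, h, s) for m, h, s in product(metrics, servers, services)}
print(len(all_series))  # 3 metrics * 2 servers * 2 services = 12
```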

Let's assume a common case of 200 or more measurements (unique series) per server, across 300 servers, VMs, and containers, each sampled once every 10 seconds. This gives 24 * 60 * 60 / 10 = 8,640 values per series per day. The total number of distinct points per day is then 8,640 * 300 * 200 = 518,400,000 (around 0.5 billion data points per day).
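The sizing estimate for this monitoring scenario can be reproduced in a few lines of Python. This makes it easy to try other sampling intervals or fleet sizes. The constant names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_SECONDS = 10
SERIES_PER_SERVER = 200
SERVERS = 300  # servers, VMs, and containers

values_per_series_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_SECONDS
points_per_day = values_per_series_per_day * SERVERS * SERIES_PER_SERVER

print(values_per_series_per_day)  # 8640
print(f"{points_per_day:,}")      # 518,400,000
```

Halving the sampling interval to 5 seconds would double this to 1,036,800,000 points per day.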

A relational database offers a few ways to structure this data, but each comes with challenges:

Create a single denormalized table to store all of the data with the series name, the value, and a time. With this approach, the table grows by 0.5 billion rows per day, so its size would quickly become a problem.
Create a separate table per period of time (day, month, and so on). This requires the developer to write code to archive and version historical data and to combine data from the different tables.
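To see why the table-per-period approach pushes work into application code, consider the minimal Python sketch below. Every write must be routed to the right table, and any query spanning several days must first work out which tables to read. The `metrics_YYYYMMDD` naming scheme and both helper functions are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List


def table_for(ts: datetime) -> str:
    """Route a point to its per-day table (hypothetical naming scheme).

    Expects a timezone-aware datetime; the day is taken in UTC.
    """
    return f"metrics_{ts.astimezone(timezone.utc):%Y%m%d}"


def tables_for_range(start: date, end: date) -> List[str]:
    """List every per-day table that a query over [start, end] must read and combine."""
    days = (end - start).days
    return [f"metrics_{start + timedelta(days=i):%Y%m%d}" for i in range(days + 1)]


print(table_for(datetime(2018, 1, 15, 13, 45, tzinfo=timezone.utc)))
# metrics_20180115
print(tables_for_range(date(2018, 1, 30), date(2018, 2, 2)))
# ['metrics_20180130', 'metrics_20180131', 'metrics_20180201', 'metrics_20180202']
```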

Having looked at relational databases, let's look at some big data databases, such as Cassandra and Hive.

As with the SQL variant, building a time-series solution on top of Cassandra requires quite a bit of application-level code.

First, you need to design a data model for structuring the data. Because each Cassandra row is stored and replicated as one unit, you need to design proper row keys to ensure that the query and data load are spread evenly across the cluster. Then, you need to write ETL code to process the raw data, build the row keys, and implement other application logic to write the time-series data into the table.
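A common pattern for this row-key design is to bucket each series by time. This keeps any single partition from growing without bound and spreads writes across the cluster. The Python sketch below only computes such a key and uses no Cassandra driver. The one-day bucket size and the `(series, day)` key shape are assumptions for illustration, not a prescribed schema.

```python
from datetime import datetime, timezone
from typing import Tuple

BUCKET_SECONDS = 24 * 60 * 60  # assumption: one partition per series per UTC day


def partition_key(series: str, ts: datetime) -> Tuple[str, int]:
    """Return a hypothetical (series, day-bucket) partition key.

    Points from the same series and the same UTC day share a key, so they
    are stored together, while other days and series map to other partitions.
    """
    epoch_seconds = int(ts.timestamp())
    bucket_start = epoch_seconds - epoch_seconds % BUCKET_SECONDS
    return series, bucket_start


t1 = datetime(2018, 1, 15, 0, 0, 5, tzinfo=timezone.utc)
t2 = datetime(2018, 1, 15, 23, 59, 55, tzinfo=timezone.utc)
t3 = datetime(2018, 1, 16, 0, 0, 5, tzinfo=timezone.utc)
print(partition_key("web-01.cpu_usage", t1) == partition_key("web-01.cpu_usage", t2))  # True
print(partition_key("web-01.cpu_usage", t1) == partition_key("web-01.cpu_usage", t3))  # False
```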

The same applies to Hive, where you need to design the partition key properly for the time-series use case, and then pull or receive data from the source system by running Kafka, Spark, Flink, Storm, or another big data processing framework. You will end up writing ETL aggregation logic to produce lower-precision samples that can be used for longer-term visualizations.
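The aggregation logic mentioned here is usually downsampling: replacing many high-precision samples with one summary value per coarser time window. The minimal Python sketch below assumes the input is `(timestamp_seconds, value)` pairs and computes a mean over fixed windows. Other aggregates such as max or last are equally common. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def downsample(points: Iterable[Tuple[int, float]], window_seconds: int) -> List[Tuple[int, float]]:
    """Average raw (timestamp, value) samples into fixed windows.

    Each output pair is (window_start, mean_of_values_in_window), sorted by time.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# 10-second samples rolled up into 1-minute averages.
raw = [(0, 1.0), (10, 2.0), (20, 3.0), (60, 10.0), (70, 20.0)]
print(downsample(raw, 60))  # [(0, 2.0), (60, 15.0)]
```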

Finally, you need to package all of this code, deploy it to production, and follow the DevOps process. You also need to ensure that query performance is optimized for all of these use cases.

The whole process typically takes the development team several months, because it requires coordinating with many other teams.

InfluxDB has a number of features that take care of all of these tasks automatically.

Key concepts and terms of InfluxDB

InfluxDB uses particular terms to describe the various components of time-series data and the techniques used to categorize this data, and these are part of what makes InfluxDB unique.

InfluxDB organizes data by database, time series, and point of events. The database is quite similar to ...