eBook - ePub

Seven NoSQL Databases in a Week

Name: Seven NoSQL Databases in a Week
Authors: Xun (Brian) Wu, Sudarshan Kadambi, Devram Kandhare, Aaron Ploetz


308 pages
English

About the book

A beginner's guide to get you up and running with Cassandra, DynamoDB, HBase, InfluxDB, MongoDB, Neo4j, and Redis

Key Features

Covers the basics of 7 NoSQL databases and how they are used in enterprises
Quick introduction to MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase
Includes effective techniques for database querying and management

Book Description

This is the golden age of open source NoSQL databases. With enterprises having to work with large amounts of unstructured data and moving away from expensive monolithic architecture, the adoption of NoSQL databases is rapidly increasing. Being familiar with the popular NoSQL databases and knowing how to use them is a must for budding DBAs and developers.

This book introduces you to the different types of NoSQL databases and gets you started with seven of the most popular NoSQL databases used by enterprises today. We start off with a brief overview of what NoSQL databases are, followed by an explanation of why and when to use them. The book then covers seven of the most popular databases across these categories: MongoDB, Amazon DynamoDB, Redis, HBase, Cassandra, InfluxDB, and Neo4j. The book doesn't go into too much detail about each database but teaches you enough to get started with them.

By the end of this book, you will have a thorough understanding of the different NoSQL databases and their functionalities, empowering you to select and use the right database according to your needs.

What you will learn

Understand how MongoDB provides high performance, high availability, and automatic scaling
Interact with your Neo4j instances via database queries, Python scripts, and Java application code
Get familiar with common querying and programming methods to interact with Redis
Study the different types of problems Cassandra can solve
Work with HBase components to support common operations such as creating tables and reading/writing data
Discover data models and work with CRUD operations using DynamoDB
Discover what makes InfluxDB a great choice for working with time-series data

Who this book is for

If you are a budding DBA or a developer who wants to get started with the fundamentals of NoSQL databases, this book is for you. Relational DBAs who want to get insights into the various offerings of popular NoSQL databases will also find this book to be very useful.


Information

Publisher

Packt Publishing

Year

2018

ISBN

9781787127142

Subject

Computer Science

Category

Databases

InfluxDB

The term big data is everywhere these days; it has entered the mainstream and is merging with traditional analytics. More electronic devices than ever before are connected to the internet: phones, watches, sensors, cars, TVs, and so on. These devices generate enormous amounts of new, unstructured, real-time data every minute. Analyzing time-structured data has become one of the most important problems across many industries. Many companies are looking for new ways to solve their time-series data problems and to make use of the data flowing in from these devices. As a result, the popularity of time-series databases has increased rapidly over the past few years. InfluxDB is one of the most popular time-series databases in this area.

In this chapter, we will cover the following topics:

What is InfluxDB?
Installation and configuration
Query language and API
InfluxDB ecosystem
InfluxDB operations

Introduction to InfluxDB

InfluxDB is developed by InfluxData. It is an open source, big data, NoSQL database that allows for massive scalability, high availability, fast writes, and fast reads. As a NoSQL database, InfluxDB stores time-series data, that is, a series of data points over time. These data points can be regular or irregular, depending on the data source. Regular measurements are taken at a fixed time interval, for example, system heartbeat monitoring data. Other measurements are driven by discrete events, for example, trading transactions, sensor readings, and so on.
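To make the regular/irregular distinction concrete, here is a minimal Python sketch that checks whether a series' timestamps are evenly spaced. The function name `is_regular` and the sample timestamps are illustrative assumptions, not part of InfluxDB.

```python
from datetime import datetime, timedelta


def is_regular(timestamps: list[datetime], tolerance: timedelta = timedelta(0)) -> bool:
    """Return True if consecutive timestamps are spaced at a fixed interval.

    A series with fewer than three points is treated as regular, since
    there is at most one gap to compare.
    """
    if len(timestamps) < 3:
        return True
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    first = gaps[0]
    return all(abs(g - first) <= tolerance for g in gaps)


start = datetime(2018, 1, 1, 0, 0, 0)
# Heartbeat-style metric: one sample every 10 seconds (regular).
heartbeat = [start + timedelta(seconds=10 * i) for i in range(6)]
# Trade-style events: timestamps depend on when trades happen (irregular).
trades = [start + timedelta(seconds=s) for s in (0, 1, 7, 8, 30)]

print(is_regular(heartbeat))  # True
print(is_regular(trades))     # False
```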

InfluxDB is written in Go, which makes it easy to compile and deploy without external dependencies. It offers an SQL-like query language. Its plugin architecture makes it flexible enough to integrate with third-party products.
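As an illustration of the SQL-like query language (InfluxQL in the 1.x releases), the following sketch builds a request URL for the InfluxDB 1.x HTTP `/query` endpoint, assuming a local server on the default port 8086. The database name `metrics`, the measurement `cpu_load`, and the helper `build_query_url` are hypothetical. The script only builds the URL; it does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query_url(base_url: str, database: str, influxql: str) -> str:
    """Build a GET URL for the InfluxDB 1.x /query endpoint.

    The InfluxQL text is URL-encoded into the `q` parameter and the
    target database into `db`.
    """
    params = urlencode({"db": database, "q": influxql})
    return f"{base_url.rstrip('/')}/query?{params}"


# Average of a hypothetical cpu_load measurement over the last hour,
# in 10-minute buckets.
query = (
    'SELECT mean("value") FROM "cpu_load" '
    "WHERE time > now() - 1h GROUP BY time(10m)"
)
url = build_query_url("http://localhost:8086", "metrics", query)
print(url)

# Decoding the URL gives back the original query text.
print(parse_qs(urlparse(url).query)["q"][0] == query)  # True
```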

Like other NoSQL databases, it provides client libraries for languages such as Go, Java, Python, and Node.js. Its native HTTP API makes it easy to integrate with web-based products, such as DevOps tools that monitor real-time data.

Because it is designed specifically for time-series data, it has become increasingly popular for use cases such as DevOps monitoring, Internet of Things (IoT) monitoring, and time-series analytics applications.

Classic use cases for time-series data include the following:

System and monitoring logs
Financial/stock tickers over time in financial markets
Tracking product inventory in the retail system
Sensor data generation in IoT and the Industrial Internet of Things (IIoT)
Geopositioning and tracking in the transportation industry

The data for each of these use cases is different, but it frequently follows a similar pattern.

In the system and monitoring logs case, we take regular measurements to track different production services such as Apache, Tomcat, MySQL, Hadoop, Kafka, Spark, Hive, and web applications. Each series usually carries metadata such as the server name, the service name, and the metric being measured.

Let's assume a common case of 200 or more measurements (unique series) per server, across 300 servers, VMs, and containers, each sampled once every 10 seconds. This gives 24 * 60 * 60 / 10 = 8,640 values per series per day. Across all series, the total number of distinct points per day is 8,640 * 300 * 200 = 518,400,000 (around 0.5 billion data points per day).
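The arithmetic above can be checked with a few lines of Python; the constant names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_S = 10    # one sample every 10 seconds
SERIES_PER_SERVER = 200   # measurements (unique series) per server
SERVERS = 300             # servers, VMs, and containers

values_per_series = SECONDS_PER_DAY // SAMPLE_INTERVAL_S
points_per_day = values_per_series * SERVERS * SERIES_PER_SERVER

print(f"{values_per_series:,} values per series per day")  # 8,640 values per series per day
print(f"{points_per_day:,} points per day")                # 518,400,000 points per day
```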

In a relational database, there are a few ways to structure this data, but each comes with challenges:

Create a single denormalized table that stores all of the data with the series name, the value, and a timestamp. With this approach, the table grows by about 0.5 billion rows per day, so its sheer size quickly becomes a problem.
Create a separate table per time period (day, month, and so on). This requires the developer to write code that archives historical data and stitches the data from the different tables back together.
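To illustrate the stitching burden of the table-per-period approach, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a relational database. The `metrics_YYYYMMDD` naming scheme, the column names, and the helper functions are hypothetical; the point is that the application itself must work out which daily tables a time range touches and combine them with `UNION ALL`.

```python
import sqlite3
from datetime import date, timedelta


def daily_table(day: date) -> str:
    # Hypothetical naming scheme: one table per calendar day.
    return f"metrics_{day:%Y%m%d}"


def tables_for_range(start: date, end: date) -> list[str]:
    # Every daily table that a query over [start, end] (inclusive) must touch.
    return [daily_table(start + timedelta(days=i)) for i in range((end - start).days + 1)]


def range_query(start: date, end: date, series_name: str) -> tuple[str, list[str]]:
    # The application, not the database, stitches the per-day tables together.
    tables = tables_for_range(start, end)
    sql = "\nUNION ALL\n".join(
        f"SELECT series_name, value, ts FROM {t} WHERE series_name = ?" for t in tables
    ) + "\nORDER BY ts"
    return sql, [series_name] * len(tables)


# Demo against an in-memory SQLite database with three daily tables.
conn = sqlite3.connect(":memory:")
for t in tables_for_range(date(2018, 1, 1), date(2018, 1, 3)):
    conn.execute(f"CREATE TABLE {t} (series_name TEXT, value REAL, ts INTEGER)")
conn.execute("INSERT INTO metrics_20180101 VALUES ('cpu_load', 0.5, 1)")
conn.execute("INSERT INTO metrics_20180102 VALUES ('cpu_load', 0.7, 2)")
conn.execute("INSERT INTO metrics_20180103 VALUES ('mem_used', 0.9, 3)")

sql, params = range_query(date(2018, 1, 1), date(2018, 1, 3), "cpu_load")
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('cpu_load', 0.5, 1), ('cpu_load', 0.7, 2)]
```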

Now that we have looked at relational databases, let's turn to some big data databases such as Cassandra and Hive.

As with the SQL variant, building a time-series solution on top of Cassandra requires quite a bit of application-level code.

First, you need to design a data model for structuring the data. Because Cassandra rows are stored as one replication group, you need to design proper row keys to ensure that the cluster is properly utilized when querying and loading data. Then, you need to write the ETL code to process the raw data, build the row keys, and implement any other application logic needed to write the time-series data into the table.
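A common way to keep Cassandra partitions bounded for time-series data is to bucket each series by time period in the partition key. The sketch below assumes a hypothetical `metrics` table keyed by `(series_id, day)` and clustered by timestamp, and shows the application-side code that computes the key for each point. The schema, the series names, and `partition_key` are illustrative, not a prescribed design.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical CQL schema this key function targets (shown for reference only).
SCHEMA = """
CREATE TABLE metrics (
    series_id text,
    day       text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((series_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


def partition_key(series_id: str, ts: datetime) -> tuple[str, str]:
    # Bucket each series by UTC day so that no single partition grows without bound.
    return (series_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


utc = timezone.utc
points = [
    ("web01.cpu_load", datetime(2018, 1, 1, 23, 59, 50, tzinfo=utc), 0.4),
    ("web01.cpu_load", datetime(2018, 1, 2, 0, 0, 0, tzinfo=utc), 0.6),
    ("web02.cpu_load", datetime(2018, 1, 1, 12, 0, 0, tzinfo=utc), 0.3),
]

# Group points by the partition they would be written to.
partitions = defaultdict(list)
for series_id, ts, value in points:
    partitions[partition_key(series_id, ts)].append((ts, value))

for key in sorted(partitions):
    print(key, len(partitions[key]))
```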

The same is true for Hive, where you need to design the partition key properly for the time-series use case, then pull or receive data from the source system by running Kafka, Spark, Flink, Storm, or another big data processing framework. You will end up writing ETL aggregation logic to produce lower-precision samples that can be used for longer-term visualizations.
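The kind of ETL aggregation described above can be sketched as a simple downsampling step that rolls raw samples up into coarser time buckets by averaging. This is a minimal in-memory illustration, assuming `(epoch_seconds, value)` input pairs and a hypothetical `downsample` helper; it is not a Hive or Spark job.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_s):
    """Roll (epoch_seconds, value) samples up into fixed-size time buckets.

    Returns a sorted list of (bucket_start, mean_value) pairs.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


# Two minutes of 10-second samples with hypothetical values 0.0 .. 11.0.
raw = [(t, float(i)) for i, t in enumerate(range(0, 120, 10))]
print(downsample(raw, 60))  # [(0, 2.5), (60, 8.5)]
```

Averaging is only one choice of roll-up; a real pipeline might also keep the minimum, maximum, or count per bucket.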

Finally, you need to package all of this code, deploy it to production, and follow the DevOps process. You also need to ensure that query performance is optimized for all of these use cases.

The whole process typically takes the development team several months, with a lot of coordination with other teams.

InfluxDB has a number of features that take care of all of these concerns automatically.

Key concepts and terms of InfluxDB

InfluxDB uses its own terminology to describe the various components of time-series data and the techniques it uses to categorize that data, which is part of what makes InfluxDB unique.

InfluxDB organizes data by database, time series, and point of events. The database is quite similar to ...