Seven NoSQL Databases in a Week
eBook - ePub

Seven NoSQL Databases in a Week

Xun (Brian) Wu, Sudarshan Kadambi, Devram Kandhare, Aaron Ploetz

Compartir libro
  1. 308 páginas
  2. English
  3. ePUB (apto para móviles)
  4. Disponible en iOS y Android
eBook - ePub

Seven NoSQL Databases in a Week

Xun (Brian) Wu, Sudarshan Kadambi, Devram Kandhare, Aaron Ploetz

Detalles del libro
Vista previa del libro
Índice
Citas

Información del libro

A beginner's guide to get you up and running with Cassandra, DynamoDB, HBase, InfluxDB, MongoDB, Neo4j, and Redis

Key Features

  • Covers the basics of 7 NoSQL databases and how they are used in the enterprises
  • Quick introduction to MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase
  • Includes effective techniques for database querying and management

Book Description

This is the golden age of open source NoSQL databases. With enterprises having to work with large amounts of unstructured data and moving away from expensive monolithic architecture, the adoption of NoSQL databases is rapidly increasing. Being familiar with the popular NoSQL databases and knowing how to use them is a must for budding DBAs and developers.

This book introduces you to the different types of NoSQL databases and gets you started with seven of the most popular NoSQL databases used by enterprises today. We start off with a brief overview of what NoSQL databases are, followed by an explanation of why and when to use them. The book then covers the seven most popular databases in each of these categories: MongoDB, Amazon DynamoDB, Redis, HBase, Cassandra, InfluxDB, and Neo4j. The book doesn't go into too much detail about each database but teaches

you enough to get started with them.

By the end of this book, you will have a thorough understanding of the different NoSQL databases and their functionalities, empowering you to select and use the right

database according to your needs.

What you will learn

  • Understand how MongoDB provides high-performance, high-availability, and automatic scaling
  • Interact with your Neo4j instances via database queries, Python scripts, and Java application code
  • Get familiar with common querying and programming methods to interact with Redis
  • Study the different types of problems Cassandra can solve
  • Work with HBase components to support common operations such as creating tables and reading/writing data
  • Discover data models and work with CRUD operations using DynamoDB
  • Discover what makes InfluxDB a great choice for working with
  • time-series data

Who this book is for

If you are a budding DBA or a developer who wants to get started with the fundamentals of NoSQL databases, this book is for you. Relational DBAs who want to get insights into the various offerings of popular NoSQL databases will also find this book to be very useful.

Preguntas frecuentes

¿Cómo cancelo mi suscripción?
Simplemente, dirígete a la sección ajustes de la cuenta y haz clic en «Cancelar suscripción». Así de sencillo. Después de cancelar tu suscripción, esta permanecerá activa el tiempo restante que hayas pagado. Obtén más información aquí.
¿Cómo descargo los libros?
Por el momento, todos nuestros libros ePub adaptables a dispositivos móviles se pueden descargar a través de la aplicación. La mayor parte de nuestros PDF también se puede descargar y ya estamos trabajando para que el resto también sea descargable. Obtén más información aquí.
¿En qué se diferencian los planes de precios?
Ambos planes te permiten acceder por completo a la biblioteca y a todas las funciones de Perlego. Las únicas diferencias son el precio y el período de suscripción: con el plan anual ahorrarás en torno a un 30 % en comparación con 12 meses de un plan mensual.
¿Qué es Perlego?
Somos un servicio de suscripción de libros de texto en línea que te permite acceder a toda una biblioteca en línea por menos de lo que cuesta un libro al mes. Con más de un millón de libros sobre más de 1000 categorías, ¡tenemos todo lo que necesitas! Obtén más información aquí.
¿Perlego ofrece la función de texto a voz?
Busca el símbolo de lectura en voz alta en tu próximo libro para ver si puedes escucharlo. La herramienta de lectura en voz alta lee el texto en voz alta por ti, resaltando el texto a medida que se lee. Puedes pausarla, acelerarla y ralentizarla. Obtén más información aquí.
¿Es Seven NoSQL Databases in a Week un PDF/ePUB en línea?
Sí, puedes acceder a Seven NoSQL Databases in a Week de Xun (Brian) Wu, Sudarshan Kadambi, Devram Kandhare, Aaron Ploetz en formato PDF o ePUB, así como a otros libros populares de Computer Science y Databases. Tenemos más de un millón de libros disponibles en nuestro catálogo para que explores.

Información

Año
2018
ISBN
9781787127142
Edición
1
Categoría
Databases

InfluxDB

The term big data is everywhere these days, has now entered the mainstream, and is also merging with traditional analytics. More electronic devices than ever before are connected to the internet, phones, watches, sensors, cars, TVs, and so on. These devices generate enormous amounts of new, unstructured real-time data every minute. Analyzing time-structured data has become the most important problem across many industries. Many companies are looking for a new way to solve their time-series data problems and have utilized their available influx data. As a result, the popularity of the time-series database has rapidly increased over the past few years. InfluxDB is one of the most popular time-series databases in this area.
In this chapter, we will cover the following topics:
  • What is InfluxDB?
  • Installation and configuration
  • Query language and API
  • InfluxDB ecosystem
  • InfluxDB operations

Introduction to InfluxDB

InfluxDB is developed by InfluxData. It is an open source, big data, NoSQL database that allows for massive scalability, high availability, fast write, and fast read. As a NoSQL, InfluxDB stores time-series data, which has a series of data points over time. These data points can be regular or irregular type based on the type of data resource. Some regular data measurements are based on a fixed interval time, for example, system heartbeat monitoring data. Other data measurements could be based on a discrete event, for example, trading transaction data, sensor data, and so on.
InfluxDB is written on the go; this makes it easy to compile and deploy without external dependencies. It offers an SQL-like query language. The plug-in architecture design makes it very flexible to integrate other third-party products.
Like other NoSQL databases, it supports different clients such as Go, Java, Python, and Node.js to interact with the database. The convenience HTTP native API can easily integrate with web-based products such as DevOps to monitor real-time data.
Since it's specially designed for time-series data, it became more and more popular in this kind of data use case, such as DevOps monitoring, Internet of Things (IoT) monitoring, and time-series based analytics application.
The classic use case of time-series data includes the following:
  • System and monitoring logs
  • Financial/stock tickers over time in financial markets
  • Tracking product inventory in the retail system
  • Sensors data generation in IoT and Industrial Internet of Things (IIoT)
  • Geo positioning and tracking in the transportation industry
The data for each of these use cases is different, but they frequently have a similar pattern.
In the system and monitoring logs case, we're taking regular measurements for tracking different production services such as Apache, Tomcat, MySQL, Hadoop, Kafka, Spark, Hive, Web applications etc. Series usually have metadata information such as the server name, the service name, and the metric being measured.
Let's assume a common case to have 200 or more measurements (unique series) per server. Say we have 300 servers, VMs, and containers. Our task is to sample them once every 10 seconds. This will give us a total of 24 * 60 * 60 / 10 = 8,640 values per series. For each day, a total distinct point is 8,640 * 300 * 200 = 518,400,000 (around 0.5 billion data points per day).
In a relational database, there are few ways to structure things, but there are some challenges, which are listed as follows:
  • Create a single denormalized table to store all of the data with the series name, the value, and a time. In this approach, the table will get 0.5 billion per day. This would quickly cause a problem because of the size of the table.
  • Create a separate table per period of time (day, month, and so on). It required the developer to write code archives and versioning historical data from the different tables together.
After comparing with relational databases, let's look at some big data databases such as Cassandra and Hive.
As with the SQL variant, building a time-series solution on top of Cassandra requires quite a bit of application-level code.
First, you need to design a data mode for structuring the data. Cassandra rows are stored as one replication group, you need to design proper row keys to ensure that the cluster is properly utilized for querying a data load. Then, you need to write the ETL code to process the raw data, build row keys, and other application logic to write the time-series data into the table.
This is the same case for Hive, where you need to properly design the partition key based on the time-series use case, then pull or receive data from the source system by running Kafka, Spark, Flink, Storm, or other big data processing frameworks. You will end up writing some ETL aggregation logic to handle lower precision samples that can be used for longer-term visualizations.
Finally, you need to package all of this code and deploy it to production and follow the DevOps process. You also need to ensure that the query performances are optimized for all of these use cases.
The whole process will typically require the developer team to spend several months to completely coordinate with many other teams.
InfluxDB has a number of features that can take care of all of the features mentioned earlier, automatically.

Key concepts and terms of InfluxDB

InfluxDB uses particular terms to describe the various components of time-series data, and the techniques used to categorize this data to make InfluxDB unique.
InfluxDB organizes data by database, time series, and point of events. The database is quite similar to ...

Índice