eBook - ePub

Mastering Apache Cassandra 3.x

Name: Mastering Apache Cassandra 3.x
Author: Aaron Ploetz, Tejaswi Malepati, Nishant Neeraj

An expert guide to improving database scalability and availability without compromising performance, 3rd Edition

Aaron Ploetz, Tejaswi Malepati, Nishant Neeraj

348 pages
English

About This Book

Build, manage, and configure a high-performing, reliable NoSQL database for your applications with Cassandra

Key Features

Write programs more efficiently using Cassandra's features with the help of examples
Configure Cassandra and fine-tune its parameters depending on your needs
Integrate a Cassandra database with Apache Spark and build a strong data analytics pipeline

Book Description

With ever-increasing rates of data creation, storing data quickly and reliably has become a necessity. Apache Cassandra is the perfect choice for building fault-tolerant and scalable databases. Mastering Apache Cassandra 3.x teaches you how to build and architect your clusters, configure and work with your nodes, and program in a high-throughput environment, helping you make the most of Cassandra's new features.

Once you've covered a brief recap of the basics, you'll move on to deploying and monitoring a production setup, then optimizing it and integrating it with other software. You'll work with the advanced features of CQL and the new storage engine in order to understand how they function on the server side. You'll explore the integration and interaction of Cassandra components, and then examine features such as the token allocation algorithm, CQL3, vnodes, lightweight transactions, and data modeling in detail. Finally, you will get to grips with Apache Spark.

By the end of this book, you'll be able to analyse big data, and build and manage high-performance databases for your application.

What you will learn

Write programs more efficiently using Cassandra's features
Exploit the given infrastructure, improve performance, and tweak the Java Virtual Machine (JVM)
Use CQL3 in your application in order to simplify working with Cassandra
Configure Cassandra and fine-tune its parameters depending on your needs
Set up a cluster and learn how to scale it
Monitor a Cassandra cluster in different ways
Use Apache Spark and other big data processing tools

Who this book is for

Mastering Apache Cassandra 3.x is for you if you are a big data administrator, database administrator, architect, or developer who wants to build a high-performing, scalable, and fault-tolerant database. Prior knowledge of core concepts of databases is required.

Information

Publisher

Packt Publishing

Year

2018

ISBN

9781789132809

Edition

3rd

Topic

Computer Science

Subtopic

Databases

Effective CQL

In this chapter, we will examine common approaches to data modeling and interacting with data stored in Apache Cassandra. This will involve us taking a close look at the Cassandra Query Language, otherwise known as CQL. Specifically, we will cover and discuss the following topics:

The evolution of CQL and the role it plays in the Apache Cassandra universe
How data is structured and modeled effectively for Apache Cassandra
How to build primary keys that facilitate high-performing data models at scale
How CQL differs from SQL
CQL syntax and how to solve different types of problems using it

Once you have completed this chapter, you should have an understanding of why data models need to be built in a certain way. You should also begin to understand known Cassandra anti-patterns and be able to spot certain types of bad queries. This should help you to build scalable, query-based tables and write successful CQL to interact with them.

In the parts of this chapter that cover data modeling, be sure to pay extra attention. The data model is the most important part of a successful, high-performing Apache Cassandra cluster. It is also extremely difficult to change your data model later on, so test early, often, and with a significant amount of data. You do not want to realize that you need to change your model after you have already stored millions of rows. No amount of performance-tuning on the cluster side can make up for a poorly-designed data model!

An overview of Cassandra data modeling

Understanding how Apache Cassandra organizes data under the hood is essential to knowing how to use it properly. When examining Cassandra's data organization, it is important to determine which version of Apache Cassandra you are working with. Apache Cassandra 3.0 represents a significant shift in the way data is both stored and accessed, which warrants a discussion on the evolution of CQL.

Before we get started, let's create a keyspace for this chapter's work:

CREATE KEYSPACE packt_ch3 WITH replication =
 {'class': 'NetworkTopologyStrategy', 'ClockworkAngels':'1'};

To preface this discussion, let's create an example table. Let's assume that we want to store data about a music playlist, including the band's name, albums, song titles, and some additional data about the songs. The CQL for creating that table could look like this:

CREATE TABLE playlist (
 band TEXT,
 album TEXT,
 song TEXT,
 running_time TEXT,
 year INT,
 PRIMARY KEY (band,album,song));

Now we'll add some data into that table:

INSERT INTO playlist (band,album,song,running_time,year)
 VALUES ('Rush','Moving Pictures','Limelight','4:20',1981);
INSERT INTO playlist (band,album,song,running_time,year)
 VALUES ('Rush','Moving Pictures','Tom Sawyer','4:34',1981);
INSERT INTO playlist (band,album,song,running_time,year)
 VALUES ('Rush','Moving Pictures','Red Barchetta','6:10',1981);
INSERT INTO playlist (band,album,song,running_time,year)
 VALUES ('Rush','2112','2112','20:34',1976);
INSERT INTO playlist (band,album,song,running_time,year)
 VALUES ('Rush','Clockwork Angels','Seven Cities of Gold','6:32',2012);
INSERT INTO playlist (band,album,song,running_time,year)
 VALUES ('Coheed and Cambria','Burning Star IV','Welcome Home','6:15',2006);

Cassandra storage model for early versions up to 2.2

The original underlying storage for Apache Cassandra was based on its use of the Thrift interface layer. If we were to look at how the underlying data was stored in older (pre-3.0) versions of Cassandra, we would see something similar to the following:

Figure 3.1: Demonstration of how data was stored in the older storage engine of Apache Cassandra. Notice that the data is partitioned (co-located) by its row key, and then each column is ordered by the column keys.

As you can see in the preceding figure, data is simply stored by its row key (also known as the partition key). Within each partition, data is stored ordered by its column keys, and finally by its (non-key) column names. This structure was sometimes referred to as a map of a map. The innermost section of the map, where the column values were stored, was called a cell. Dealing with data like this proved to be problematic, and completing even basic operations required some understanding of the Thrift API.
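The "map of a map" layout can be sketched in a few lines of Python. This is only an illustrative model, not Cassandra's actual on-disk format: the PLAYLIST rows mirror the INSERT statements shown earlier, while the helper name to_legacy_cells and the tuple-shaped cell names are assumptions made for this example.

```python
# Illustrative model of the pre-3.0 "map of a map" layout (not the real on-disk format).
# Rows mirror the chapter's INSERT statements: (band, album, song, running_time, year).
PLAYLIST = [
    ("Rush", "Moving Pictures", "Limelight", "4:20", 1981),
    ("Rush", "Moving Pictures", "Tom Sawyer", "4:34", 1981),
    ("Rush", "Moving Pictures", "Red Barchetta", "6:10", 1981),
    ("Rush", "2112", "2112", "20:34", 1976),
    ("Rush", "Clockwork Angels", "Seven Cities of Gold", "6:32", 2012),
    ("Coheed and Cambria", "Burning Star IV", "Welcome Home", "6:15", 2006),
]


def to_legacy_cells(rows):
    """Group values by row key, then key each value by (album, song, column name)."""
    partitions = {}
    for band, album, song, running_time, year in rows:
        cells = partitions.setdefault(band, {})
        # Every cell name repeats the clustering values and the column name.
        cells[(album, song, "running_time")] = running_time
        cells[(album, song, "year")] = year
    # Keep the cells of each partition ordered by their composite cell name.
    return {
        band: dict(sorted(cells.items(), key=lambda item: item[0]))
        for band, cells in partitions.items()
    }


legacy = to_legacy_cells(PLAYLIST)
for cell_name, value in legacy["Rush"].items():
    print(cell_name, "->", value)
```

The outer dictionary plays the role of the partition (row key) map, and the inner dictionary holds the cells. Note that nothing in this model represents a "row"; that grouping exists only in the reader's head, which is the gap CQL later filled at the logical level.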

When CQL was introduced with Cassandra 1.2, it essentially abstracted the Thrift model in favor of a SQL-like interface, which was more familiar to the database development community. This abstraction brought about the concept known as the CQL row. While the storage layer still viewed data from the simple perspective of partitions and column values, CQL introduced the row construct to Cassandra, if only at a logical level. This difference between the physical and logical models of the Apache Cassandra storage engine persisted through major versions 1.2, 2.0, 2.1, and 2.2.

Cassandra storage model for versions 3.0 and beyond

On the other hand, the new storage engine changes in Apache Cassandra 3.0 offer several improvements. With version 3.0 and up, stored data is now organized like this:

Figure 3.2: Demonstration of how data is stored in the new storage engine used by Apache Cassandra 3.0 and up. While data is still partitioned in a similar manner, rows are now first-class citizens.

The preceding figure shows that, while data is still partitioned similarly to how it always was, there is a new structure present. The row is now part of the storage engine. This allows the data model and the Cassandra language drivers to deal with the underlying data in much the same way as the storage engine does.

An important aspect not pictured in the preceding figure is the fact that each row and column value has its own timestamp.
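As a rough sketch of this row-oriented layout, the following Python model treats the row as an explicit structure inside each partition and gives both the row and each column value a timestamp. The class names Row and Cell, the insert helper, and the small integer timestamps are all assumptions chosen for illustration; they are not Cassandra's internal types.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    value: object
    timestamp: int  # write time of this column value (illustrative counter)


@dataclass
class Row:
    clustering: tuple  # (album, song)
    timestamp: int     # write time of the row itself
    cells: dict = field(default_factory=dict)  # column name -> Cell


def insert(partitions, band, album, song, columns, timestamp):
    """Store one row under its partition key, keeping rows ordered by clustering key."""
    rows = partitions.setdefault(band, {})
    rows[(album, song)] = Row(
        clustering=(album, song),
        timestamp=timestamp,
        cells={name: Cell(value, timestamp) for name, value in columns.items()},
    )
    # Re-sort so that iteration follows clustering order within the partition.
    partitions[band] = dict(sorted(rows.items(), key=lambda item: item[0]))


partitions = {}
insert(partitions, "Rush", "Moving Pictures", "Limelight",
       {"running_time": "4:20", "year": 1981}, timestamp=1)
insert(partitions, "Rush", "2112", "2112",
       {"running_time": "20:34", "year": 1976}, timestamp=2)

for row in partitions["Rush"].values():
    print(row.clustering, {name: cell.value for name, cell in row.cells.items()})
```

Here the clustering values are stored once per row rather than once per column value, and a reader of the structure can ask for "the row" directly instead of reassembling it from loose cells.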

In addition to rows becoming first-class citizens of the physical data model, another change to the storage engine brought about a drastic improvement. As Apache Cassandra's original data model comes from more of a key/value approach, not every row is required to have a value for every column in a table.

The original storage engine allowed for this by repeating the column names and clustering keys with each column value. One way around repeating the column data was to use the WITH COMPACT STORAGE directive at the time of table creation. However, this presented limitations around schema flexibility, in that columns could no longer be added or removed.
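To make the cost of that repetition concrete, here is a small Python sketch that counts how many characters are spent on cell names when every stored value carries its clustering keys and column name. The ROWS data is a trimmed version of the playlist example, the row without a year is hypothetical (added only to show a sparse row), and the character count is a stand-in for real storage overhead, not a measurement of it.

```python
# Rough illustration of name repetition in the pre-3.0 layout.
ROWS = [
    # (album, song, {column name: value}); a row may omit columns it has no value for.
    ("Moving Pictures", "Limelight", {"running_time": "4:20", "year": 1981}),
    ("Moving Pictures", "Tom Sawyer", {"running_time": "4:34", "year": 1981}),
    ("2112", "2112", {"running_time": "20:34"}),  # hypothetical sparse row: no year
]


def legacy_cell_names(rows):
    """Return one (album, song, column) cell name per stored column value."""
    return [
        (album, song, column)
        for album, song, columns in rows
        for column in columns
    ]


names = legacy_cell_names(ROWS)
# Every cell name repeats the album, the song, and the column name in full.
repeated_chars = sum(len(part) for name in names for part in name)
print(len(names), "cells,", repeated_chars, "characters spent on cell names")
```

Because a missing value simply produces no cell, sparse rows cost nothing extra in this model, but each value that is present pays for its full name again.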

Do not use the WITH COMPACT STORAGE directive with Apache Cassandra version 3.0 or newer. It no longer provides any benefits, and exists so that legacy users have an upgrade path.

With Apache Cassandra 3.0, column name...
