Learn and Explore ClickHouse, Its Robust Table Engines for Analytical Tasks, ClickHouse SQL, Integration with External Applications, and Managing the ClickHouse Server
Create scalable, fault-tolerant, and reliable online analytical applications with a feature-rich DBMS designed for speed.
Key Features
Hands-on approach to learning ClickHouse, from the basic to the advanced level.
Numerous examples demonstrating how to use ClickHouse for analytical tasks.
Straightforward explanations of complex ClickHouse concepts and its vast feature set.
Integration with a variety of technologies such as MySQL, PostgreSQL, Kafka, and Amazon S3.
Description
This book provides a hands-on approach for data professionals to get started with ClickHouse and empowers readers to perform real-time analytics using ClickHouse SQL. Readers will understand the fundamentals of database technologies and frequently used relational database concepts such as keys and database normalization. They will learn to query data using SQL (the ClickHouse dialect), configure databases and tables in ClickHouse, and use the various core table engines available in ClickHouse, including the MergeTree and Log family engines. Readers will also investigate and practically integrate ClickHouse with various external data sources and work with the special table engines shipped with ClickHouse. With the help of the examples provided, readers will gain experience in configuring the ClickHouse setup and performing administrative tasks on the ClickHouse server. Throughout this journey, readers will reinforce their learning through numerous working examples and the question-and-answer section at the end of each chapter. By the end of this book, readers will be able to apply their knowledge and use ClickHouse in real-world applications.
What you will learn
Querying tables in ClickHouse and performing analytical tasks using ClickHouse SQL.
Integrating and running queries with popular RDBMSs, including MySQL and PostgreSQL.
Integrating with cloud storage and streaming platforms such as Amazon S3 and Kafka.
Working with core engines and special engines.
Configuring the ClickHouse setup and carrying out administrative tasks.
Who this book is for
This book is intended for data engineers, application developers, database administrators, and software architects who want to learn ClickHouse.
Table of Contents
1. Introduction
2. The Relational Database Model and Database Design
3. Setting up the Environment
4. ClickHouse SQL
5. SQL Functions in ClickHouse
6. SQL Functions for Data Aggregation
7. Table Engines - MergeTree Family
8. Table Engines - Log Family
9. External Data Sources
10. Special Engines
11. Configuring the ClickHouse Setup - Part 1
12. Configuring the ClickHouse Setup - Part 2
In today's fast-paced digital economy, data has steadily gained importance since the dawn of this century. Data-driven organizations are growing rapidly, and the importance of data cannot be overstated. Data helps organizations understand and solve problems, make informed decisions, and improve their processes. With the ever-growing demand for various types of data, there have been multiple efforts to store data efficiently and optimally, which in turn has led to the development of different database technologies. Currently, there are more than 300 database management systems that are actively developed and maintained (source: db-engines.com).
In this book, we will focus on a relatively new database management system called ClickHouse, a column-oriented database management system used for online analytical processing (OLAP).
Structure
In this chapter, we will discuss the following topics:
What is a database?
Different types of database management systems
Online transaction processing versus online analytical processing systems
Row versus columnar database
Introduction to ClickHouse
Objectives
After reading this chapter, you will be able to:
Know what a database is
Understand the commonly used database types
Distinguish OLAP from OLTP and know when to use row and columnar databases
Describe the brief history of ClickHouse and its success stories
Data and databases
Data: "Information, especially facts or numbers, collected to be examined and considered and used to help decision-making, or information in an electronic form that can be stored and used by a computer"
- Cambridge Dictionary
Simply put, data is a collection of numbers (measurements or observations), words, or descriptions of things. There is a small but important difference between data and information. Information is derived from smaller chunks of data, which have to be analyzed and put into context before meaningful information can be retrieved. Data is collected, organized, and stored electronically in a computer database, which is also used to manage the stored collections.
In the last decade, the ever-growing demand for data has caused a rapid increase in the volume of data that has to be stored. This has left traditional data storage and processing applications behind, and a new subfield called big data has taken center stage. With an exponential increase in the amount of stored data, processing speed has remained a challenge, especially for online analytical applications. This, in turn, has led to the rapid development of a special category of systems called Online Analytical Processing systems (OLAP systems). Before getting into them, we shall take an overview of the different types of database systems.
Different types of database management systems
Although it is not a formal classification, the following are the different types of database management systems, grouped by how data is stored and retrieved. Despite being placed in different groups, these systems may also share some commonalities.
Relational database
In relational databases, data is organized into tables of rows and columns, and the information in multiple tables can be connected through logical links called relationships. Rows are also referred to as records or tuples, and columns as attributes.
Each row in a table has a unique key called the primary key, which is used to define relationships among tables. When a new row is added to the table, it must be given a new, unique primary key value. The primary key of one table can appear as a foreign key in another table, as shown in the following figure:
Figure 1.1: Tables in relational databases
In the preceding example, we have two tables. The first table has the customer ID (cust_id) as its primary key, and the second one has the order ID (order_id) as its primary key. The customer ID field in the second table is an example of a foreign key. To find the orders placed by the customer named Steve (with customer ID 2), we can use the customer ID to extract the relevant records from both tables.
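The following Python snippet is a minimal sketch of the customer/order relationship described above. It uses the standard-library `sqlite3` module as a stand-in for any RDBMS. The table layout follows the figure's `cust_id` and `order_id` keys. Apart from Steve (customer ID 2), the names, items, and order IDs are hypothetical sample data.

```python
import sqlite3

# An in-memory SQLite database stands in for any relational DBMS here.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER NOT NULL REFERENCES customers(cust_id),  -- foreign key
    item     TEXT NOT NULL
);
""")

# Hypothetical sample rows; only "Steve" with cust_id 2 comes from the text.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Anna"), (2, "Steve"), (3, "Priya")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(101, 2, "Keyboard"), (102, 1, "Monitor"), (103, 2, "Mouse")])

# Join the two tables on the shared key to list Steve's orders.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.item
    FROM orders AS o
    JOIN customers AS c ON o.cust_id = c.cust_id
    WHERE c.cust_id = 2
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
# (101, 'Steve', 'Keyboard')
# (103, 'Steve', 'Mouse')
```

The `JOIN ... ON o.cust_id = c.cust_id` clause is where the foreign key does its work. It matches each order to the customer row whose primary key that order references.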
Most relational DBMSs use the Structured Query Language (SQL) for maintaining and querying the database. Examples of RDBMSs include Oracle, MySQL, PostgreSQL, MariaDB, Microsoft Access, Microsoft SQL Server, IBM DB2, and SQLite.
Advantages:
Simple
Easy to query using SQL
Accuracy: primary keys prevent data duplication
Reduces data redundancy and improves data integrity via normalization
Supports transactions
Supports ACID properties in transactions to ensure data validity
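Here is a small sketch that illustrates the "Accuracy" and transaction bullets above. It again uses Python's built-in `sqlite3` module purely for illustration. The `accounts` table, the `transfer` helper, and the balances are hypothetical. The snippet shows two things: a primary key rejecting a duplicate row, and a transaction that applies either both updates of a transfer or neither (atomicity, the "A" in ACID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

# 1. The primary key rejects a second row with an existing key.
try:
    conn.execute("INSERT INTO accounts VALUES (1, 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
conn.rollback()  # discard the failed statement's open transaction

# 2. A hypothetical transfer helper: both updates succeed, or neither is kept.
def transfer(conn, src, dst, amount):
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE acct_id = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE acct_id = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE acct_id = ?",
                     (amount, dst))

transfer(conn, 1, 2, 30)       # succeeds: balances become 70 and 80
try:
    transfer(conn, 2, 1, 500)  # fails midway, so the first UPDATE is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT acct_id, balance FROM accounts"))
print(duplicate_rejected)  # True
print(balances)            # {1: 70, 2: 80}
```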
No-SQL database
No-SQL (sometimes called Not Only SQL) databases provide an alternative way of storing and retrieving large amounts of unstructured data. More recently, some No-SQL databases have added support for SQL-like query languages. The two major ways of storing data are:
Key-value stores
Data is stored as key-value pairs, where the keys are usually unique (like a primary key) and the values are blobs (which can be of any data type). The responsibility for decoding the values correctly lies with the client accessing the database. The client can read, write, update, and delete values. Because values are read by key, this method is fast and scales well to large datasets.
Figure 1.2: Sample key-value store
For example, Redis and Memcached.
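The toy Python class below sketches the key-value model described above. Keys map to opaque byte blobs, and only the client decides how to encode and decode them. This is a hypothetical in-memory stand-in for illustration only. It does not reproduce the API of Redis or Memcached.

```python
import json
from typing import Optional


class KeyValueStore:
    """A toy in-memory key-value store: unique keys mapped to opaque byte blobs."""

    def __init__(self):
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing to an existing key overwrites (updates) its value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by key only; the store never inspects the value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()

# The client chooses the encoding; here, JSON encoded as UTF-8 bytes.
store.put("user:42", json.dumps({"name": "Steve", "plan": "pro"}).encode("utf-8"))
store.put("session:abc", b"\x00\x01\x02")  # any binary blob is allowed

# Decoding is the client's responsibility.
profile = json.loads(store.get("user:42").decode("utf-8"))
print(profile["name"])  # Steve

store.delete("session:abc")
print(store.get("session:abc"))  # None
```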
Advantages:
Fast read-write operations
Supports unstructured data
Easy to scale
High availability
Resilient to failures
Document store
This is quite similar to the key-value store. The difference is that the value is usually structured or semi-structured data, stored in XML, JSON, or BSON format.
Figure 1.3: Sample document store
For example, MongoDB and CouchDB.
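A document store differs from a plain key-value store in one important way. The database understands the structure of each value, so it can filter on fields inside documents. The hypothetical snippet below models this with Python dictionaries serialized as JSON. The `insert` and `find` helpers are illustrative names, not MongoDB or CouchDB APIs, and the sample documents are made up.

```python
import json

# A toy document store: each value is a JSON document, and documents in the
# same collection do not need to share a schema.
collection: dict[str, str] = {}


def insert(doc_id: str, doc: dict) -> None:
    collection[doc_id] = json.dumps(doc)


def find(field: str, value) -> list[dict]:
    """Return every document whose top-level `field` equals `value`."""
    results = []
    for raw in collection.values():
        doc = json.loads(raw)
        if doc.get(field) == value:  # documents lacking the field are skipped
            results.append(doc)
    return results


insert("c1", {"name": "Steve", "city": "Chennai"})
insert("c2", {"name": "Anna", "city": "Chennai", "phones": ["555-0100"]})  # extra field
insert("c3", {"name": "Priya", "email": "priya@example.com"})             # no city at all

chennai = find("city", "Chennai")
print([d["name"] for d in chennai])  # ['Steve', 'Anna']
```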
Advantages:
Flexible: the structure of the documents need not be consistent
Prior knowledge of the data schema is not required
Information can be added, updated, and deleted easily, as in a relational database
Easily scalable
High availability
Easy to recover from failures
Graph database
A graph database consists of a collection of nodes and edges. A node represents an object, and an edge is the connection between two objects. Each node has a unique identifier and a set of properties expressed as key-value pairs. Similarly, an edge has a unique identifier, refers to its starting and ending nodes, and carries properties such as direction, parent-child relationships, actions, ownership, and so on.
Figure 1.4: Data relationship modeled in a graph database
This example shows a graph data model that can be stored in a graph database. The three sets of nodes are persons, the organizations they worked in, and the locations of those organizations.
Advantages:
For workloads dominated by relationship traversal, graph databases can improve performance by several orders of magnitude over relational joins.
Flexibility: instead of modeling a domain ahead of time, data can be added to the existing graph structure without endangering the current functionality.
For example, JanusGraph and Neo4j.
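The node-and-edge model described above can be sketched in Python as follows. This is a hypothetical, minimal in-memory model. It is not the data model or query language of JanusGraph or Neo4j. The person, organization, and location values are made up to mirror the kind of graph in Figure 1.4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                    # unique identifier
    label: str                      # e.g. "Person", "Organization", "Location"
    props: dict = field(default_factory=dict)  # key-value properties


@dataclass
class Edge:
    edge_id: str                    # unique identifier
    start: str                      # node_id of the starting node
    end: str                        # node_id of the ending node
    rel: str                        # relationship type, e.g. "WORKED_AT"
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

    def neighbors(self, node_id: str, rel: str) -> list[Node]:
        """Follow outgoing edges of type `rel` from `node_id`."""
        return [self.nodes[e.end] for e in self.edges
                if e.start == node_id and e.rel == rel]


g = Graph()
g.add_node(Node("p1", "Person", {"name": "Alice"}))
g.add_node(Node("o1", "Organization", {"name": "Acme Corp"}))
g.add_node(Node("l1", "Location", {"city": "Berlin"}))
g.add_edge(Edge("e1", "p1", "o1", "WORKED_AT", {"since": 2019}))
g.add_edge(Edge("e2", "o1", "l1", "LOCATED_IN"))

# Two hops: person -> organization -> location.
cities = [loc.props["city"]
          for org in g.neighbors("p1", "WORKED_AT")
          for loc in g.neighbors(org.node_id, "LOCATED_IN")]
print(cities)  # ['Berlin']
```

For simplicity, `neighbors` scans the whole edge list. Graph databases typically store each node's adjacent edges so they can be reached directly, which keeps multi-hop traversals cheap.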
Time-series database
As the name suggests, time-series databases are designed to store data that changes over time. The data can be of any kind and is collected periodically; usually it consists of metrics collected from systems. Time-series data can also be stored in traditional relational databases. The key difference is that in a time-series database, records are typically only appended, not updated or deleted. Time-series databases are optimized for ingesting large amounts of data and aggregating the recorded metrics.
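The append-then-aggregate pattern described above can be sketched in a few lines of Python. The CPU readings and the one-minute bucket size are hypothetical. A real time-series database would perform this kind of downsampling internally and at much larger scale.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Append-only log of (timestamp, metric value); rows are never updated or deleted.
readings: list[tuple[datetime, float]] = []


def append(ts: datetime, value: float) -> None:
    readings.append((ts, value))


def avg_per_minute(rows):
    """Bucket readings by minute and average each bucket, in time order."""
    buckets = defaultdict(list)
    for ts, value in rows:
        minute = ts.replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {minute: sum(vals) / len(vals) for minute, vals in sorted(buckets.items())}


# Hypothetical CPU-usage samples taken every 20 seconds.
start = datetime(2024, 1, 1, 12, 0, 0)
for i, cpu in enumerate([10.0, 20.0, 30.0, 40.0, 50.0, 60.0]):
    append(start + timedelta(seconds=20 * i), cpu)

for minute, avg in avg_per_minute(readings).items():
    print(minute.strftime("%H:%M"), avg)
# 12:00 20.0
# 12:01 50.0
```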
Advantages:
Optimized to accumulate data periodically at large scale
In-built data aggregatio...
Table of contents
Cover Page
Title Page
Copyright Page
Dedication Page
About the Author
About the Reviewer
Acknowledgement
Preface
Errata
Table of Contents
1. Introduction
2. Relational Database Model and Database Design
3. Setting Up the Environment
4. ClickHouse SQL
5. SQL Functions in ClickHouse
6. SQL Functions for Data Aggregation
7. Table Engines - MergeTree Family
8. Table Engines - Log Family
9. External Data Sources
10. Special Engines
11. Configuring the ClickHouse Setup - Part 1
12. Configuring the ClickHouse Setup - Part 2
Appendix A: Installing Lubuntu 20.04 in Oracle Virtualbox 6.1
Appendix B: Installing External Data Sources
Index
Up and Running with ClickHouse by Vijay Anand R is available in PDF and ePUB formats.