Graph databases have gained increasing attention in the last few years. Data models built from graphs bring together the simplicity of document-oriented databases and the clarity of SQL tables. Among others, Neo4j is a database that comes with a large ecosystem, including the database, but also tools to build web applications, such as the GRANDstack, and tools to use graph data in a machine learning pipeline, as well as the Graph Data Science Library. This book will discuss those tools, but let's first start from the beginning.
Talking about graph databases means talking about graphs. Even if you do not need to know all the details about graph theory, itās always a good idea to learn some of the basic concepts underlying the tool you are using. In this chapter, we will start by defining graphs and giving some simple and less simple examples of graphs and their applications. We will then see how to move from the well-known SQL tables to graph data modeling. Weāll conclude by introducing Neo4j and its building blocks, and review some design principles to understand what can and canāt be done with Neo4j.
This chapter will cover the following topics:
- Graph definition and examples
- Moving from SQL to graph databases
- Neo4j: the nodes, relationships, and properties model
- Understanding graph properties
- Considerations for graph modeling in Neo4j
Graph definition and examples
The question you may ask at this point is "Why should I care about graphs? After all, my company/business/interest is not about graphs or networks of any kind. I know my data model, well arranged into SQL tables or NoSQL documents, and I can retrieve the information I want when I want." This book will teach you how to empower your data by looking at it in a different way. Surprisingly enough, graphs can be used to model a lot of processes, from the more obvious ones such as road networks, to less intuitive use cases such as video games or credit card fraud detection, among many others.
Graph theory
Let's start from the beginning and answer the question "What is a graph?"
A bit of history: the Seven Bridges of Kƶnigsberg problem
Graph studies originate back to Leonhard Euler, a prolific Swiss mathematician who lived in the eighteenth century. In 1735, he published a paper proposing a solution to the Seven Bridges of Kƶnigsberg problem. The problem is the following:
Given the city whose geography is depicted in the following image, is there a way to walk across each of the seven bridges of the city once and only once, and return to our starting point?
As you can see, this city is crossed by a river that splits the city into two banks, A and B. The river meander additionally creates two islands, C and D, also part of the city. Those two banks and two islands are connected by a total of seven bridges: two bridges between A and C, two other bridges between C and B, one between C and D, one between B and D, and a last one between D and A:
Euler's reasoning (on the right side) was to reduce this complex geography to the most simple drawing, like the one you can see on the right of the previous image, since the route used within each island is not relevant. Each island then becomes a single point, or node, connected to another by one or several links, or edges, representing the bridges.
With this simple visualization, the mathematician was able to solve the initial problem by noting that, if you arrive at an island (vertex) via one bridge, you will need to leave it using another bridge (except for the start and end vertices). In other words, all vertices but two need to be connected to an even number of relationships. This is not the case in the Kƶnigsberg graph, since we have the following:
A: 3 connections (to C twice, and to D once)
B: 3 connections (to C twice, and to D once)
C: 5 connections (to A twice, to B twice and to D once)
D: 3 connections (to A once, to C once and to D once)
This kind of path, where each edge is used once and only once, is called a Eulerian cycle and it can be said that a graph has a Eulerian cycle if and only if all of its vertices have even degrees.
The number of connections for a node is called the degree of the node.
Graph definition
This leads us to the mathematical definition of a graph:
A graph G = (V, E) is a pair of:
- V, a set of nodes or vertices: the islands in our previous example
- E, a set of edges connecting nodes: the bridges
The Kƶnigsberg graph illustrated on the right of the preceding image can then be defined as follows:
V = [A, B, C, D]
E = [
(A, C),
(A, C),
(C, B),
(C, B),
(A, D),
(C, D),
(B, D)
]
Graphs, like many mathematical objects, are well defined. While it can be difficult to find a good visualization for some of those objects, graphs, on the other hand, suffer from the almost infinite number of ways to draw them.
Visualization
Apart from very special cases, there is no single way to draw a graph and visualize it. Indeed, graphs are most often an abstract representation of reality. For instance, all four graphs depicted in the following image represent the exact same set of nodes and edges, so, by definition, the same mathematical graph:
We cannot rely only on our eyes to find patterns within graphs. For instance, looking only at the lower-right plot, it would be impossible to see the pattern that is visible in the upper-right plot. That's where graph algorithms enter into the game, which will be discussed in more detail in Chapter 6, Node Importance, and Chapter 7, Community Detection and Similarity Measures.
Examples of graphs
Now that we have a better idea of what a graph is, it's time to discover some more examples to understand which purposes graphs can be useful for.
Networks
With the graph definition in mind (a set of nodes connected to each other via edges) and the bridges example from the last section, we can easily imagine ho...