Graphs can be effectively used to model and solve routing problems through road and public transit networks. Graphs can be designed to model and predict financial transactions and even complex social networks (yeah, blame a graph algorithm the next time Facebook or LinkedIn makes an unfamiliar or unsolicited friend suggestion or professional connection). Despite its versatility, the graph universe is made up of just two simple, easily relatable components, namely, nodes and edges. In a road network, a node might represent a road intersection and an edge might very well represent the road segment itself. The convention is that an edge is an entity that always connects two nodes, as is represented in the following diagram:
Let's fire up a new Google Colab Notebook and build our first graph using a Python library known as networkx. By default, the networkx library is installed in the Google Colab environment. If not, be sure to install it using pip or conda or a similar package manager. The following command should work in most Jupyter Notebook coding environments:
The networkx library provides a clean and efficient data structure to define and work with graphs. The simplicity of the networkx library is the most appealing factor for adopting this library for this chapter. Let's get started with networkx and create our first graph with just four lines of code:
These edges in the preceding graph didn't have a direction. But edges can have a direction. If you think of the road network analogy, there are one-way roads, in which you can only drive along in one direction, but most roads are bidirectional:
One way roads
This can be modeled in networkx by instantiating a directional graph (digraph). In a digraph, the position of the nodes in the edge definition determines the direction of the edge. The convention is the first node in the edge definition is the from node or source node and the second node is the to node or target node or destination node. Let's instantiate a directional graph with the following code snippet:
H = nx.DiGraph()
H.add_edge('B', 'A')
H.add_edge('B','C')
Plotting the digraph, H, yields the following plot. Notice the arrows in the diagram indicating the direction of the edge:
A directed graph
The preceding lines of code mean that there are three nodes, 'A', 'B', and 'C', and there's a connection between B and A and not the other way around. It also means that there's a connection between B and C, but not between C to B. Digraphs are very important in modeling real-world networks, especially road networks.
If a road segment is bidirectional, you might have to add two different edges between the same nodes, as follows:
I = nx.DiGraph()
I.add_edge("A", "B")
I.add_edge("B", "A")
I.add_edge("A", "C")
In the following plot, notice the arrows. Edge AB has two arrows pointing in opposite directions:
A bidirectional graph
In the previous examples, each edge is considered to have a unit weight. What that means is that the cost of traveling from one node to another through an edge is the same. This need not always be the case. In the case of a road network, each road segment is different from each other in terms of length and time (taken to traverse it). So, if we are going to represent these road segments as edges, we need to make sure that the edges have different costs or weights.
The networkx library allows us to add weight to an edge, which is demonstrated in the code, as follows:
import networkx as nx
#Create a weighted graph
G=nx.DiGraph()
G.add_edge('A','B',weight=6)
G.add_edge('A','C',weight=2)
G.add_edge('C','D',weight=4.5)
G.add_edge('C','E',weight=5)
G.add_edge('C','F',weight=6)
G.add_edge('A','D',weight=3)
The output will be as follows:
A weighted graph with the width of edges representing the weight of the edge
In this section, we had a good introduction to the components and types of graph data structures. In the next section, we will look into the popular analyses commonly performed using graphs.
Suppose you want to connect to Barack Obama through LinkedIn; how many degrees of connection do you have to go through to reach Obama? A first-degree connection is someone who is connected to you on LinkedIn. A second-degree connection is someone who is connected to your first-degree connection and so on. Assuming that each of your connections and their connections respectively are interested and ready to help you network with the Obama, research says that it only takes an average of 5 degrees of connection for you to connect with Obama, or anyone in the world, for that matter. In other words, the shortest path between any of us and Obama is less than or equal to five. That soun...