Visualizing Graph Data

Corey Lanum

  1. 232 pages
  2. English

About This Book

Summary

Visualizing Graph Data teaches you not only how to build graph data structures, but also how to create your own dynamic and interactive visualizations using a variety of tools. This book is loaded with fascinating examples and case studies to show you the real-world value of graph visualizations. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Assume you are doing a great job collecting data about your customers and products. Are you able to turn your rich data into important insight? Complex relationships in large data sets can be difficult to recognize. Visualizing these connections as graphs makes it possible to see the patterns, so you can find meaning in an otherwise overwhelming sea of facts.

About the Book

Visualizing Graph Data teaches you how to understand graph data, build graph data structures, and create meaningful visualizations. This engaging book gently introduces graph data visualization through fascinating examples and compelling case studies. You'll discover simple, but effective, techniques to model your data, handle big data, and depict temporal and spatial data. By the end, you'll have a conceptual foundation as well as the practical skills to explore your own data with confidence.

What's Inside

  • Techniques for creating effective visualizations
  • Examples using the Gephi and KeyLines visualization packages
  • Real-world case studies


About the Reader

No prior experience with graph data is required.

About the Author

Corey Lanum has decades of experience building visualization and analysis applications for companies and government agencies around the globe.

Table of Contents

PART 1 - GRAPH VISUALIZATION BASICS

  • Getting to know graph visualization
  • Case studies
  • An introduction to Gephi and KeyLines

PART 2 - VISUALIZE YOUR OWN DATA

  • Data modeling
  • How to build graph visualizations
  • Creating interactive visualizations
  • How to organize a chart
  • Big data: using graphs when there's too much data
  • Dynamic graphs: how to show data over time
  • Graphs on maps: the where of graph visualization


Information

Publisher
Manning
Year
2016
ISBN
9781638352488

Part 1. Graph visualization basics

In part one of this book, we’ll take a high-level view of graphs. First, I’ll introduce you to what graphs are and how they can be used across a variety of domains, with some detailed case studies. Then, we’ll dive a little deeper into graph models of data, how they might be different from standard relational models of data, and how you can create graph data models from your data. I’ll also introduce you to the two tools that we’ll use throughout the book: Gephi and KeyLines. Later chapters use both to illustrate how you can create graph visualizations of your own, either for your own use, with Gephi, or as part of a visualization application, using KeyLines.

Chapter 1. Getting to know graph visualization

This chapter covers
  • Getting to know graphs as data models
  • Why graphs are a useful way to think about data
  • When to visualize graphs, and the node-link drawing concept
  • Other visualizations of graph data and when they’re useful
In December 2001, the Enron Corporation filed for what was at the time the largest-ever corporate bankruptcy. Its stock had fallen from a high of $90 per share the previous year to $0.61, decimating its employees’ pensions and shareholders’ investments. The FBI’s investigation into this collapse became the largest white-collar criminal investigation in history, with agents seizing over 3,000 boxes of documents and 4 terabytes of data. Among the information seized were about 600,000 emails between key executives at the organization. Although the FBI took pains to read every email individually, the investigators recognized that they were unlikely to find a smoking gun: people committing complex financial fraud seldom disclose their actions in written form. And in 2001, email was only starting to become the primary means of internal communication; a lot of information was still exchanged via phone calls.
In addition to looking at the text of individual emails, the FBI also wanted to uncover patterns in the communications, perhaps in an attempt to better understand who the decision makers were within Enron or who had access to a lot of the information internal to the company. To do this, they modeled the Enron emails as a graph.
A graph is a model of data that consists of nodes, which are discrete data elements (such as people), and edges, which are relationships between nodes. The graph model brings to the forefront relationships that may be hidden in tabular views of the same data and illustrates what is most important. By making those relationships between the data elements a core part of the data structure, you can identify patterns in the data that wouldn’t otherwise be apparent. But building graph data structures is only half the solution to pattern recognition. This book will teach you how to visualize graphs using interactive node-link visualization diagrams, and by the end, you’ll be able to create your own dynamic, interactive visualizations using a variety of tools available today.
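To make the contrast with a tabular view concrete, here is a minimal Python sketch that turns a few rows of relationship records into nodes and edges. The rows, the names in them, and the `build_graph` helper are invented for illustration; they are not taken from the Enron data or from any tool used in this book.

```python
from collections import defaultdict

# Hypothetical tabular records: one row per relationship (invented data).
rows = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("dave", "alice"),
]


def build_graph(records):
    """Return (nodes, adjacency) built from (source, target) rows."""
    nodes = set()
    adjacency = defaultdict(set)
    for source, target in records:
        nodes.update((source, target))
        # Store the relationship on both endpoints so every node can
        # list its neighbors, whichever column it appeared in.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return nodes, dict(adjacency)


nodes, adjacency = build_graph(rows)
print(sorted(nodes))
print(sorted(adjacency["alice"]))
```

In the table, the fact that alice is connected to three other people is spread over three rows; in the adjacency structure it is a single lookup.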
In this chapter, I’ll go a little deeper into the concept of a graph, its history, and its uses, and talk about various techniques used to visualize graph data. Subsequent chapters build on this framework by introducing concrete examples of graph visualizations and the data they’re based on, and by discussing various techniques for creating useful visualizations.

1.1. Getting to know graphs

Graphs are everywhere. As long as you’re interested in how items can be related to each other, there’s a graph somewhere in your data. In this section, I’ll walk you through what a graph is and what can be gained from visualizing graphs.

1.1.1. What is a graph?

As described previously, a graph—also called a network—is a set of interconnected data elements that’s expressed as a series of nodes and edges.
In the common definition of a graph, edges have exactly two endpoints, no more. In some cases, those two endpoints can be the same node if a node links to itself. An edge (also known as a link) can take one of two forms:
  • Directed— The relationship has a direction. Stella owns the car, but it doesn’t make sense to say the car owns Stella.
  • Undirected— The two items are linked without the concept of direction; the relationship inherently goes both ways. If Stella is linked to Roger because they committed a crime together, it means the same thing to say Stella was arrested with Roger as it does to say Roger was arrested with Stella.
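The two edge forms above can be sketched in a few lines of Python. The `Edge` class and its method names are assumptions made for this illustration, not part of Gephi, KeyLines, or any other library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """An edge with exactly two endpoints (hypothetical helper class)."""
    source: str
    target: str
    directed: bool = True

    def connects(self, a: str, b: str) -> bool:
        """True if this edge expresses a relationship from a to b."""
        if (self.source, self.target) == (a, b):
            return True
        # An undirected edge reads the same in both directions.
        return not self.directed and (self.source, self.target) == (b, a)

    def is_self_loop(self) -> bool:
        """True if both endpoints are the same node."""
        return self.source == self.target


owns = Edge("Stella", "car", directed=True)
arrested_with = Edge("Stella", "Roger", directed=False)

print(owns.connects("Stella", "car"))            # Stella owns the car
print(owns.connects("car", "Stella"))            # the car does not own Stella
print(arrested_with.connects("Roger", "Stella"))  # same meaning either way
```

Storing a single `directed` flag on the edge is one possible design; many graph libraries instead make the whole graph either directed or undirected.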
In figure 1.1, you see an example of a directed link with properties.
Figure 1.1. A property graph of a single email between Enron executives. The two nodes are the sender and recipient of the email, and the directed edge is the email.
Both nodes and edges can have properties, which are key-value pairs—lists of properties and values, describing either the data element itself or the relationship. Figure 1.2 is a simple property graph showing that Stella bought a 2008 Volkswagen Jetta in September 2007 and sold it in October 2013. Modeling it as a graph highlights that Stella had a relationship with this car, albeit temporarily.
Figure 1.2. A simple property graph with two nodes and an edge. Stella (the first node) bought a 2008 Volkswagen Jetta (the second node) in September 2007 and sold it in October 2013. Modeling it as a graph highlights that Stella had a relationship with this car (the edge).
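As a rough sketch, the property graph in figure 1.2 could be written down as plain Python dictionaries. The key names (`type`, `label`, `bought`, `sold`, and so on) and the `months_between` helper are assumptions chosen for this example; the figure itself may label the properties differently.

```python
# Each node and each edge carries a dict of key-value properties.
nodes = {
    "stella": {"type": "person", "name": "Stella"},
    "jetta": {
        "type": "car",
        "make": "Volkswagen",
        "model": "Jetta",
        "year": 2008,
    },
}

edges = [
    {
        "source": "stella",
        "target": "jetta",
        "label": "owned",
        # Dates stored as "YYYY-MM" strings (an assumed format).
        "properties": {"bought": "2007-09", "sold": "2013-10"},
    }
]


def months_between(start: str, end: str) -> int:
    """Number of whole months from start to end, both "YYYY-MM"."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


ownership = edges[0]["properties"]
owned_for = months_between(ownership["bought"], ownership["sold"])
print(f"Stella owned the car for {owned_for} months")
```

Because the dates live on the edge rather than on either node, the length of the relationship can be computed without touching the person or the car records.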
An email is a relationship, too, between the sender and the recipient. The properties of the nodes are things like email address, name, and title, and the properties of the relationship are the date/time it was sent, its subject line, and the text of the email.
To prove conspiracy, the FBI was interested in all the emails sent among the Enron executives, not just a single one, so let’s add some more nodes to represent a larger number of emails sent during a specified period of time, as shown in figure 1.3.
Figure 1.3. A graph of some of the Enron executives’ email communications. You can easily see that Timothy Belden is a hub of communication in this segment of Enron, sending and receiving email from many other executives.
Figure 1.3 is a directed graph because it matters whether Kevin Presto sent an email to Timothy Belden or received one—there’s a big difference between sending and receiving information when you’re investigating who knew what when. The arrowheads on the edges show that directionality: Kevin Presto sent an email to Timothy Belden, but Timothy Belden didn’t reply, indicating they may not have been close associates or they may have spoken offline. As we start to add more data to the graph, you can see the value of graphs—patterns become apparent. In this example, we can easily see that Timothy Belden is a hub of communication in this segment of Enron, sending and receiving email from many other executives.
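The hub pattern described above can also be found by counting, as in the following Python sketch. Only the Kevin Presto to Timothy Belden email comes from the text; the other rows use invented placeholder names, and the `hub` and `replied` helpers are hypothetical.

```python
from collections import Counter

# (sender, recipient) pairs, one per email. Only the first row is taken
# from the text; the "Executive" rows are invented placeholders.
emails = [
    ("Kevin Presto", "Timothy Belden"),
    ("Timothy Belden", "Executive A"),
    ("Executive A", "Timothy Belden"),
    ("Executive B", "Timothy Belden"),
    ("Timothy Belden", "Executive C"),
    ("Executive B", "Executive C"),
]


def degree_counts(pairs):
    """Return (sent, received) counters: out-degree and in-degree."""
    sent = Counter(sender for sender, _ in pairs)
    received = Counter(recipient for _, recipient in pairs)
    return sent, received


def hub(pairs):
    """Return the node with the most emails sent plus received."""
    sent, received = degree_counts(pairs)
    return (sent + received).most_common(1)[0][0]


def replied(pairs, a, b):
    """True if b sent at least one email back to a."""
    return (b, a) in set(pairs)


print(hub(emails))
print(replied(emails, "Kevin Presto", "Timothy Belden"))
```

Keeping sent and received counts separate preserves the direction of each edge, which is what lets the second check distinguish a one-way message from an exchange.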

1.1.2. A bit of theory

Graph theory began early in the eighteenth century with the Seven Bridges of Königsberg problem. In Königsberg, Prussia (now Kaliningrad, Russia), it was a common parlor game to try to find a route that crossed each of the seven bridges over the Pregel River exactly once. (Go ahead and give it a shot using the map of the city, shown in figure 1.4, and see if you can prove three centuries of mathematicians wrong.)
Figure 1.4. The Seven Bridges of Königsberg problem. Using this map of the bridges of Königsberg, Prussia, try to draw a route that reaches each area of the city but never crosses the same bridge twice.
Leonhard Euler proved this problem unsolvable by abstracting the regions of the city into individual points and the bridges as paths between those points, as you can see in figure 1.5.
Figure 1.5. Seven bridges and four land areas of Königsberg as a graph. In this graph, nodes denote the land masses bordering the Pregel River and the two islands in its middle. Edges represent the bridges connecting the two islands and two shorelines.
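The graph in figure 1.5 is small enough to check by hand, but a short Python sketch shows the idea. It relies on a standard result from graph theory rather than on anything quoted from this book: a connected graph has a route that uses every edge exactly once only if zero or two of its nodes have an odd number of edges. The node labels below are my own names for the four land areas.

```python
from collections import Counter

# The seven bridges as edges between four land areas. Two pairs of
# areas are joined by two bridges each, so some edges appear twice.
bridges = [
    ("north bank", "center island"),
    ("north bank", "center island"),
    ("south bank", "center island"),
    ("south bank", "center island"),
    ("north bank", "east island"),
    ("south bank", "east island"),
    ("center island", "east island"),
]


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count


def has_route_using_every_edge_once(edges):
    """Apply the odd-degree rule; assumes the graph is connected."""
    odd_nodes = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd_nodes) in (0, 2)


print(dict(degrees(bridges)))
print(has_route_using_every_edge_once(bridges))
```

All four land areas touch an odd number of bridges, so the check fails, which matches Euler's conclusion that the walk is impossible.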
E...
