
Automating Data Visualization with Ruby and Graphviz

I often need to visualize data from the enterprise resource planning (ERP) system we use at work. For those not familiar, an ERP is a collection of servers and software that creates a common environment in which people monitor and plan a business. One of the downsides I’ve run into with our ERP system is that the data it holds is huge, complex, and not easy to visualize.

An Illustrative Example of Complex Data

To better illustrate the problem, consider a small manufacturing company. A typical operation can easily deal with several thousand unique part numbers. For each part there is what is known as “master data,” i.e., the data elements in the ERP system that tell you what that part is. For example, if you manufacture jeans, part 12345 may be boot-cut, black, button-fly, faded jeans. All of those attributes are stored in the master data set. Defining and maintaining that master data is far from trivial and often downright tedious.

Obtaining Raw, Structured Data

One thing almost any ERP can do is “dump” data. You can think of a data dump as a spreadsheet, since it is usually available in comma-separated value (CSV) format or something similar. If you ask the ERP for all items, it will simply return one line per item with the attributes listed in the columns. Let’s consider a mock dump for our jeans company and, to keep things really simple, only eight items in total.

Part #   Description
001      Blue Jeans, button fly
002      Blue Jeans, zipper fly
003      Black Jeans, button fly
004      Black Jeans, zipper fly
005      Zipper
006      Button
007      Denim
008      Black Dye
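A dump like this is simple enough to index with a few lines of Ruby. The sketch below assumes each line holds the part number, a comma, and an unquoted description; since descriptions such as “Blue Jeans, button fly” contain commas, it splits only on the first comma. A real CSV export would normally quote such fields, in which case Ruby’s standard CSV library is the safer parser. PARTS_DUMP and parts are illustrative names, not anything the ERP provides.

```ruby
# Hypothetical parts dump: part number first, then a free-text description
# that may itself contain commas (assumed unquoted here).
PARTS_DUMP = <<~DUMP
  001,Blue Jeans, button fly
  002,Blue Jeans, zipper fly
  003,Black Jeans, button fly
  004,Black Jeans, zipper fly
  005,Zipper
  006,Button
  007,Denim
  008,Black Dye
DUMP

# Index the master data by part number. split(",", 2) splits only on the
# first comma, so each description keeps its internal commas.
parts = PARTS_DUMP.each_line.to_h do |line|
  number, description = line.chomp.split(",", 2)
  [number, description]
end

puts parts["003"] # prints: Black Jeans, button fly
puts parts.size   # prints: 8
```

A Hash keyed by part number also turns every later “what is part 007?” question into a constant-time lookup rather than a scan of the whole dump.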

Easy enough, right? How hard can eight items be? Now suppose we’re a global company with plants around the world. Luckily, only three plants are involved with our eight items.

Plant #   City            State        Country
1000      San Francisco   California   USA
1001      Dallas          Texas        USA
1002      Munich          Bavaria      Germany

So far so good, but here’s where it gets messy. ERPs have what are known as recipes. Much as you use a recipe to bake a batch of cookies, you use a recipe to produce a batch (or lot) of jeans. Recipes list items in terms of end items and components, i.e., which parts are used to make an end item. At this point master data dumps get cryptic. In our example we’ll show just five columns of data; in a real-world dump there could be hundreds.

Recipe #   End Item #   End Item Plant of Mfg   Component Item #   Component Source Plant
900        001          1000                    006                1002
900        001          1000                    007                1001
901        002          1000                    005                1002
901        002          1000                    007                1001
902        003          1000                    006                1002
902        003          1000                    007                1001
902        003          1000                    008                1001
903        004          1000                    005                1002
903        004          1000                    007                1001
903        004          1000                    008                1001
904        001          1001                    006                1002
904        001          1001                    007                1001
905        002          1001                    005                1002
905        002          1001                    007                1001

Yikes! It’s already hard to keep track of what’s going on, and we’re only dealing with 8 items, 3 plants, and 6 recipes. Now imagine thousands of parts, plants, and recipes. Even worse, imagine your boss walking into the room and asking, “Hey, is our master data correct for jeans?” Hopefully you’re starting to see the issue at hand. If not, you’re probably some kind of robot. Never fear, though. You may not realize it yet, but your data is a diamond in the rough. All you need to do is visualize it.
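Even before drawing anything, regrouping the flat recipe rows by recipe number recovers some of the structure the table hides. This sketch hard-codes the fourteen rows from the recipe table as arrays of strings (in practice they would come from the dump) and uses Enumerable#group_by; the hash keys end_item, plant, and components are illustrative names, not ERP field names.

```ruby
# Recipe rows from the table above:
# [recipe, end item, end item plant, component, component source plant]
RECIPES = [
  %w[900 001 1000 006 1002], %w[900 001 1000 007 1001],
  %w[901 002 1000 005 1002], %w[901 002 1000 007 1001],
  %w[902 003 1000 006 1002], %w[902 003 1000 007 1001],
  %w[902 003 1000 008 1001], %w[903 004 1000 005 1002],
  %w[903 004 1000 007 1001], %w[903 004 1000 008 1001],
  %w[904 001 1001 006 1002], %w[904 001 1001 007 1001],
  %w[905 002 1001 005 1002], %w[905 002 1001 007 1001]
]

# Collapse the rows into one entry per recipe: which end item it makes,
# at which plant, and which components (from which plants) it consumes.
by_recipe = RECIPES.group_by(&:first).transform_values do |rows|
  { end_item: rows[0][1], plant: rows[0][2],
    components: rows.map { |r| [r[3], r[4]] } }
end

by_recipe.each do |recipe, info|
  parts = info[:components].map { |c, src| "#{c} from #{src}" }.join(", ")
  puts "Recipe #{recipe}: item #{info[:end_item]} at #{info[:plant]} <- #{parts}"
end
```

This helps for six recipes, but a text listing still won’t scale to thousands, which is where a picture earns its keep.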

Don’t Use a Purely Visual Presentation Tool for a Data Problem

If you feel yourself reaching for Microsoft PowerPoint or Visio, quickly find a mirror and haze yourself. While both of those programs can be used for good, you’ll only further complicate things if you use them in this situation to jump from raw data to visualization. Imagine you spend a week building charts and slides of your recipes. Then, as you’re basking in all of their graphical glory, you hear of an initiative to rework most of the recipes as they have been causing issues in the plants. That’s right, a week of work is now obsolete. So much for productivity.

The problem is that your data lives, breathes, and evolves, so you need something between it and your slide shows. Enter Graphviz1 and its DOT language. Graphviz is free software that is oh-so-good at visualizing data that is graph-like in nature.

Understanding Graphviz and the DOT Language

Graphviz transforms a very simple language called DOT into amazingly well-rendered graphs based on a variety of layout algorithms. One of the best things about Graphviz is that it is very well documented.2 Rather than spend a lot of time describing the syntax, I’ll provide an easy example and leave it to you to read up on the details.

A typical Graphviz DOT file has three major sections:

  1. Header data that applies attributes to nodes, edges, and graphs
  2. Node definitions
  3. Edge definitions
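Because the three sections are so regular, they map naturally onto a small program, which is the idea the rest of this article builds on. The following sketch assembles DOT text from plain Ruby data in exactly that order: header attributes, then node definitions, then edge definitions. The graph name, labels, and attributes are illustrative.

```ruby
# 1. Header: graph-wide, default node, and default edge attributes.
header = [
  'graph [rankdir="LR"]',             # graph-wide attributes
  'node  [shape=box, style=filled]',  # default node attributes
  'edge  [arrowhead="open"]'          # default edge attributes
]
nodes = { "a" => "Node A", "b" => "Node B", "c" => "Node C" }
edges = [%w[a b], %w[b c], %w[a c]]

lines = ["digraph sections_example {"]
header.each { |h| lines << "  #{h}" }
nodes.each  { |id, label| lines << "  #{id} [label=\"#{label}\"]" } # 2. nodes
edges.each  { |from, to| lines << "  #{from} -> #{to}" }             # 3. edges
lines << "}"

dot = lines.join("\n")
puts dot
```

Saving the printed text as a .gv file and opening it in Graphviz should render it like any hand-written DOT file.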

Given this simplicity, it’s quite easy to get started once you’ve installed Graphviz.3 Open a plain text editor and save a file named graph_example.gv. Next, paste in the following text and save again. Note that this includes only one of the three sections mentioned above: the edge definitions.

digraph graph_example {
 a -> b
 b -> c
 c -> d
 b -> d
 a -> d
}

If you’re using a Mac, simply double-click the .gv file you just saved. If you’re using Windows, you need to generate an output image and open it, for example by running dot -Tpng graph_example.gv -o graph_example.png from a command prompt. If you view the image with something like Microsoft Picture Manager, it will update as you regenerate your graphic. Once it’s open, you should see something like the following.


A simple Graphviz-generated image.

Note how little syntax is required to generate a functional, albeit simple, graph. All we’ve done is define edges and let Graphviz imply the rest: our file creates edges between the nodes a, b, c, and d, and Graphviz takes these DOT definitions and renders the graph you see according to its default settings (which we’ll soon begin tweaking).
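To see what “imply the rest” means, note that DOT creates a node the first time a name appears, including inside an edge. This toy Ruby snippet (not how Graphviz actually parses DOT; it only understands simple x -> y lines) recovers the node set implied by the edge list in graph_example.gv.

```ruby
# The edge-only body of graph_example.gv.
EDGES = <<~DOT
  a -> b
  b -> c
  c -> d
  b -> d
  a -> d
DOT

# Every endpoint of every edge is a node; uniq keeps the order in which
# each name first appears, mirroring when DOT would create it.
implied_nodes = EDGES.each_line
                     .flat_map { |line| line.split("->").map(&:strip) }
                     .uniq
p implied_nodes # prints ["a", "b", "c", "d"]
```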

Let’s begin adding some of the more powerful attributes available to us in the DOT language. In the same manner as before, use the following code to generate your next graph.

digraph graph_example {

 /***** GLOBAL SETTINGS *****/

 graph          [rotate=0, rankdir="LR"]
 node           [color="#333333", style=filled,
                 shape=box, fontname="Trebuchet MS"]
 edge           [color="#666666", arrowhead="open", 
                 fontname="Trebuchet MS", fontsize="11"]
 node           [fillcolor="#294b76", fontcolor="white"]
 
 /***** Nodes *****/
 a              [label="Node A"]
 b              [label="Node B"]
 c              [label="Node C"]
 d              [label="Node D", fillcolor="#116611"]

 /***** Edges *****/
 a              -> b
 b              -> c
 c              -> d
 b              -> d
 a              -> d [label="direct path"]
}

This code should produce the following.


A Graphviz-generated image demonstrating a few features in the DOT language.

You can connect the dots (no pun intended) and figure out what’s driving the various changes compared to our original graph.4 Note the three sections in our code: global settings, nodes, and edges. As long as you see the distinction between them, you’ll find DOT very easy. Note also that the lines in the global section can be placed anywhere in the file. For example, if after defining the first two nodes I decide that I want every remaining node to be red, I can redefine “node” in the same manner I did in the global section above. The new defaults apply only to nodes defined below that line.
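The scoping rule is easy to try out by generating the DOT text from Ruby. In this sketch the helper name scoped_defaults_dot and its recolor_after parameter are hypothetical; the helper emits a second node statement after the first two nodes, so only c and d should pick up the red fill. Keep in mind that the rule is about where a node is first created: a node that already appeared in an earlier edge keeps the defaults that were in force at that point.

```ruby
# Hypothetical helper: emit DOT where the default node fill changes
# part-way through the file. A `node [...]` statement only affects nodes
# created after it, so earlier nodes keep the blue fill.
def scoped_defaults_dot(names, recolor_after:)
  out = ["digraph scoped {", %(  node [style=filled, fillcolor="#294b76"])]
  names.each_with_index do |name, i|
    # Redefine the node default once `recolor_after` nodes are defined.
    out << %(  node [fillcolor="red"]) if i == recolor_after
    out << "  #{name}"
  end
  # Edges come last, after every node exists, so they recolor nothing.
  names.each_cons(2) { |a, b| out << "  #{a} -> #{b}" }
  out << "}"
  out.join("\n")
end

puts scoped_defaults_dot(%w[a b c d], recolor_after: 2)
```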

Great Output, Verbose Input

You’re probably already thinking about how complex the input file will be when we get back to our ERP example. That’s a real problem: generating a Graphviz file by hand for complex data is just as painful as dealing with the complex data itself. Never fear, though; we’re going to automate the nastiness.

Generating Graphviz Images with Ruby

We now have two very systematic things at our disposal:

  1. structured data – we know how this will look with every extraction from our ERP system
  2. an appropriate language – we know our language listens and speaks in ways very similar to our structured data set
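The correspondence between the two is almost one-to-one: each recipe row can become one DOT edge from the component’s source plant to the end item’s plant. As a gem-free illustration of that mapping (an alternative sketch, not the script used below), recipes_to_dot is a hypothetical helper that assumes the five-column row layout of the recipe table and a Hash of part descriptions.

```ruby
# Hypothetical helper: one DOT edge per recipe row.
# Row layout (assumed, matching the recipe table):
#   [recipe, end item, end item plant, component, component source plant]
def recipes_to_dot(rows, descriptions)
  # Escape double quotes so a description like 32" can't break a label.
  quote = ->(text) { text.gsub('"') { '\"' } }
  lines = ["digraph recipes {", "  rankdir=LR", "  node [shape=box]"]
  rows.each do |recipe, end_item, end_plant, component, source_plant|
    component_name = quote.call(descriptions.fetch(component, component))
    end_item_name  = quote.call(descriptions.fetch(end_item, end_item))
    # "\\n" is a literal backslash-n, which DOT renders as a line break.
    label = "#{component_name} for\\n#{end_item_name}\\n(Recipe #{recipe})"
    lines << %(  "#{source_plant}" -> "#{end_plant}" [label="#{label}"])
  end
  lines << "}"
  lines.join("\n")
end

descriptions = { "001" => "Blue Jeans, button fly", "006" => "Button", "007" => "Denim" }
rows = [%w[900 001 1000 006 1002], %w[900 001 1000 007 1001]]
puts recipes_to_dot(rows, descriptions)
```

Unknown part numbers fall back to the number itself, so a gap in the master data shows up in the picture instead of crashing the script.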

To put these two items to work, we’ll parse the data dumps we’ve reviewed with plain Ruby and hand the results to a Ruby gem called ruby-graphviz, which auto-generates the graphical files. It can also produce DOT language files that we can further manipulate or archive if necessary. Lucky for us, both are extremely simple once we write a few lines of Ruby to do our dirty work.

require 'csv'       # Ruby's standard CSV parser (handles quoted fields)
require 'graphviz'  # this loads the ruby-graphviz gem

# Constants defining file names and paths
CSVRecipeFileName   = "recipes.csv"
CSVRecipeFilePath   = File.join(".")
CSVMaterialFileName = "materials.csv"
CSVMaterialFilePath = File.join(".")
CSVPlantFileName    = "plants.csv"
CSVPlantFilePath    = File.join(".")
OutputPath          = File.join(".")

# load 3 arrays with CSV data, one array of fields per row; unlike a plain
# split(","), CSV.read keeps quoted fields such as "Blue Jeans, button fly"
# intact (the files are assumed to have no header row)
recipes = []; materials = []; plants = []
[[CSVRecipeFileName,   CSVRecipeFilePath,   recipes],
 [CSVMaterialFileName, CSVMaterialFilePath, materials],
 [CSVPlantFileName,    CSVPlantFilePath,    plants]].each do |name, path, rows|
  rows.concat(CSV.read(File.join(path, name)))
end

# initialize new Graphviz graph
g = GraphViz::new( "structs", "type" => "graph" )
g[:rankdir] = "LR"

# set global node options
g.node[:color]    = "#ddaa66"
g.node[:style]    = "filled"
g.node[:shape]    = "box"
g.node[:penwidth] = "1"
g.node[:fontname] = "Trebuchet MS"
g.node[:fontsize] = "8"
g.node[:fillcolor]= "#ffeecc"
g.node[:fontcolor]= "#775500"
g.node[:margin]   = "0.0"

# set global edge options
g.edge[:color]    = "#999999"
g.edge[:weight]   = "1"
g.edge[:fontsize] = "6"
g.edge[:fontcolor]= "#444444"
g.edge[:fontname] = "Verdana"
g.edge[:dir]      = "forward"
g.edge[:arrowsize]= "0.5"

# draw our nodes, i.e., plants
plants.each do |plant|
  g.add_node(plant[0]).label = plant[1]+"\\n"+
    plant[2]+", "+plant[3]+"\\n("+plant[0]+")"
end

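# brute force, but simple: return the description (2nd column) of the
# material whose part number (1st column) matches, or a placeholder so
# the label concatenation below never hits nil
def find_material_record(materials_array, material_number)
  record = materials_array.find { |m| m[0] == material_number }
  record ? record[1] : "unknown part #{material_number}"
end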

# connect the nodes with our recipes, add labels from material data
recipes.each do |recipe|
  end_material       = find_material_record(materials, recipe[1])
  component_material = find_material_record(materials, recipe[3])
  g.add_edge(recipe[4],recipe[2]).label=
    component_material+" for\\n"+end_material+"\\n(Recipe "+recipe[0]+")"
end

g.output( "output" => "png", :file => OutputPath+"/graph.png" )

And that’s it. A few dozen lines of reusable code give us a script that generates visual representations of our data, like the one below, which uses the data from the tables at the beginning of the article. Tweaking that last line lets us generate not only graphic files but also text DOT files.5

A Graphviz-generated image of the example plant and recipe data.
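If you would rather manage the files yourself, for instance to archive the DOT text alongside the image, the rendering step can also be done without the gem: write the DOT text to disk and call Graphviz’s dot command-line tool. The helper name write_and_render and the file layout are hypothetical, and the sketch assumes dot is on your PATH; if it isn’t, only the .gv file is written.

```ruby
require "tmpdir"

# Hypothetical helper: write DOT text to <dir>/<basename>.gv and, if the
# Graphviz `dot` command is available, render a PNG next to it.
# Returns the list of files actually produced.
def write_and_render(dot_text, basename, dir)
  gv_path  = File.join(dir, "#{basename}.gv")
  png_path = File.join(dir, "#{basename}.png")
  File.write(gv_path, dot_text)
  # system returns true on success, false on a non-zero exit status,
  # and nil if the command could not be run (e.g. dot not installed).
  rendered = system("dot", "-Tpng", gv_path, "-o", png_path)
  rendered ? [gv_path, png_path] : [gv_path]
end

Dir.mktmpdir do |dir|
  produced = write_and_render("digraph graph_example { a -> b }", "graph_example", dir)
  puts produced
end
```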

Conclusion

Obviously, this is a trivial example, but you get the idea. If you have access to a plotter, Graphviz gives you a great deal of flexibility with regard to how big these images can physically be. Whether you print your output or view it on-screen, the benefit is clear. They say a picture is worth a thousand words, but in this case, it’s worth a few thousand lines of raw data.

The nice thing about the Ruby integration is that you can offer a variety of options to tweak the output for the same data. In our example, maybe we only want to see which parts move between which plants. In that case, the recipe isn’t important and we can consolidate many of our edges. Moreover, if you’re a Rails user, dropping this into a Rails app is trivial. You can then give many users web access to this information while a background process running the code we’ve just written handles the data extraction and processing.
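Here is one way that consolidation could look in plain Ruby (no gem), as a sketch using the recipe rows from the tables at the beginning of the article. Grouping by source plant and receiving plant, and keeping only the distinct component part numbers, collapses the fourteen recipe rows into four plant-to-plant pairs; each pair could then be drawn as a single edge whose label lists the parts.

```ruby
# The recipe rows from the tables at the beginning of the article:
# [recipe, end item, end item plant, component, component source plant]
RECIPES = [
  %w[900 001 1000 006 1002], %w[900 001 1000 007 1001],
  %w[901 002 1000 005 1002], %w[901 002 1000 007 1001],
  %w[902 003 1000 006 1002], %w[902 003 1000 007 1001],
  %w[902 003 1000 008 1001], %w[903 004 1000 005 1002],
  %w[903 004 1000 007 1001], %w[903 004 1000 008 1001],
  %w[904 001 1001 006 1002], %w[904 001 1001 007 1001],
  %w[905 002 1001 005 1002], %w[905 002 1001 007 1001]
]

# Ignore the recipe and end item; keep only which parts move from which
# source plant to which receiving plant, one entry per plant pair.
consolidated = RECIPES
  .group_by { |_, _, end_plant, _, source_plant| [source_plant, end_plant] }
  .transform_values { |rows| rows.map { |row| row[3] }.uniq }

consolidated.each do |(from, to), parts|
  puts "#{from} -> #{to}: #{parts.join(', ')}"
end
```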

If you find this useful, or if you know of other/better ways to visualize complex data sets, I’d love to hear about them. Whatever path you choose (hopefully you choose the shortest path first), good luck in your visualization endeavors!

Notes

1 Graphviz is freely available from http://www.graphviz.org.

2 Graphviz documentation is available at http://www.graphviz.org/Documentation.php. Of particular note are the DOT reference and the Node, Edge, and Graph Attribute references.

3 Installation varies between platforms. See the notes on the Graphviz download page for the details pertaining to your platform.

4 All of the attributes used are documented at http://www.graphviz.org/doc/info/attrs.html.

5 Documentation for ruby-graphviz can be obtained from the project page by downloading the latest package and reading the contained files. http://raa.ruby-lang.org/project/ruby-graphviz/. Accessed 19 January 2009.

Comments
Nice article by podvent

Good to have this input!
Graphviz is wonderful, works great to generate hierarchy display on the fly.
I tried to extract data from my existing db to generate a graph for
upline → downline relation, it works great and fast.

Before, I used to do it using a VB script inside MS PPT and generate an org chart from there.

Tnx for all the input!
Keep it UP!

Hi,
I found out that graphviz fails on large data sets.

Try out the GUESS project instead. It uses java, allows zooming, and allows dynamic scripting of data with python.

GUESS project homepage

BA by Anonymous

Until I learnd2BA I would have done similarly, but now I’d definitely go with a BA package (after investigation I chose Pentaho [http://www.pentaho.com])
Basic steps are:
- install analysis server
- load your denormalized data
- define the cube (pentaho has the schema workbench for this)
- analyze!

Hypergraph by Anonymous

Hooray that ridiculous language ruby can finally do graphviz. Check out http://hypergraph.sourceforge.net/examples.html

Graphviz + Perl by Anonymous

Your examples don’t really address “gigantic” data. I generated about 1000 Graphviz nodes automatically using Perl in 2000 AD. See http://groups.google.com/group/perfviz?hl=en

So, Graphviz can handle large data sets (in that sense, at least) and if you produce the output as PDF, you can scale it visually.

I wrote a tool called AfterGlow which takes a CSV file that defines edges and generates a DOT description of the graph. In addition to the CSV input you can use a configuration file that allows for the decoration of nodes, aggregations, filtering, etc.
If you want to see what people have done with AfterGlow, check out SecViz.org

I had some good experience with the Ruby Graph Library. It’s more about graphs and uses graphviz (i.e., dot/dotty) as an output method. Definitely not a full interface to graphviz, but very useful for simple visualisations.

I wrote Tk::GraphViz for perl to render interactive graphs in a perl/Tk canvas. I’ve wanted the same for ruby/Tk, but haven’t taken the time to do it yet. Thanks for the reminder of graphviz goodness.

You might be interested in this Visio addin to layout diagrams with Graphviz:

http://www.calvert.ch/graphvizio/

Best regards,
Maurice Calvert
