Another look at RAVE

I will be leading a 5 hour session on RAVE at the Schaumburg IBM Offices near Chicago August 12, 2015. Space is limited, so make sure to register early and often! Details can be found here: http://events.r20.constantcontact.com/register/event?oeidk=a07eb54blte62fd6806&llr=nuulfnqab

IBM released their new visualization engine quite some time ago, and great strides have been made in improving the usability. Sadly the documentation is still as opaque as ever, and there’s outside of the IBM Visualization forum, there’s not much support.

I’ve recently had the opportunity to work with it a great deal as my current client has some very interesting requirements. This article will actually serve as the basis of a talk I’ve been asked to give to the other developers at this client. I am planning on a series of articles on RAVE, (as a quick side note, I find simply writing about methodologies teaches me just as much as reading about them – everyone should stop what they’re doing and start their own blogs.) and this may recap some of what I’ve written previously.

To begin, let’s talk about the language behind RAVE, vizJSON.

As you can see from the name, vizJSON is based on JSON (JavaScript Object Notation). The basic structure is:

1. Objects: {}
2. Arrays: []
3. Key/Value Pairs: “Key”:”Value”

The value of a key can be a number, a string, an array, or an object. An array can contain one or more values or objects. With the exception of the vis object itself, every value must have a key.

An example:

"label":[{"content":["The range is "]},{"content":[{"$ref":"start"}," – ", {"$ref":"end"}],"style":{"font":{"weight":"bold"}}}]

This is actually much less complicated than it looks at first glance.

To begin, this is an element label. It will appear on or near the element to which is attached. The label is composed of two parts (notice the array attached to label contains two objects). The first object contains a single key, content, which itself is an array containing a single value. The second object contains two keys, content and style. This content will render values found in the dataset (which I’ll explain later), and has a style applied to it.

The entire graph is contained within an object. Each object type has a specific schema. For example, the data object won’t expect an axis object, but a dimension will. At the same time, some objects, such as style, can be associated with many different parents.

At the top level, there are three essential objects. Without any one of these, no graph will be displayed.

1. Data
2. Grammar
3. Version

The other graph objects include: parameters, legends, legendPosition, size, style, and copyright. Parameters allow report developers to modify the appearance of the graph at run time. Legend and legendPosition control, surprise surprise, the legend. Size and style control the size and formatting of frthe graph. And copyright allows me to sue anyone who visualizations without first paying me a king’s ransom.

RAVE, vizJSON, has gone through many revisions. Certain attributes have changed – features have been added and removed. Version allows the engine to properly identify which version of vizJSON the visualization has been written for, and will behave accordingly.

Data

The data array controls the datasets available to the visualization. Each dataset is represented in Cognos as a separate query. This gives us a flexibility unseen before. Previously graphs could only be built off a single query – necessitating inefficient queries for the more complex graphs.

An example of this might be a bullet or a thermometer chart embedded in a list. There are two sections; the first section controls the scale of the graph, the second controls the position for a notch. In a standard Cognos graph, we would need a complex query to get the red/yellow/green bands and the notch position.

horizontal themometer

In this graph, we show various bands, a target, and an actual. What’s not immediately visible is the first band does not start at 0. This graph is actually possible to accomplish using standard Cognos Graphs, but it is an ugly solution involving all sorts of trickery. Because it has to be a single query, a case statement is used. Additionally, because this is a master/detail, the score band query cannot be cached, and the tables are queried for each detail graph.

By using RAVE, we can set up two data sets. The first pulls the score band data. We pull score band label, start, and end. If we are using a single score band tier across the entire m/d list, this query can be cached.

The data set looks like:

      {
         "id":"zones",
         "fields":
         [
            {
               "id":"series",
               "label":"abc",
               "categories":
               [
                  "red",
                  "yellow",
                  "green"
               ]
            },
            {
               "id":"start",
               "label":"start"
            },
            {
               "id":"end",
               "label":"end"
            }
         ],
         "rows":
         [
            [0,160,359],
            [1,360,559],
            [2,560,760]
         ]
      }

The first field in that dataset is categorical. In the sample data, it’s populated with red,yellow,green. The rows section (containing the actual data), refers to the sample categories using a zero-based index. Once these appear in Cognos, the report developer will not have to worry about this.

The second data set is to create the dot. A single value can be called.

      {
         "id":"scores",
         "fields":
         [
            {
               "id":"score"
            }
         ],
         "rows":
         [
            [460]
         ]
      }

A quick call to the fact, with no joins, will be significantly faster than trying to join the two queries.

Grammar
Once we have the sample data to work with, the grammar describes the graphs. Grammar is an array, so we can actually have more than one graph in a single visualization. We can control the position of the different graphs – they can be next to each other vertically or horizontally, they can overlap, they can even be nested. Imagine a tree map nested inside the bars of a waterfall.

Inside a grammar we have coordinates, and elements. The coordinates is an object that holds the dimensions. The dimensions attribute is an array of edges for a specific graph. While it’s easy to imagine a two-dimensional graph, what can we do with more?

axes

In this visualization, we have an interval with three dimensions. We have the X Axis (measures, 1st dimension), the Y Axis (the categories, 2nd dimension), and the Z Axis (the series, 3rd dimension).

A graph need not have any axes. While frowned upon by data visualization experts, we can build a graph by showing packed bubbles, encoding a value by size and color.

bubbles

This graph has no axes – no dimensions. Instead, it’s positioning the bubbles using a specific method. There are a number of positioning algorithms. From positioning or nesting in any number of dimensions, to figuring out how to place a node when there are no dimensions.

The scale of a graph is defined by the element tied to it. A scale always has at least one span, you can add more to show, maybe, a line over a bar.
A dimension is often most useful when we have ticks and labels describing various points on the axis. In the clustered interval graph above, two out of three axes had labels. In vizJSON, each dimension has one or more scales, and an axis line.

There are various settings to control the axis line and ticks. If it’s a categorical scale, such as the Y and Z Axes above, do you want to show labels? How often do you want to show the measure for the Y Axis? In the previous example, the series was encoded by color, so there was no need to place labels.

Now let’s talk about the shapes themselves – bars, lines, circles, polygons, and more. While an interval is not much use in scatterplot, points can be placed almost everywhere. Some seem less obvious. What would you call the shape in a treemap? Or a heatmap? Or the paths between network nodes? A single basic element type can wear many hats.

Let’s talk about a two basic shapes, intervals and points.

An interval will build a rectangle from one point to another. The most basic interval generates the rectangle from the position of the field assigned to it.

Let’s take a dataset:

{
	{
	   "id":"series",
	   "label":"abc",
	   "categories":
	   [
		  "abc",
		  "abc",
		  "abc"
	   ]
	},
	{
	   "id":"size",
	   "label":"abc"
	},
	{
	   "id":"end",
	   "label":"abc"
	}
 ],
 "rows":
 [
	[  0,0,15,22	],
	[  0,1,17,23	],
	[  0,2,16,24	],
	[  1,0,24,34	],
	[  1,1,22,27	],
	[  1,2,26,33	],
	[  2,0,28,43	],
	[  2,1,16,34	],
	[  2,2,18,22	]
 ]
}

Apply an interval element

"type":"interval",
"position":
[
  {	 "field":{"$ref":"size"}  },
  {	 "field":{"$ref":"categories"}  },
  {	 "field":{"$ref":"series"}  }
]

That interval is built inside a grammar with three dimensions. It will attempt to build a rectangle, starting at 0 to the value of “size”, split at the positions for “categories” and “series”. There’s a little more to it than here, there are three dimensions listed, and the graph has the “cluster” transform applied to it.

interval

Interval also allows us to build a rectangle based on start and end values.

      "type":"interval",
"position":
[
  {	 "field":{"$ref":"start"}  },
  {	 "field":{"$ref":"end"}  },
  {	 "field":{"$ref":"categories"}  },
  {	 "field":{"$ref":"series"}  }
]

In general bars should begin at zero. 40 sales are obviously twice as many as 20. But is 40c twice as hot as 20c? Setting start and end ranges is useful when building thermometer graphs. We can use the interval to build the structure, the ranges, as the base for another indication.

range

Where interval describes a range or area, points usually describe position – with the caveat that size, shape, color can all be modified based on other criteria.

Let’s shake things up, we’ll go with a dataset that returns score bands like I described at the beginning of the article, and another dataset that returns a single score value.

   "data":
   [
      {
         "id":"scores",
         "fields":
         [
            {"id":"score"}
         ],
         "rows":
         [
            [460]
         ]
      },
      {
         "id":"zones",
         "fields":
         [
            {
               "id":"series",
               "label":"abc",
               "categories":
               [
                  "red",
                  "yellow",
                  "green"
               ]
            },
            {
               "id":"start",
               "label":"start"
            },
            {
               "id":"end",
               "label":"end"
            }
         ],
         "rows":
         [
            [0,160,359],
            [1,360,559],
            [2,560,760]
         ]
      }
   ]

With those two datasets, we need to find a way to build the ranges, and put a dot over the correct position.

In this case our grammar will contain two elements, an interval and a point. Remember how the interval can have start and end? This is easy. We can also apply a simple palette, changing based on the series.

               {
               "type":"interval",
               "position":
               [
                  { "field":{"$ref":"start"}},
                  { "field":{"$ref":"end"}},
                  { "value":"35%"}
               ],
               "color":
               [
                  {
                     "id":"_PALETTE_",
                     "field":
                     {
                        "$ref":"series"
                     },
                     "modifies":"both",
                     "palette":
                     [
                        "#BC202E",
                        "#F49020",
                        "#38B14A"
                     ]
                  }
               ],
               "style":
               {
                  "width":"8px"
               }
            },

The color attribute is as aesthetic, basically a conditional formatting. Almost any style can be changed on the fly. I’ll go into more detail on this in a later article. The style here is defining the size of the bar. In this case, because it’s feeding a pixel perfect report, it’s set at exactly 8 pixels wide. The big gotcha here is the “value”:”35%”. In this graph, even though there is a single bar, there are two dimensions. The “value”:”35%” is simply instructing RAVE to push the element 35% down the scale, with 100% being the top and 0% being the bottom.

vert therm 1

This alone will create a nice bar. Now let’s get the dot on it.

             {
               "type":"point",
               "position":
               [
                  {"field":{"$ref":"score"}},
                  {"value":"35%"}
               ],
               "style":
               {
                  "width":"9px",
                  "height":"9px",
                  "outline":"white",
                  "fill":"black",
                  "stroke":
                  {
                     "width":"1px"
                  }
               },
               "label":
               [
                  {
                     "content":
                     [
                        {
                           "$ref":"score"
                        }
                     ],
                     "style":
                     {
                        "align":"start",
                        "font":
                        {
                           "weight":"bold",
                           "size":"24px",
                           "family":"Helvetica"
                        }
                     }
                  }
               ]
            }

This will create not only a dot, but a label. The dot is a black circle (the default symbol) with a white border. The label is defined to be aligned “start”, which in this case is left.
vert therm 2

This is close to the example I showed at the beginning, but there’s a lot more. In future posts I’ll write about summary functions (to get the beginning of the green range), sorting (so the report developer doesn’t need to sort it in the report), and transposing (so it’s horizontal instead of vertical).

We’ll also talk about parameters, so the developer can choose his own colors, and the bundle definition.

Attached is the vizJSON for the above graph. You’ll need the RAVE customizer which can be found here. Can you figure out how to transpose the graph? How about finding the max start value and putting a triangle under the bar?

vertical-thermometer.txt (875 downloads)