Another look at RAVE

July 23, 2015 | cognospaul

I will be leading a 5 hour session on RAVE at the Schaumburg IBM Offices near Chicago August 12, 2015. Space is limited, so make sure to register early and often! Details can be found here: http://events.r20.constantcontact.com/register/event?oeidk=a07eb54blte62fd6806&llr=nuulfnqab

IBM released their new visualization engine quite some time ago, and great strides have been made in improving the usability. Sadly the documentation is still as opaque as ever, and there’s outside of the IBM Visualization forum, there’s not much support.

I’ve recently had the opportunity to work with it a great deal as my current client has some very interesting requirements. This article will actually serve as the basis of a talk I’ve been asked to give to the other developers at this client. I am planning on a series of articles on RAVE, (as a quick side note, I find simply writing about methodologies teaches me just as much as reading about them – everyone should stop what they’re doing and start their own blogs.) and this may recap some of what I’ve written previously.

To begin, let’s talk about the language behind RAVE, vizJSON.

As you can see from the name, vizJSON is based on JSON (JavaScript Object Notation). The basic structure is:

1. Objects: {}
2. Arrays: []
3. Key/Value Pairs: “Key”:”Value”

The value of a key can be a number, a string, an array, or an object. An array can contain one or more values or objects. With the exception of the vis object itself, every value must have a key.

An example:

"label":[{"content":["The range is "]},{"content":[{"$ref":"start"}," – ", {"$ref":"end"}],"style":{"font":{"weight":"bold"}}}]

This is actually much less complicated than it looks at first glance.

To begin, this is an element label. It will appear on or near the element to which is attached. The label is composed of two parts (notice the array attached to label contains two objects). The first object contains a single key, content, which itself is an array containing a single value. The second object contains two keys, content and style. This content will render values found in the dataset (which I’ll explain later), and has a style applied to it.

The entire graph is contained within an object. Each object type has a specific schema. For example, the data object won’t expect an axis object, but a dimension will. At the same time, some objects, such as style, can be associated with many different parents.

At the top level, there are three essential objects. Without any one of these, no graph will be displayed.

1. Data
2. Grammar
3. Version

The other graph objects include: parameters, legends, legendPosition, size, style, and copyright. Parameters allow report developers to modify the appearance of the graph at run time. Legend and legendPosition control, surprise surprise, the legend. Size and style control the size and formatting of frthe graph. And copyright allows me to sue anyone who visualizations without first paying me a king’s ransom.

RAVE, vizJSON, has gone through many revisions. Certain attributes have changed – features have been added and removed. Version allows the engine to properly identify which version of vizJSON the visualization has been written for, and will behave accordingly.

Data

The data array controls the datasets available to the visualization. Each dataset is represented in Cognos as a separate query. This gives us a flexibility unseen before. Previously graphs could only be built off a single query – necessitating inefficient queries for the more complex graphs.

An example of this might be a bullet or a thermometer chart embedded in a list. There are two sections; the first section controls the scale of the graph, the second controls the position for a notch. In a standard Cognos graph, we would need a complex query to get the red/yellow/green bands and the notch position.

horizontal themometer

In this graph, we show various bands, a target, and an actual. What’s not immediately visible is the first band does not start at 0. This graph is actually possible to accomplish using standard Cognos Graphs, but it is an ugly solution involving all sorts of trickery. Because it has to be a single query, a case statement is used. Additionally, because this is a master/detail, the score band query cannot be cached, and the tables are queried for each detail graph.

By using RAVE, we can set up two data sets. The first pulls the score band data. We pull score band label, start, and end. If we are using a single score band tier across the entire m/d list, this query can be cached.

The data set looks like:

      {
         "id":"zones",
         "fields":
         [
            {
               "id":"series",
               "label":"abc",
               "categories":
               [
                  "red",
                  "yellow",
                  "green"
               ]
            },
            {
               "id":"start",
               "label":"start"
            },
            {
               "id":"end",
               "label":"end"
            }
         ],
         "rows":
         [
            [0,160,359],
            [1,360,559],
            [2,560,760]
         ]
      }

The first field in that dataset is categorical. In the sample data, it’s populated with red,yellow,green. The rows section (containing the actual data), refers to the sample categories using a zero-based index. Once these appear in Cognos, the report developer will not have to worry about this.

The second data set is to create the dot. A single value can be called.

      {
         "id":"scores",
         "fields":
         [
            {
               "id":"score"
            }
         ],
         "rows":
         [
            [460]
         ]
      }

A quick call to the fact, with no joins, will be significantly faster than trying to join the two queries.

Grammar
Once we have the sample data to work with, the grammar describes the graphs. Grammar is an array, so we can actually have more than one graph in a single visualization. We can control the position of the different graphs – they can be next to each other vertically or horizontally, they can overlap, they can even be nested. Imagine a tree map nested inside the bars of a waterfall.

Inside a grammar we have coordinates, and elements. The coordinates is an object that holds the dimensions. The dimensions attribute is an array of edges for a specific graph. While it’s easy to imagine a two-dimensional graph, what can we do with more?

axes

In this visualization, we have an interval with three dimensions. We have the X Axis (measures, 1st dimension), the Y Axis (the categories, 2nd dimension), and the Z Axis (the series, 3rd dimension).

A graph need not have any axes. While frowned upon by data visualization experts, we can build a graph by showing packed bubbles, encoding a value by size and color.

bubbles

This graph has no axes – no dimensions. Instead, it’s positioning the bubbles using a specific method. There are a number of positioning algorithms. From positioning or nesting in any number of dimensions, to figuring out how to place a node when there are no dimensions.

The scale of a graph is defined by the element tied to it. A scale always has at least one span, you can add more to show, maybe, a line over a bar.
A dimension is often most useful when we have ticks and labels describing various points on the axis. In the clustered interval graph above, two out of three axes had labels. In vizJSON, each dimension has one or more scales, and an axis line.

There are various settings to control the axis line and ticks. If it’s a categorical scale, such as the Y and Z Axes above, do you want to show labels? How often do you want to show the measure for the Y Axis? In the previous example, the series was encoded by color, so there was no need to place labels.

Now let’s talk about the shapes themselves – bars, lines, circles, polygons, and more. While an interval is not much use in scatterplot, points can be placed almost everywhere. Some seem less obvious. What would you call the shape in a treemap? Or a heatmap? Or the paths between network nodes? A single basic element type can wear many hats.

Let’s talk about a two basic shapes, intervals and points.

An interval will build a rectangle from one point to another. The most basic interval generates the rectangle from the position of the field assigned to it.

Let’s take a dataset:

{
	{
	   "id":"series",
	   "label":"abc",
	   "categories":
	   [
		  "abc",
		  "abc",
		  "abc"
	   ]
	},
	{
	   "id":"size",
	   "label":"abc"
	},
	{
	   "id":"end",
	   "label":"abc"
	}
 ],
 "rows":
 [
	[  0,0,15,22	],
	[  0,1,17,23	],
	[  0,2,16,24	],
	[  1,0,24,34	],
	[  1,1,22,27	],
	[  1,2,26,33	],
	[  2,0,28,43	],
	[  2,1,16,34	],
	[  2,2,18,22	]
 ]
}

Apply an interval element

"type":"interval",
"position":
[
  {	 "field":{"$ref":"size"}  },
  {	 "field":{"$ref":"categories"}  },
  {	 "field":{"$ref":"series"}  }
]

That interval is built inside a grammar with three dimensions. It will attempt to build a rectangle, starting at 0 to the value of “size”, split at the positions for “categories” and “series”. There’s a little more to it than here, there are three dimensions listed, and the graph has the “cluster” transform applied to it.

interval

Interval also allows us to build a rectangle based on start and end values.

      "type":"interval",
"position":
[
  {	 "field":{"$ref":"start"}  },
  {	 "field":{"$ref":"end"}  },
  {	 "field":{"$ref":"categories"}  },
  {	 "field":{"$ref":"series"}  }
]

In general bars should begin at zero. 40 sales are obviously twice as many as 20. But is 40c twice as hot as 20c? Setting start and end ranges is useful when building thermometer graphs. We can use the interval to build the structure, the ranges, as the base for another indication.

range

Where interval describes a range or area, points usually describe position – with the caveat that size, shape, color can all be modified based on other criteria.

Let’s shake things up, we’ll go with a dataset that returns score bands like I described at the beginning of the article, and another dataset that returns a single score value.

   "data":
   [
      {
         "id":"scores",
         "fields":
         [
            {"id":"score"}
         ],
         "rows":
         [
            [460]
         ]
      },
      {
         "id":"zones",
         "fields":
         [
            {
               "id":"series",
               "label":"abc",
               "categories":
               [
                  "red",
                  "yellow",
                  "green"
               ]
            },
            {
               "id":"start",
               "label":"start"
            },
            {
               "id":"end",
               "label":"end"
            }
         ],
         "rows":
         [
            [0,160,359],
            [1,360,559],
            [2,560,760]
         ]
      }
   ]

With those two datasets, we need to find a way to build the ranges, and put a dot over the correct position.

In this case our grammar will contain two elements, an interval and a point. Remember how the interval can have start and end? This is easy. We can also apply a simple palette, changing based on the series.

               {
               "type":"interval",
               "position":
               [
                  { "field":{"$ref":"start"}},
                  { "field":{"$ref":"end"}},
                  { "value":"35%"}
               ],
               "color":
               [
                  {
                     "id":"_PALETTE_",
                     "field":
                     {
                        "$ref":"series"
                     },
                     "modifies":"both",
                     "palette":
                     [
                        "#BC202E",
                        "#F49020",
                        "#38B14A"
                     ]
                  }
               ],
               "style":
               {
                  "width":"8px"
               }
            },

The color attribute is as aesthetic, basically a conditional formatting. Almost any style can be changed on the fly. I’ll go into more detail on this in a later article. The style here is defining the size of the bar. In this case, because it’s feeding a pixel perfect report, it’s set at exactly 8 pixels wide. The big gotcha here is the “value”:”35%”. In this graph, even though there is a single bar, there are two dimensions. The “value”:”35%” is simply instructing RAVE to push the element 35% down the scale, with 100% being the top and 0% being the bottom.

vert therm 1

This alone will create a nice bar. Now let’s get the dot on it.

             {
               "type":"point",
               "position":
               [
                  {"field":{"$ref":"score"}},
                  {"value":"35%"}
               ],
               "style":
               {
                  "width":"9px",
                  "height":"9px",
                  "outline":"white",
                  "fill":"black",
                  "stroke":
                  {
                     "width":"1px"
                  }
               },
               "label":
               [
                  {
                     "content":
                     [
                        {
                           "$ref":"score"
                        }
                     ],
                     "style":
                     {
                        "align":"start",
                        "font":
                        {
                           "weight":"bold",
                           "size":"24px",
                           "family":"Helvetica"
                        }
                     }
                  }
               ]
            }

This will create not only a dot, but a label. The dot is a black circle (the default symbol) with a white border. The label is defined to be aligned “start”, which in this case is left.
vert therm 2

This is close to the example I showed at the beginning, but there’s a lot more. In future posts I’ll write about summary functions (to get the beginning of the green range), sorting (so the report developer doesn’t need to sort it in the report), and transposing (so it’s horizontal instead of vertical).

We’ll also talk about parameters, so the developer can choose his own colors, and the bundle definition.

Attached is the vizJSON for the above graph. You’ll need the RAVE customizer which can be found here. Can you figure out how to transpose the graph? How about finding the max start value and putting a triangle under the bar?

vertical-thermometer.txt (869 downloads)

10, 8.4, Framework, Internals, OLAP

Guest Post: On Cognos Reports Performance Improvement

June 7, 2015 | Nimrod (Rod) Avissar

One of the tasks I’m hired to do quite often is to analyse why a report (or a group of reports) is loading slowly, and how could this be improved. The good news are, when a report runs for a very long time, there is almost always something we can do to improve it. But first, we must analyse what causes the report to run long.

The first thing we want to do is to rule out the query as the culprit. That is, we want to see how long it takes to get the report data from the database. This is best done by getting someone who is well versed in SQL or MDX to write a query that will pull the relevant data from the database. Many BI developers have sufficient knowledge of SQL, but in case you don’t, get a DBA to help.

One might ask – why not just take the SQL Cognos generates, and run that against the database. The answer is that we are trying to figure out what Cognos does wrong, and that includes the query generation. We need to compare what Cognos does with a well structured query.

If your manual query takes a long time to give results, it means that the underlying database has a hard time supporting your query. There could be many reasons for that. First of all, check that your tables are properly indexed and, if necessary, partitioned. Again, developers with gaps in knowledge in terms of DB tuning will do well to consult a DBA. If you are aggregating large amount of rows, or creating many joins, maybe an aggregated table or a materialised (Indexed) view will solve the problem. If you are doing multiple aggregates based on a few million rows, perhaps a Dynamic Cube would be a good bet, or you could try building an OLAP cube to support your reporting needs (Almost any cubing solution would be better than Transformer, but if push comes to shove, Transformer could be better than nothing).

If your data source is OLAP, and a query takes long to come back, it probably means you might want to design your cube differently. The most important tip I can give you here is to avoid nesting in as much as possible – try to hierarchise the nested columns under one single hierarchy, it will also help with aggregates. A rugged hierarchy is often preferable to a full cross join between different hierarchies, even when nonempty is defined in the query.

If your manual query returns within a reasonable time, it’s time to compare it with the Cognos generated query (Obtain it from tools->Show generated SQL/MDX). Just skim through the Cognos query and see that there aren’t any major differences. If there are, it could be your framework model is problematic. Run the Cognos generated query against the DB – how long does it take to come back? If much longer, then your model is probably at fault. The most usual culprit in cases like these are ill defined relations.

If the query Cognos generates is fine, we’re probably looking at some processing done on the result set. There are several of those, the common ones are master-detail relationships, chart generation and crosstab generation. There are ways to improve master-detail relationships over CQM (Follow this link, and also make sure that you are sectioning by a single key field), but ultimately, a DQM model would optimise master-detail relationships better than any wizardry. Crosstabs and charts over relational are rendered by creating mini-cubes on the server side, these may take some processing – again, the best cure is DQM, or creating a dimensional data source. If you are rendering many charts, the charting engine will queue some of them – not a lot to be done here other than increasing number of connections to the Graphics Service, minimising the amount of charts by using matrix rows/columns where appropriate or by loading the serially, or creating a RAVE based chart that spans multiple charts together.

These are the basics. There are obviously many cases which aren’t covered here, but that should give you the first-aid steps. If you tried all this and your report still slugs, then it is time to call the cavalry in.

Nimrod (Rod) Avissar is a Cognos expert, with a penchant for specialized UX solutions. Feel free to drop me a line! (LinkedIn).

10, Javascript, Prompts

Guest Post: A New Take on Date Range Prompts

May 24, 2015May 27, 2015 | Nimrod (Rod) Avissar

Since version 10.2 of IBM Cognos BI Suite, IBM included an API to access and manipulate prompt objects. Since prompt objects are the main instrument we use to allow users to communicate with a report (Interactivity or user selection), being able to manipulate them however we see fit can change user experience dramatically for the better. There are countless examples of how the prompt API can be used to achieve this. For example, dynamic defaults: Suppose you have two prompts, for region and for products. You want the default product selected to be the best selling product in the region selected. With prompt API, this can be achieved easily.

In this post I’d like to showcase one of the first solutions I ever wrote using Prompt API, because it was one of the things I wanted to solve for a long time.

Every so often we add “from date” and “to date” prompts to a report, to use for filtering the report output to show only data from the date range selected. The problem is, most users and most use cases don’t require the sort of flexibility a date range offers: most users will not run their sales-per-branch report between April 23rd and May 2nd, for instance, because it’s an arbitrary chunk of dates. Instead, users are likely to filter dates for last month, this MTD, QTD, YTD, last week and so on. So, basically, set, standard, comparable time frames. And sometimes the date range prompt can be replaced with a drop down list of such pre-set ranges, but other times, users ask to still have the flexibility of choosing to and from date, but nonetheless, still mostly use the set, comparable ranges.

Now, in order to select last month’s dates with two date prompts, your average user will need 6 clicks: One to open from date calendar, one to page back to last month, one to click on “1”, and the same process with the “To date” prompt. For YTD, they might need more. That’s a lot of clicks. Also, developers often have to write scripts to get the default value right, and because these date prompts are never done in a centralised, reusable manner, they end up writing a script for each report. I have long fought the war on developers wasting time by doing things more than once, and this case is no different. Even if reports require different default times, the solution can still be generalised, and therefore made reusable.

My solution uses JavaScript and Prompt API to add to the date prompt functionality. Here is how it works:

I’m using two date prompts, and adding 7 pre-defined links, which, when clicked, fill in their respective dates. So, for example, clicking on MTD will set the from date prompt to the 1st of the current month, and the to date prompt to today’s date. There’s also a verification mechanism in place to ensure that from date is always earlier than to date, or equal to it.

But how do I make this solution generalised? Let’s take a look at the report studio report:

The bit in blue is the error message the user will get if they choose an end date that’s prior to the start date. The bit in blue is a text that should be replaced with another text – containing just one number between 1 and 7, corresponding with a dynamic ate range. “1” is YTD, 4 is WTD and so on.

Now, if you drag in a layout reference object to this interface, here’s what you’ll get:

You can override&replace the warning message and the default text. So, if the default for a certain report is “last month”, you’ll override “Defaults”

Drag in a text item and insert “5”

When you run the report, the default would be last month:

This way you can set a different default value for each report in a reusable manner.

I’m attaching the XML, of course, but pay attention to these caveats:

1. The script has seven preconfigured date ranges. You can change them or add to them as you require, and use the general example in the code, but it requires some knowledge of scripting. Unfortunately, I will not be able to provide support for such customisations.

2. If you’re relying on my script to manipulate weeks, pay attention that my script assumes Monday is the first day of the week. Israelis especially, this means you’ll have to change this logic (Weeks in Israel begin on Sundays).

3.This is 10.2.2 – You can downgrade it to 10.2.x by changing the version number at the top.

daterange.txt (9396 downloads)

Nimrod (Rod) Avissar is a Cognos expert, with a penchant for specialized UX solutions. Feel free to drop me a line! (LinkedIn).

CognosPaul

Cognos tips, Cognos tricks, Cognos techniques: Cognos, Cognos, Cognos!

Another look at RAVE

Guest Post: On Cognos Reports Performance Improvement

Guest Post: A New Take on Date Range Prompts