IBM Think! (And dimensional report design)

I haven’t been posting much recently, and you can always tell how busy I am by the frequency of my posts. Never fear, the crazy project I’m on will be ending soon and I can get back to posting semi-regularly.

Before anything else, a very important announcement.

I’ll be attending IBM Think! Do you want to meet me? Tell me how awesome I am and how my solutions have saved your life? Now’s your chance! Send a message to the PMsquare people here and we can set something up! Want to tell me that one of my solutions destroyed your Cognos set up? We can talk it over a beer or two in a well lit public area. During the expo, I’ll either be hanging around booth #716, or wandering around restocking my supply of branded pens and office swag.

Now to the meat of the post.

Let’s talk about dimensional report design.

Building dimensional reports is more of the more complex tasks a Cognos developer can face. Relational reports over a few tables is an easy task, but the skills learned don’t necessarily transfer over. While similar in appearance, the queries use a very different style.

When building a query, it might be tempting to explicitly create a tuple for each item. Case in point, consider the following requirement:

year in column, prior year, Year over Year%
4 measures
Org Hierarchy in rows
desired output

It might be tempting to make each column a separate tuple.

currentyearSales: tuple([Sales],[Current Year])
priorYearSales: tuple([Sales],[Prior Year])
and so on.

But let’s take a look at the underlying MDX that’s generated.

WITH
  MEMBER [Measures].[XQE_V5M_Reseller Order Quantity YoY_CM13] AS ((([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Reseller Order Quantity])-([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Order Quantity]))/([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Order Quantity])), SOLVE_ORDER = 4, CAPTION = 'Reseller Order Quantity YoY'
  MEMBER [Measures].[XQE_V5M_Reseller Sales Amount YoY_CM14] AS ((([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Reseller Sales Amount])-([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Reseller Sales Amount]))/([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Reseller Sales Amount])), SOLVE_ORDER = 4, CAPTION = 'Reseller Sales Amount YoY'
  MEMBER [Measures].[XQE_V5M_Internet Order Quantity YoY_CM15] AS ((([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Internet Order Quantity])-([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Order Quantity]))/([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Order Quantity])), SOLVE_ORDER = 4, CAPTION = 'Internet Order Quantity YoY'
  MEMBER [Measures].[XQE_V5M_Internet Sales Amount YoY_CM16] AS ((([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Internet Sales Amount])-([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Sales Amount]))/([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Sales Amount])), SOLVE_ORDER = 4, CAPTION = 'Internet Sales Amount YoY'
  MEMBER [Measures].[XQE_V5M_Reseller Order Quantity 2012_CM17] AS ([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Order Quantity]), SOLVE_ORDER = 4, CAPTION = 'Reseller Order Quantity 2012'
  MEMBER [Measures].[XQE_V5M_Reseller Sales Amount 2012_CM18] AS ([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Reseller Sales Amount]), SOLVE_ORDER = 4, CAPTION = 'Reseller Sales Amount 2012'
  MEMBER [Measures].[XQE_V5M_Internet Order Quantity 2012_CM19] AS ([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Order Quantity]), SOLVE_ORDER = 4, CAPTION = 'Internet Order Quantity 2012'
  MEMBER [Measures].[XQE_V5M_Internet Sales Amount 2012_CM20] AS ([Date].[Calendar].[Calendar Year].&[2012], [Measures].[Internet Sales Amount]), SOLVE_ORDER = 4, CAPTION = 'Internet Sales Amount 2012'
  MEMBER [Measures].[XQE_V5M_Reseller Order Quantity 2013_CM21] AS ([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Reseller Order Quantity]), SOLVE_ORDER = 4, CAPTION = 'Reseller Order Quantity 2013'
  MEMBER [Measures].[XQE_V5M_Reseller Sales Amount 2013_CM22] AS ([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Reseller Sales Amount]), SOLVE_ORDER = 4, CAPTION = 'Reseller Sales Amount 2013'
  MEMBER [Measures].[XQE_V5M_Internet Order Quantity 2013_CM23] AS ([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Internet Order Quantity]), SOLVE_ORDER = 4, CAPTION = 'Internet Order Quantity 2013'
  MEMBER [Measures].[XQE_V5M_Internet Sales Amount 2013_CM12] AS ([Date].[Calendar].[Calendar Year].&[2013], [Measures].[Internet Sales Amount]), SOLVE_ORDER = 4, CAPTION = 'Internet Sales Amount 2013'
SELECT 
  {CROSSJOIN({[Date].[Calendar].[Calendar Year].&[2013]}, {[Measures].[XQE_V5M_Internet Sales Amount 2013_CM12], [Measures].[XQE_V5M_Internet Order Quantity 2013_CM23], [Measures].[XQE_V5M_Reseller Sales Amount 2013_CM22], [Measures].[XQE_V5M_Reseller Order Quantity 2013_CM21]}), CROSSJOIN({[Date].[Calendar].[Calendar Year].&[2012]}, {[Measures].[XQE_V5M_Internet Sales Amount 2012_CM20], [Measures].[XQE_V5M_Internet Order Quantity 2012_CM19], [Measures].[XQE_V5M_Reseller Sales Amount 2012_CM18], [Measures].[XQE_V5M_Reseller Order Quantity 2012_CM17]}), ([Date].[Calendar].DEFAULTMEMBER, [Measures].[XQE_V5M_Internet Sales Amount YoY_CM16]), ([Date].[Calendar].DEFAULTMEMBER, [Measures].[XQE_V5M_Internet Order Quantity YoY_CM15]), ([Date].[Calendar].DEFAULTMEMBER, [Measures].[XQE_V5M_Reseller Sales Amount YoY_CM14]), ([Date].[Calendar].DEFAULTMEMBER, [Measures].[XQE_V5M_Reseller Order Quantity YoY_CM13])} DIMENSION PROPERTIES PARENT_LEVEL,  PARENT_UNIQUE_NAME ON AXIS(0), 
  DESCENDANTS([Sales Territory].[Sales Territory].[All Sales Territories], 3, SELF_AND_BEFORE) DIMENSION PROPERTIES PARENT_LEVEL,  PARENT_UNIQUE_NAME ON AXIS(1)
FROM [Adventure Works]  CELL PROPERTIES CELL_ORDINAL,  FORMAT_STRING,  LANGUAGE,  VALUE

Looking at this in the profiler, I can see that it’s taking 20 ms to process, and generates 67 subcubes.

In this case it’s better to use the years as a set, and rely on the implicit grouping of a crosstab. This results in significantly fewer data items.

But what about the Year over Year% columns? Manually calculating each column would certainly be easy to do, but again, it’s not necessary. We can create a calculated member in the time hierarchy that calculates it for us.
member(([Current Year] – [Prior Year]) / [Prior Year] , ‘YoY’,’YoY’,[Cube].[Time Dim].[Time Hier])

It might look silly to people coming from a relational background. After all, (2017-2016)/2016 = 4.96%. In this case, the calculation is happening to the nested measures. We can then select the member fact cells of the calculated member, and format all of the cells as percentage.

Let’s take a look at the underlying MDX:

WITH
MEMBER [Date].[Calendar].[XQE_V5M_CM1] AS ((([Date].[Calendar].[Calendar Year].&[2013])-([Date].[Calendar].[Calendar Year].&[2012]))/([Date].[Calendar].[Calendar Year].&[2012])), SOLVE_ORDER = 4, CAPTION = ‘YoY %’
SELECT
{CROSSJOIN({[Date].[Calendar].[Calendar Year].&[2013], [Date].[Calendar].[Calendar Year].&[2012]}, {[Measures].[Internet Sales Amount], [Measures].[Internet Order Quantity], [Measures].[Reseller Sales Amount], [Measures].[Reseller Order Quantity]}), CROSSJOIN({[Date].[Calendar].[XQE_V5M_CM1]}, {[Measures].[Internet Sales Amount], [Measures].[Internet Order Quantity], [Measures].[Reseller Sales Amount], [Measures].[Reseller Order Quantity]})} DIMENSION PROPERTIES PARENT_LEVEL, PARENT_UNIQUE_NAME ON AXIS(0),
DESCENDANTS([Sales Territory].[Sales Territory].[All Sales Territories], 3, SELF_AND_BEFORE) DIMENSION PROPERTIES PARENT_LEVEL, PARENT_UNIQUE_NAME ON AXIS(1)
FROM [Adventure Works] CELL PROPERTIES CELL_ORDINAL, FORMAT_STRING, LANGUAGE, VALUE
[/sourecode]

In this case the runtime is 12 ms with 31 subcubes generated.

That’s 40% faster. While in this example the actual difference is insignificant, in real life the runtime difference for complex queries can be profound. The report that instigated this post was taking over an hour to process, but with the changes I mentioned it dropped down to 3 minutes.

The end result here is a faster, more compact query. Much easier to maintain, much easier to change, and much easier to hand off to clients so you never have to look at it again.

Report XML Below
OLAP Report Design (178 downloads)

Quickie: Top 5 or Top 4 with “Others”

Here’s a puzzle for you. A set, depending on filters, may have anywhere between 1 and 100 members. If there are more than 5 members, we should see 4 with an aggregated “Other” member. If there are 5 or fewer we should just see them.

Top 4

We could try to do it with an insane combination of head and topCount and pull end up pulling out far too much hair. Instead, Cognos has a couple built in function that makes this a snap.

_firstFromSet is a function that will return the first N members from a set, with an optional overflow. Let’s say that Product Line has 5 members: Camping Equipment, Golf Equipment, Mountaineering Equipment, Outdoor Protection, and Personal Accessories. If we were to do _firstFromSet([Product Line],4,1) it would return all 5 members. Order Method, on the other hand, has 7 members. _firstFromSet([Order Method],4,1) will return only the first 4.

So this gives us either 4 or a max of 5 members, but how do we get that “Other” member? There is another, similar, function called _remainderSet. If the set in question contains more than a specified number of members, it will return a specified member. If the referenced set contains fewer, it will return an empty set. So let’s see an example:

_remainderSet (
  member(
      currentMeasure - total(currentMeasure within set _firstFromSet([Dim],4,1))
    , 'Other'
    , 'Other'
    , hierarchy([Dim])
  )
  , [Dim]
  , 5
)

So in this case, if [Dim] contains more than 5 members, it will create a calculated member that finds the total for the remaining members.

Putting it together we get:

  union(
  _firstFromSet (
     [Dim]
    , 4
    , 1
  )
  ,  _remainderSet (
    member(
        currentMeasure - total(currentMeasure within set _firstFromSet([Dim],4,1))
      , 'Other'
      , 'Other'
      , hierarchy([Dim])
    )
    , [Dim]
    , 5
  )
)

The report is, as usual, attached below.
Top 5 (563 downloads)

Guest Post: On Cognos Reports Performance Improvement

One of the tasks I’m hired to do quite often is to analyse why a report (or a group of reports) is loading slowly, and how could this be improved. The good news are, when a report runs for a very long time, there is almost always something we can do to improve it. But first, we must analyse what causes the report to run long.

The first thing we want to do is to rule out the query as the culprit. That is, we want to see how long it takes to get the report data from the database. This is best done by getting someone who is well versed in SQL or MDX to write a query that will pull the relevant data from the database. Many BI developers have sufficient knowledge of SQL, but in case you don’t, get a DBA to help.

One might ask – why not just take the SQL Cognos generates, and run that against the database. The answer is that we are trying to figure out what Cognos does wrong, and that includes the query generation. We need to compare what Cognos does with a well structured query.

If your manual query takes a long time to give results, it means that the underlying database has a hard time supporting your query. There could be many reasons for that. First of all, check that your tables are properly indexed and, if necessary, partitioned. Again, developers with gaps in knowledge in terms of DB tuning will do well to consult a DBA. If you are aggregating large amount of rows, or creating many joins, maybe an aggregated table or a materialised (Indexed) view will solve the problem. If you are doing multiple aggregates based on a few million rows, perhaps a Dynamic Cube would be a good bet, or you could try building an OLAP cube to support your reporting needs (Almost any cubing solution would be better than Transformer, but if push comes to shove, Transformer could be better than nothing).

If your data source is OLAP, and a query takes long to come back, it probably means you might want to design your cube differently. The most important tip I can give you here is to avoid nesting in as much as possible – try to hierarchise the nested columns under one single hierarchy, it will also help with aggregates. A rugged hierarchy is often preferable to a full cross join between different hierarchies, even when nonempty is defined in the query.

If your manual query returns within a reasonable time, it’s time to compare it with the Cognos generated query (Obtain it from tools->Show generated SQL/MDX). Just skim through the Cognos query and see that there aren’t any major differences. If there are, it could be your framework model is problematic.  Run the Cognos generated query against the DB – how long does it take to come back? If much longer, then your model is probably at fault. The most usual culprit in cases like these are ill defined relations.

If the query Cognos generates is fine, we’re probably looking at some processing done on the result set. There are several of those, the common ones are master-detail relationships, chart generation and crosstab generation. There are ways to improve master-detail relationships over CQM (Follow this link, and also make sure that you are sectioning by a single key field), but ultimately, a DQM model would optimise master-detail relationships better than any wizardry. Crosstabs and charts over relational are rendered by creating mini-cubes on the server side, these may take some processing – again, the best cure is DQM, or creating a dimensional data source. If you are rendering many charts, the charting engine will queue some of them – not a lot to be done here other than increasing number of connections to the Graphics Service, minimising the amount of charts by using matrix rows/columns where appropriate or by loading the serially, or creating a RAVE based chart that spans multiple charts together.

These are the basics. There are obviously many cases which aren’t covered here, but that should give you the first-aid steps. If you tried all this and your report still slugs, then it is time to call the cavalry in.

 

Nimrod (Rod) Avissar is a Cognos expert, with a penchant for specialized UX solutions. Feel free to drop me a line! (LinkedIn).