Why cardinality is so important in a Framework Model

I recently received a report that a dashboard was returning incorrect results. At first glance this isn’t exactly unexpected, and I initially attributed it to user error. But then I tested it myself, first in Dashboards and then in Report Studio, and got the same results.

The request is simple: show a list of product lines, revenue, and revenue per working day. The revenue fact is joined to the product dim and the date dim. The working days fact is joined only to the date dim.

I know that 2011 had around 200 working days, but let’s see what I get when I recreate the query.

That seems wrong: 87,000 working days in a single year is a little excessive, even compared to my own workaholism. What is going on in the query?

WITH 
DaysFact0 AS 
    (
    SELECT
        DaysFact.DAY_KEY AS DAY_KEY, 
        CASE DaysFact.DAY_OF_WEEK
            WHEN 1 THEN 0
            WHEN 7 THEN 0
            ELSE 1
        END AS "Working Days", 
        DaysFact.DAY_OF_WEEK AS DAY_OF_WEEK
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM DaysFact
    )
SELECT
    SLS_PRODUCT_DIM.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
    SUM(DaysFact0."Working Days") AS Working_Days, 
    SUM(SalesFact.SALE_TOTAL) AS SALE_TOTAL
FROM
    great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order
    INNER JOIN DaysFact0
      ON GO_TIME_DIM_Order.DAY_KEY = DaysFact0.DAY_KEY
    INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_SALES_FACT SalesFact
      ON GO_TIME_DIM_Order.DAY_KEY = SalesFact.ORDER_DAY_KEY
    INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_PRODUCT_DIM SLS_PRODUCT_DIM
      ON SLS_PRODUCT_DIM.PRODUCT_KEY = SalesFact.PRODUCT_KEY 
WHERE 
    GO_TIME_DIM_Order.CURRENT_YEAR = 2011 
GROUP BY 
    SLS_PRODUCT_DIM.PRODUCT_LINE_CODE

With a multiple-fact query, I should be getting a stitch query in the SQL. In effect, Cognos should create two queries – one for the revenue measures and one for the working days. It should then join them together on the conformed dimension. In this case there is no conformed dimension, so the same working-days total should simply repeat for each product line.
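
For comparison, here is a rough sketch (my own, not Cognos output) of what a stitch query could look like if the two facts did share a conformed time dimension and were grouped by, say, CURRENT_YEAR. Schema prefixes are shortened for readability.

WITH 
FS1 AS 
    (
    -- working days per year, derived from the time dimension
    SELECT
        T.CURRENT_YEAR AS CURRENT_YEAR, 
        SUM(CASE T.DAY_OF_WEEK WHEN 1 THEN 0 WHEN 7 THEN 0 ELSE 1 END) AS WORKING_DAYS
    FROM
        gosalesdw.GO_TIME_DIM T
    GROUP BY 
        T.CURRENT_YEAR
    ), 
FS2 AS 
    (
    -- revenue per year from the sales fact
    SELECT
        T.CURRENT_YEAR AS CURRENT_YEAR, 
        SUM(S.SALE_TOTAL) AS SALE_TOTAL
    FROM
        gosalesdw.SLS_SALES_FACT S
        INNER JOIN gosalesdw.GO_TIME_DIM T
          ON T.DAY_KEY = S.ORDER_DAY_KEY
    GROUP BY 
        T.CURRENT_YEAR
    )
-- the stitch: the two fact queries meet only on the conformed dimension
SELECT
    COALESCE(FS1.CURRENT_YEAR, FS2.CURRENT_YEAR) AS CURRENT_YEAR, 
    FS1.WORKING_DAYS AS WORKING_DAYS, 
    FS2.SALE_TOTAL AS SALE_TOTAL
FROM
    FS1
    FULL OUTER JOIN FS2
      ON FS1.CURRENT_YEAR = FS2.CURRENT_YEAR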

The problem must be in the physical layer – let’s take a look at the joins.

Do you see the problem? The join from the Date Dim to the Date Fact is a 1..1 on both sides. This instructs Cognos that the date fact is actually a snow-flaked dimension. But, you may argue, it is a 1-to-1 join! The join itself is on the day key; how much more 1-to-1 can you get? The answer is simple – in Framework Manager, cardinality doesn’t describe the data so much as it declares whether a table is a fact (the table on the n side of a 1..n join) or a dimension.

Switching that join to 1..n results in the following SQL.

WITH 
DaysFact0 AS 
    (
    SELECT
        DaysFact.DAY_KEY AS DAY_KEY, 
        CASE DaysFact.DAY_OF_WEEK
            WHEN 1 THEN 0
            WHEN 7 THEN 0
            ELSE 1
        END AS "Working Days", 
        DaysFact.DAY_OF_WEEK AS DAY_OF_WEEK
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM DaysFact
    ), 
FS1 AS 
    (
    SELECT
        SUM(DaysFact0."Working Days") AS Working_Days
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order
            INNER JOIN DaysFact0
            ON GO_TIME_DIM_Order.DAY_KEY = DaysFact0.DAY_KEY 
    WHERE 
        GO_TIME_DIM_Order.CURRENT_YEAR = 2011
    ), 
FS2 AS 
    (
    SELECT
        SLS_PRODUCT_DIM.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
        SUM(SalesFact.SALE_TOTAL) AS SALE_TOTAL
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order
            INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_SALES_FACT SalesFact
            ON GO_TIME_DIM_Order.DAY_KEY = SalesFact.ORDER_DAY_KEY
                INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_PRODUCT_DIM SLS_PRODUCT_DIM
                ON SLS_PRODUCT_DIM.PRODUCT_KEY = SalesFact.PRODUCT_KEY 
    WHERE 
        GO_TIME_DIM_Order.CURRENT_YEAR = 2011 
    GROUP BY 
        SLS_PRODUCT_DIM.PRODUCT_LINE_CODE
    )
SELECT
    FS2.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
    FS1.Working_Days AS Working_Days, 
    FS2.SALE_TOTAL AS SALE_TOTAL
FROM
    FS1, 
    FS2

It’s a little longer, but it makes much more sense now. The working days side is a single value with no conformed dimension, so there is no join between the two FS queries – just a cross join that repeats that value for every product line.

And the results:

In this example it was fairly obvious where the problem was, but in a real-life model you may have dozens of tables connecting to many facts. If you are running into data problems, a careful review of the join cardinalities will often reveal the cause.

Guest Post: On Cognos Reports Performance Improvement

One of the tasks I’m hired to do quite often is to analyse why a report (or a group of reports) is loading slowly, and how it could be improved. The good news is that when a report runs for a very long time, there is almost always something we can do to improve it. But first, we must analyse what causes the report to run long.

The first thing we want to do is rule out the query as the culprit. That is, we want to see how long it takes to get the report data from the database. This is best done by getting someone who is well versed in SQL or MDX to write a query that pulls the relevant data from the database. Many BI developers have sufficient knowledge of SQL, but if you don’t, get a DBA to help.
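
For illustration, a hand-written baseline for something like the report in the first post might look like the query below (the table names come from the GO sample warehouse used earlier, so substitute your own schema). The point is simply to time this against whatever Cognos generates, not to copy it verbatim.

SELECT
    P.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
    SUM(S.SALE_TOTAL) AS SALE_TOTAL
FROM
    gosalesdw.SLS_SALES_FACT S
    INNER JOIN gosalesdw.GO_TIME_DIM T
      ON T.DAY_KEY = S.ORDER_DAY_KEY
    INNER JOIN gosalesdw.SLS_PRODUCT_DIM P
      ON P.PRODUCT_KEY = S.PRODUCT_KEY
WHERE 
    T.CURRENT_YEAR = 2011 
GROUP BY 
    P.PRODUCT_LINE_CODE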

One might ask: why not just take the SQL Cognos generates and run that against the database? The answer is that we are trying to figure out what Cognos does wrong, and that includes the query generation. We need something to compare the Cognos query against, namely a well-structured, hand-written query.

If your manual query takes a long time to return results, it means that the underlying database has a hard time supporting your query. There could be many reasons for that. First of all, check that your tables are properly indexed and, if necessary, partitioned. Again, developers with gaps in their database-tuning knowledge will do well to consult a DBA. If you are aggregating a large number of rows, or creating many joins, an aggregate table or a materialised (indexed) view may solve the problem. If you are doing multiple aggregates based on a few million rows, perhaps a Dynamic Cube would be a good bet, or you could try building an OLAP cube to support your reporting needs (almost any cubing solution would be better than Transformer, but if push comes to shove, Transformer is better than nothing).
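
As a sketch of that kind of tuning (Oracle-style syntax, with made-up index and view names; indexed views in SQL Server are defined differently), an index on the join key plus a pre-aggregated summary might look like this:

-- index the fact table's join/filter column
CREATE INDEX IX_SALES_FACT_ORDER_DAY
    ON gosalesdw.SLS_SALES_FACT (ORDER_DAY_KEY);

-- pre-aggregate revenue by year and product line so the report
-- no longer has to scan and group millions of fact rows
CREATE MATERIALIZED VIEW gosalesdw.MV_SALES_BY_LINE_YEAR AS
SELECT
    T.CURRENT_YEAR AS CURRENT_YEAR, 
    P.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
    SUM(S.SALE_TOTAL) AS SALE_TOTAL
FROM
    gosalesdw.SLS_SALES_FACT S
    INNER JOIN gosalesdw.GO_TIME_DIM T
      ON T.DAY_KEY = S.ORDER_DAY_KEY
    INNER JOIN gosalesdw.SLS_PRODUCT_DIM P
      ON P.PRODUCT_KEY = S.PRODUCT_KEY
GROUP BY 
    T.CURRENT_YEAR, 
    P.PRODUCT_LINE_CODE;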

If your data source is OLAP and a query takes long to come back, it probably means you should design your cube differently. The most important tip I can give you here is to avoid nesting as much as possible – try to hierarchise the nested columns under one single hierarchy, which will also help with aggregates. A ragged hierarchy is often preferable to a full cross join between different hierarchies, even when non-empty is defined in the query.

If your manual query returns within a reasonable time, it’s time to compare it with the Cognos-generated query (obtain it from Tools -> Show Generated SQL/MDX). Skim through the Cognos query and check that there aren’t any major differences. If there are, your framework model could be problematic. Run the Cognos-generated query against the DB – how long does it take to come back? If much longer, then your model is probably at fault. The most common culprit in cases like these is ill-defined relationships.

If the query Cognos generates is fine, we’re probably looking at some processing done on the result set. There are several kinds; the common ones are master-detail relationships, chart generation and crosstab generation. There are ways to improve master-detail relationships over CQM (follow this link, and also make sure that you are sectioning by a single key field), but ultimately, a DQM model will optimise master-detail relationships better than any wizardry. Crosstabs and charts over relational sources are rendered by creating mini-cubes on the server side, and these may take some processing – again, the best cure is DQM, or creating a dimensional data source. If you are rendering many charts, the charting engine will queue some of them – not a lot can be done here other than increasing the number of connections to the Graphics Service, minimising the number of charts by using matrix rows/columns where appropriate or by loading them serially, or creating a RAVE-based chart that combines several charts into one.

These are the basics. There are obviously many cases which aren’t covered here, but this should give you the first-aid steps. If you have tried all this and your report is still sluggish, then it is time to call in the cavalry.


Nimrod (Rod) Avissar is a Cognos expert, with a penchant for specialized UX solutions. Feel free to drop me a line! (LinkedIn).


Information Security, Risk Management and Back End – Guest Post

Paul is my good friend, a Cognos guru and a colleague of mine for a while now, and I read his blog regularly. When I read his latest post, regarding information security in Cognos reports, I felt that, atypically, some points in the post needed refinement. So I asked Paul if he would kindly agree to host my claims, and he generously accepted. So, without further ado, here are my thoughts on Paul’s post.

First of all, it should be noted that Paul is absolutely correct on two points: data security should never be applied at the report level, and data security should always be applied in the back end. However, I’m not sure that the possible leak via token prompts constitutes a viable security threat that requires extreme measures, and I am quite certain that Cognos’ metadata modelling tool, Framework Manager, is not really the back end of the system. Here’s why.

Information Security as Risk Management

We hold these truths to be self-evident: all systems are vulnerable security-wise. A server is seldom hardened enough, communication is seldom encrypted enough, and passwords are rarely kept well enough to prevent a malicious hacker with intranet access from stealing data, changing data or plainly viewing data they’re not supposed to. Imagine any IT system you know, and imagine a user at least as savvy as you are, entirely malicious and very keen to do one thing only: to view data they’re not supposed to. If that is the mental picture you have when securing your system, you will need to double your investment in information security just to start tackling it.

But we cannot ignore two facts: one, users are seldom malicious and not very often savvy; and two, not all data is worth the investment of protecting. Let’s start with the first point: users aren’t very often savvy. Users – analysts, professionals, economists and so on – are usually professional people who were hired to perform tasks they’ve specialized in. Most of them, IT people aside, did not spend their adult years learning computer systems, data structures, communication protocols and password policies. Some of them may be proficient, but very few of them will be hackers, or at the very least good hackers. Those who are, if they exist, will also know they would be the first suspects in any leak. Which brings me to the second part of this point: users and systems do not exist in a vacuum. They exist within an organization. The organization deals with many of the threats internally: there’s the risk of getting shamefully, never-to-be-hired-by-anyone fired, the risk of being sued – in short, the risk of getting caught. A system that is openly monitored can, just by the act of monitoring, seriously deter people from trying to sneak a peek at their colleagues’ salaries in the HR system. On top of that, most people come to work in order to do an honest job and get an honest pay. The majority of people won’t attempt to hack a system or to access restricted data, not just because they don’t know how to, but also because they have no reason to, because the bottom line is that most people are not evil.

The second point was that while we obscure and hide data from unauthorized users, not all data should be protected to the same degree. A company’s customers’ credit card numbers should be protected for dear life. Their marital status, not so much. The extreme example: we shouldn’t spend 10 working hours patching up a potential leak whose damage would cost less than those 10 working hours.

So, when determining how to allocate our information security resources, we weigh the feasibility of the loophole being found and exploited, and the sensitivity of the data, against the various costs (in terms of user friendliness, performance, labour hours and so on) of patching up the potential threat. In other words, we assess the risk and decide whether it’s worth addressing, and to what level.

To return to Paul’s original post: while he found a very elegant security hole with token prompts, I think in most cases it is the kind of breach we wouldn’t normally invest in blocking. Even after reading and understanding Paul’s post, most users will not know how to identify that a report has a token prompt or how to make use of it. And even if they did, most users are not fluent in the database structure and SQL possibilities; if they were, we’d be out of a job. This isn’t security by obscurity, because we do not assume the data is secure, only that hacking it is unfeasible. On the other hand, the solution Paul offered – to use a parameter map – is costly on several levels. First, it requires republishing the package for every new token prompt, which is cumbersome and may affect other reports, especially if the model is under constant development. It also prolongs development times. It should also be noted that large parameter maps force Cognos into laboured local processing, thus affecting performance. On the other hand, we are talking about users who are trusted to view a certain report based on a certain package, and who are not very likely to find that breach and make use of it. So, in my opinion, unless the kind of data that could potentially be exposed is extremely sensitive, to the extent that no threat, no matter how unfeasible, can be tolerated, it isn’t worth the investment.

Framework As Middle Tier

But suppose the data I’m protecting is the kind of data that simply cannot be put at risk, at any cost. This could be the case, for example, if I’m required by law to keep certain data safeguarded, under heavy penalties, or if a leak would cause publicity damage, or in the case of industrial secrets and so on. I would still argue against Paul’s solution, because Paul was right to assert that security is a matter for the back end, and Framework Manager is not the back end.

The Cognos web portal and viewer are certainly the front end. They handle data visualization, navigation and object-level security (which users should be allowed to see which reports). As mentioned earlier, they should never handle data-level security. The back end is the database itself, where data is stored for safekeeping. Framework Manager is a middle tier between the front end and the back end, handling the query logic; it can be thought of as a logic engine. Data-level security isn’t normally part of that logic, as opposed to object-level security (which fields/query subjects will be available to whom), because the logic is applied to whatever data is available. Having the same tier that manipulates the given data also decide which data to manipulate opens the door to a host of problems. Why? Because we’re making security a part of the logic rather than part of the infrastructure, and that means we’re tying security and logic together. Any change to one might invalidate the other. Translate tables via a parameter map for security reasons, and you’re adding relationships which might affect existing ones. Change the relationship of a few query subjects, or add fields, and you may be opening a new security leak.

Which is why, if your data is very sensitive, you need to secure it at the database level. There are several ways of doing that: you could use data source command blocks to pass the user’s identity to the database on every session, for either logging or identification purposes. With SQL Server, you have built-in single sign-on capabilities you can use. With Oracle you can implement single sign-on using a proxy user; if further security is required, that proxy user’s credentials can be secured by SEPS, and ALTER USER grants can ensure that while logged in via the proxy, only certain actions are allowed.
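
As a rough sketch of the Oracle route (all object names below are hypothetical, and the details depend on your environment), proxy authentication plus a database-side row filter might look like this:

-- let the Cognos service account connect on behalf of the end user,
-- restricted to a read-only reporting role
ALTER USER report_reader GRANT CONNECT THROUGH cognos_svc WITH ROLE reporting_role;

-- keep row-level filtering in the database: the view keys off the
-- proxied session user, not off anything the model passes in
CREATE VIEW secured_sales AS
SELECT
    S.*
FROM
    sales_fact S
    INNER JOIN user_branch_map M
      ON M.BRANCH_KEY = S.BRANCH_KEY
WHERE
    M.USER_NAME = SYS_CONTEXT('USERENV', 'SESSION_USER');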

To conclude, I believe the token prompt loophole Paul found is, in most cases, not worth the effort of securing. When it is – because no risk is a small risk with certain types of data – security should be implemented at the database level, not at the Framework level. But this isn’t just about Paul’s example: this is the proper way to tackle any security gap that comes up, whether through a survey or from experience. Evaluate the risk to begin with, then take the proper action at the proper level.

The author, Nimrod Avissar, is a Cognos Architect and Cognos Team Leader at Data-Mine Israel.