Why cardinality is so important in a Framework Model

I recently had a report that a dashboard was returning incorrect results. At first glance this isn’t exactly unexpected, and I had initially attributed it to user error. But then I tested it, first in Dashboards then in Report Studio, and I was getting the same results.

The request is simple. Show a list of product lines, revenue, and revenue per working day. The revenue fact is joined to the product dim and the date dim. Working days fact is joined only to the date dim.

I know that 2011 had around 200 working days, but let’s see what I get when I recreate the query

That seems wrong, 87,000 working days in a single year is a little excessive, even compared to my own workaholism. What is going on in the query?

WITH 
DaysFact0 AS 
    (
    SELECT
        DaysFact.DAY_KEY AS DAY_KEY, 
        CASE DaysFact.DAY_OF_WEEK
            WHEN 1 THEN 0
            WHEN 7 THEN 0
            ELSE 1
        END AS "Working Days", 
        DaysFact.DAY_OF_WEEK AS DAY_OF_WEEK
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM DaysFact
    )
SELECT
    SLS_PRODUCT_DIM.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
    SUM(DaysFact0."Working Days") AS Working_Days, 
    SUM(SalesFact.SALE_TOTAL) AS SALE_TOTAL
FROM
    great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order
    INNER JOIN DaysFact0
      ON GO_TIME_DIM_Order.DAY_KEY = DaysFact0.DAY_KEY
    INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_SALES_FACT SalesFact
      ON GO_TIME_DIM_Order.DAY_KEY = SalesFact.ORDER_DAY_KEY
    INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_PRODUCT_DIM SLS_PRODUCT_DIM
      ON SLS_PRODUCT_DIM.PRODUCT_KEY = SalesFact.PRODUCT_KEY 
WHERE 
    GO_TIME_DIM_Order.CURRENT_YEAR = 2011 
GROUP BY 
    SLS_PRODUCT_DIM.PRODUCT_LINE_CODE

With a multiple fact query, I should be getting a stitch query in the SQL. In effect, it should be creating two queries – one for the revenue measures and one for the days. It should then join them together on the conformed dimension. In this case, there’s no conformed dimension, so the same number should repeat for each product line.

The problem must be in the physical layer – let’s take a look at the joins.

Do you see the problem? The join from the Date Dim to the Date Fact is a 1..1 on both sides. This instructs Cognos that the date fact is actually a snow-flaked dimension. But, you may argue, it is a 1 to 1 join! The join itself is on the day key, how much more 1 to 1 can you get? The answer is simple – cardinality simply describes wether a table is a fact (at the end of a 1..n join) or a dimension.

Switching that join to 1..n results in the following SQL.

WITH 
DaysFact0 AS 
    (
    SELECT
        DaysFact.DAY_KEY AS DAY_KEY, 
        CASE DaysFact.DAY_OF_WEEK
            WHEN 1 THEN 0
            WHEN 7 THEN 0
            ELSE 1
        END AS "Working Days", 
        DaysFact.DAY_OF_WEEK AS DAY_OF_WEEK
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM DaysFact
    ), 
FS1 AS 
    (
    SELECT
        SUM(DaysFact0."Working Days") AS Working_Days
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order
            INNER JOIN DaysFact0
            ON GO_TIME_DIM_Order.DAY_KEY = DaysFact0.DAY_KEY 
    WHERE 
        GO_TIME_DIM_Order.CURRENT_YEAR = 2011
    ), 
FS2 AS 
    (
    SELECT
        SLS_PRODUCT_DIM.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
        SUM(SalesFact.SALE_TOTAL) AS SALE_TOTAL
    FROM
        great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order
            INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_SALES_FACT SalesFact
            ON GO_TIME_DIM_Order.DAY_KEY = SalesFact.ORDER_DAY_KEY
                INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_PRODUCT_DIM SLS_PRODUCT_DIM
                ON SLS_PRODUCT_DIM.PRODUCT_KEY = SalesFact.PRODUCT_KEY 
    WHERE 
        GO_TIME_DIM_Order.CURRENT_YEAR = 2011 
    GROUP BY 
        SLS_PRODUCT_DIM.PRODUCT_LINE_CODE
    )
SELECT
    FS2.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, 
    FS1.Working_Days AS Working_Days, 
    FS2.SALE_TOTAL AS SALE_TOTAL
FROM
    FS1, 
    FS2

It’s a little longer, but it makes much more sense now. It’s a single value on the days fact side with no conformed dimension, so there’s no join on the two FS queries.

And the results:

In this example it was fairly obvious where the problem was, but in a real-life example you may have dozens of tables connecting to many facts. If you are running into data problems, a careful review of the joins may often reveal the problem.