I recently had a report that a dashboard was returning incorrect results. At first glance this isn’t exactly unexpected, and I had initially attributed it to user error. But then I tested it, first in Dashboards then in Report Studio, and I was getting the same results.
The request is simple. Show a list of product lines, revenue, and revenue per working day. The revenue fact is joined to the product dim and the date dim. Working days fact is joined only to the date dim.
I know that 2011 had around 200 working days, but let’s see what I get when I recreate the query
That seems wrong, 87,000 working days in a single year is a little excessive, even compared to my own workaholism. What is going on in the query?
WITH DaysFact0 AS ( SELECT DaysFact.DAY_KEY AS DAY_KEY, CASE DaysFact.DAY_OF_WEEK WHEN 1 THEN 0 WHEN 7 THEN 0 ELSE 1 END AS "Working Days", DaysFact.DAY_OF_WEEK AS DAY_OF_WEEK FROM great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM DaysFact ) SELECT SLS_PRODUCT_DIM.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, SUM(DaysFact0."Working Days") AS Working_Days, SUM(SalesFact.SALE_TOTAL) AS SALE_TOTAL FROM great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order INNER JOIN DaysFact0 ON GO_TIME_DIM_Order.DAY_KEY = DaysFact0.DAY_KEY INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_SALES_FACT SalesFact ON GO_TIME_DIM_Order.DAY_KEY = SalesFact.ORDER_DAY_KEY INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_PRODUCT_DIM SLS_PRODUCT_DIM ON SLS_PRODUCT_DIM.PRODUCT_KEY = SalesFact.PRODUCT_KEY WHERE GO_TIME_DIM_Order.CURRENT_YEAR = 2011 GROUP BY SLS_PRODUCT_DIM.PRODUCT_LINE_CODE
With a multiple fact query, I should be getting a stitch query in the SQL. In effect, it should be creating two queries – one for the revenue measures and one for the days. It should then join them together on the conformed dimension. In this case, there’s no conformed dimension, so the same number should repeat for each product line.
The problem must be in the physical layer – let’s take a look at the joins.
Do you see the problem? The join from the Date Dim to the Date Fact is a 1..1 on both sides. This instructs Cognos that the date fact is actually a snow-flaked dimension. But, you may argue, it is a 1 to 1 join! The join itself is on the day key, how much more 1 to 1 can you get? The answer is simple – cardinality simply describes wether a table is a fact (at the end of a 1..n join) or a dimension.
Switching that join to 1..n results in the following SQL.
WITH DaysFact0 AS ( SELECT DaysFact.DAY_KEY AS DAY_KEY, CASE DaysFact.DAY_OF_WEEK WHEN 1 THEN 0 WHEN 7 THEN 0 ELSE 1 END AS "Working Days", DaysFact.DAY_OF_WEEK AS DAY_OF_WEEK FROM great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM DaysFact ), FS1 AS ( SELECT SUM(DaysFact0."Working Days") AS Working_Days FROM great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order INNER JOIN DaysFact0 ON GO_TIME_DIM_Order.DAY_KEY = DaysFact0.DAY_KEY WHERE GO_TIME_DIM_Order.CURRENT_YEAR = 2011 ), FS2 AS ( SELECT SLS_PRODUCT_DIM.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, SUM(SalesFact.SALE_TOTAL) AS SALE_TOTAL FROM great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.GO_TIME_DIM GO_TIME_DIM_Order INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_SALES_FACT SalesFact ON GO_TIME_DIM_Order.DAY_KEY = SalesFact.ORDER_DAY_KEY INNER JOIN great_outdoors_sales.CognosSampleDev_GOSALESDW.gosalesdw.SLS_PRODUCT_DIM SLS_PRODUCT_DIM ON SLS_PRODUCT_DIM.PRODUCT_KEY = SalesFact.PRODUCT_KEY WHERE GO_TIME_DIM_Order.CURRENT_YEAR = 2011 GROUP BY SLS_PRODUCT_DIM.PRODUCT_LINE_CODE ) SELECT FS2.PRODUCT_LINE_CODE AS PRODUCT_LINE_CODE, FS1.Working_Days AS Working_Days, FS2.SALE_TOTAL AS SALE_TOTAL FROM FS1, FS2
It’s a little longer, but it makes much more sense now. It’s a single value on the days fact side with no conformed dimension, so there’s no join on the two FS queries.
And the results:
In this example it was fairly obvious where the problem was, but in a real-life example you may have dozens of tables connecting to many facts. If you are running into data problems, a careful review of the joins may often reveal the problem.
I think it is still a bit of a convaluted way (by Cognos) to solve this by Common table expressions in sql.
Resolving the days_fact actually sprouts 2 CTE’s, where the second one references the first one.
The salesfact is covered with 1 CTE, which could obviously also be used (I mean just one CTE) for the first fact.
Back in my Cognos days I spend a lot of time on similar issues , including the header-line scenario where the header table (the 1 end of the 1 n) relation stored a fact. Surprisingly , Cognos managed to come up with the proper solution based on the proper cardinality setting. Indeed this was the one and only thing to make sure of to be properly set. (Not the determinants)
In this case the samples didn’t have any bit day flags. My client we’re doing sum(datedim.isWorkDayFlag). In the samples I’m building a case statement on the query in the data layer, which is causing the overly convoluted SQL.
I’ve actually been getting a lot of flack from certain quarters over handling date measures like this, so I’ll post a follow up at some point with a different (maybe better) way counting work days.