Secondary Cell Suppression on OLAP

I recently had the great pleasure of working with the Department of Health and Human Services in New Hampshire. They have employed a company called Abacus Service Corporation to develop a wonderful dashboarding system to assist them in their goal of greater transparency (which I hope to review here eventually; it's a combination of Metric Studio and Workspace that is significantly easier for end users). Their team is extremely skilled, and they only needed a few days of my time to get past a few difficult problems. They've agreed to let me publish the solutions to two of those problems.

One of their goals is to publish reports on various health-related issues directly to the public. While transparency is to be commended, smaller data sets present possible privacy law violations. If the data were sparse enough, people could theoretically back-calculate to determine the identities of specific individuals. To prevent that, additional data must be suppressed. They were able to accomplish their goal, but at the cost of a run time of around 15 minutes. For a simple data set, and one that is supposed to be consumed by end users on the fly, any run time of more than a few seconds is unacceptable.

Let’s reproduce a sample using the Sales and Marketing cube. In this example we’ll say that any value below 20,000 and above 0 is sensitive and must be suppressed.


In that screenshot we can see the quantity sold for each product line for each quarter. Here we can see that not a single value is below 20,000. However, what happens if we slice by a specific country?

italy unsuppressed

Here we see the same data set sliced by Italy. We can instantly see many instances of sensitive data being displayed (remember, anything below 20,000 and above 0 is sensitive). For ease of understanding, I'm going to rename Quantity to "Check Measure".
Suppressing those values is a simple matter of:

if([Check Measure] not between 1 and 20000) then ([Check Measure]) else (null)

Save that expression in a new data item called Pass 1.
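Outside of Cognos, that first pass is just a range check. A minimal Python sketch of the same rule, assuming the 1 to 20,000 sensitive range used throughout this example (the function name is mine):

```python
def pass1(value, low=1, high=20000):
    """Primary suppression: hide (return None for) any value in the sensitive range."""
    if value is not None and low <= value <= high:
        return None
    return value
```

Zero survives because the sensitive range starts at 1, matching the `not between 1 and 20000` filter above.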

Pass 1

Now we can see sensitive cells are suppressed. Unfortunately it is trivial to recalculate those values where only one cell is suppressed in a column, especially considering that we’re showing a totals row. Each column and row must have either >1 or 0 suppressed cells. To make it even easier to understand, I’ll rename Years to Columns and Product Line to Rows.
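That rule can be checked mechanically. A small Python sketch, with None standing in for a suppressed cell, that flags any grid where a lone suppression could be back-calculated from the totals (the function name is mine):

```python
def is_secure(grid):
    """Return True when no row or column of the grid contains exactly one
    suppressed (None) cell; with visible totals, a lone suppressed cell
    can be recovered by subtraction."""
    lines = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    return all(sum(cell is None for cell in line) != 1 for line in lines)
```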
The first thing to do is create two new data items that find the next minimum value, one for the columns and one for the rows. We will want to suppress those values in order to secure the cells suppressed in the first pass. The expressions are written to return the memberUniqueName in order to avoid issues with the next value being a tie.

Columns Min
This finds the minimum value of Check Measure after the suppressed values have been filtered out. If any values were suppressed in the first pass, the second pass will use what this returns to suppress at least one more cell. The expression returns the memberUniqueName of the column holding the next-lowest value:

roleValue('_memberUniqueName'
  , item(
      bottomCount(
        [Columns]
        , 2
        , [Check Measure]
      )
    , 1
  )
)

Next, the second pass checks whether the current row has exactly one suppressed cell; if so, it suppresses the cell whose column matches the result from Columns Min.

2nd Pass
This will count the number of cells in the columns, and deduct the number of cells remaining. If the value is greater than 0, then a cell is being suppressed - suppress any cells which are equal to the minimum value remaining.

if ([Pass 1] is null) then (null) else (
  if (total([One] within set [Columns]) - total([Pass 1] within set [Columns]) = 1)
  then (
    if (roleValue('_memberUniqueName', currentMember(hierarchy([Columns]))) = [Columns Min])
    then (null)
    else ([One])
  )
  else ([One])
)

Pass 2
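The second pass can be sketched in Python as well: for each row with exactly one suppressed cell, suppress the surviving cell with the smallest check-measure value. In this sketch, grid holds the first-pass output with None for suppressed cells and check holds the original Check Measure values; both names are mine:

```python
def pass2(grid, check):
    """Complementary suppression along each row: if a row has exactly one
    suppressed (None) cell, also suppress the smallest surviving value,
    using the original check measure to pick it."""
    out = [row[:] for row in grid]
    for i, row in enumerate(out):
        if sum(v is None for v in row) == 1:
            # column index of the smallest surviving value in this row
            j = min((j for j, v in enumerate(row) if v is not None),
                    key=lambda j: check[i][j])
            row[j] = None
    return out
```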

Now the same basic idea for rows.

Rows Min:

This finds the minimum value of Check Measure after the suppressed values have been filtered out. If any values were suppressed, the third pass will use what this returns to suppress at least one more cell. The expression returns the memberUniqueName of the row holding the next-lowest value:

roleValue('_memberUniqueName'
  , item(
      bottomCount(
        [Rows]
        , 2
        , [Check Measure]
      )
    , 1
  )
)

and Pass 3:

3rd Pass
This will count the number of cells in the rows, and deduct the number of cells remaining. If the value is greater than 0, then a cell is being suppressed - suppress any cells which are equal to the minimum value remaining.

if ([Pass 2] is null) then (null) else (
  if (total([One] within set [Rows]) - total([Pass 2] within set [Rows]) = 1)
  then (
    if (roleValue('_memberUniqueName', currentMember(hierarchy([Rows]))) = [Rows Min])
    then (null)
    else ([One])
  )
  else ([One])
)

Again, it first finds all columns that have exactly one suppressed cell in the rows, then suppresses the next-lowest value:

Pass 3

In this case, there are no more instances of unsecured sensitive cells. But what happens with a data set containing only one sensitive cell? The passes would suppress another cell in the same row and another in the same column, but those two new cells would themselves be at risk. If we slice by the United States, the third pass returns the following set:

Pass 3 unsecure

One more pass is needed to hide the remaining intersection:

if ([Pass 3] is null) then (null) else (
  if ([cc] = 1)
  then (
    if (roleValue('_memberUniqueName', currentMember(hierarchy([Rows]))) =
        roleValue('_memberUniqueName'
          , item(
              tail(
                order(
                  filter([Rows], ([cr] > 0) and ([Pass 2] is not null))
                  , [Check Measure]
                  , basc
                )
              , 1
              )
            , 0
          )
        )
    )
    then (null)
    else ([Final Measure])
  )
  else ([Final Measure])
)

cc is simply: total([One] within set [Rows]) - total([Pass 2] within set [Rows])
with cr being: total([One] within set [Columns]) - total([Pass 3] within set [Columns])

This is similar to pass 2. It counts the number of suppressed cells in the column. The big change is that instead of looking for and suppressing the smallest value in the column, we are now looking for a row that already has at least one suppressed cell. It finds the matching rows, orders them ascending, and takes the last one in the list. Instead of returning a 1 or null, this pass returns the final measure, in this case revenue.

And the final results, for the United States:
Pass 4

The end result will always be a minimum of two suppressed cells in each row or column, or none at all. This satisfies the demand for data security at the cell level without having to run extremely complex calculations for each possible combination of data in the ETL.
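The multi-pass approach generalizes to a simple loop: keep applying complementary suppression along rows and columns until no line contains exactly one hidden cell. Here is a self-contained Python sketch of that generalization; it is not a transcription of the report expressions (which stop after a fixed number of passes), and all names are mine:

```python
def suppress(grid, low=1, high=20000):
    """Primary suppression plus repeated row/column complementary suppression,
    until no row or column holds exactly one suppressed (None) cell."""
    def fix_lines(g, chk):
        # In any line with exactly one None, also hide the smallest survivor.
        changed = False
        for row, crow in zip(g, chk):
            if sum(v is None for v in row) == 1:
                j = min((j for j, v in enumerate(row) if v is not None),
                        key=lambda j: crow[j])
                row[j] = None
                changed = True
        return changed

    def transpose(m):
        return [list(col) for col in zip(*m)]

    check = [row[:] for row in grid]                     # keep original values
    out = [[None if low <= v <= high else v for v in row] for row in grid]
    while True:
        changed = fix_lines(out, check)                  # rows
        out_t = transpose(out)
        changed |= fix_lines(out_t, transpose(check))    # columns
        out = transpose(out_t)
        if not changed:
            return out
```

Each pass can only add suppressed cells, so the loop is guaranteed to terminate; on grids like the ones in this post it settles after two or three iterations.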

At this point the run time is about 2 seconds (on my laptop), still a long time for such a simple query, but far more manageable than the original 15 minutes. It should be mentioned that this technique is suited to smaller data sets; additional columns and rows will increase the run time significantly. Nested nodes add further complexity and should probably be avoided.

Sample report can be found here.

Information Security, Risk Management and Back End – Guest Post

Paul is a good friend, a Cognos guru, and a colleague of mine for a while now, and I read his blog regularly. When I read his latest post, regarding information security in Cognos reports, I felt that, atypically, some points in the post needed refinement. So I asked Paul if he would kindly agree to host my claims, and he generously accepted. So, without further ado, here are my thoughts on Paul's post.

First of all, it should be noted that Paul is absolutely correct on two points: data security should never be applied at the report level, and it should always be applied in the back end. However, I'm not sure that the possible leak via token prompts constitutes a viable security threat that requires extreme measures, and I am quite certain that Cognos' metadata modeling tool, Framework Manager, is not really the back end of the system. Here's why.

Information Security as Risk Management

We hold these truths to be self-evident, that all systems are vulnerable security-wise. A server is seldom hardened enough, communication is seldom encrypted enough, and passwords are rarely kept well enough to prevent a malicious hacker with intranet access from stealing, changing, or simply viewing data they're not supposed to. Imagine any IT system you know, and imagine a user at least as savvy as you are, entirely malicious, and very keen to do one thing only: to view data they're not supposed to. If that is the mental picture you have when securing your system, you will need to double your investment in information security just to start tackling it.

But we cannot ignore two facts: one, users are seldom malicious and not often savvy; and two, not all data is worth the investment of protecting. Let's start with the first point: users aren't very often savvy. Users – analysts, professionals, economists and so on – are usually professional people who were hired to perform tasks they've specialized in. Most of them – IT people aside – did not spend their adult years learning computer systems, data structures, communication protocols and password policies. Some of them may be proficient, but very few of them will be hackers, or at the very least good hackers. Those who are, if they exist, will also know they would be the first suspects in any leak.

Which brings me to the second part of this point: users and systems do not exist in a vacuum. They exist within an organization, and the organization deals with many of the threats internally: there's the risk of getting shamefully, never-to-be-hired-by-anyone fired, the risk of being sued – in short, the risk of getting caught. A system that is openly monitored can, just by the act of monitoring, seriously deter people from trying to sneak a peek at their colleagues' salaries in the HR system. On top of that, most people come to work to do an honest job for an honest pay. The majority of people won't attempt to hack a system or to access restricted data, not just because they don't know how, but because they have no reason to; the bottom line is that most people are not evil.

The second point was that while we obscure and hide data from unauthorized users, not all data should be protected equally. A company's customers' credit card numbers should be protected for dear life. Their marital status, not as much. The extreme example: if we're going to spend 10 working hours patching up a potential leak, the damage from which would cost less than those 10 working hours, the patch isn't worth it.

So, when determining how to allocate our information security resources, we weigh the feasibility of the loophole being found and exploited, and the sensitivity of the data, against the various costs (in terms of user friendliness, performance, labour hours and so on) of patching up the potential threat. In other words, we assess the risk and decide whether it's worth addressing, and to what level.
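That weighing can be made concrete with back-of-the-envelope arithmetic, along the lines of an annualized loss expectancy. A Python sketch, with every figure invented purely for illustration:

```python
def expected_loss(probability_per_year, impact):
    """Annualized expected loss: the chance of the breach occurring in a
    given year times the damage if it does."""
    return probability_per_year * impact

# Invented figures: an obscure loophole (1% yearly chance of being found and
# used) over low-value data (5,000 in damages) versus a 10-hour fix at 80/hour.
leak = expected_loss(probability_per_year=0.01, impact=5000)
fix_cost = 10 * 80
worth_fixing = leak > fix_cost
```

Under these made-up numbers the expected loss is 50 a year against an 800 fix, so the patch isn't worth it; flip either figure by a couple of orders of magnitude and the conclusion flips too.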

To return to Paul's original post: while he has found a very elegant security hole with token prompts, I think in most cases it is the kind of breach we wouldn't normally invest in blocking. Even after reading and understanding Paul's post, most users will not know how to identify that a report has a token prompt or how to make use of it. And even if they did, most users are not fluent in the database structure and SQL possibilities. If they were, we'd be out of a job. This isn't security by obscurity, because we do not assume the data is secure, only that hacking it is unfeasible. On the other hand, the solution Paul offered – to use a parameter map – is costly on several levels. First, it requires republishing a package for every new token prompt, which is cumbersome and may have an effect on other reports, especially if the model is constantly developed. It also prolongs development times. It should also be noted that large parameter maps force Cognos into laboured local processing, thus affecting performance. Against all that, we are talking about users who are trusted to view a certain report based on a certain package, and who are not very likely to find that breach and make use of it. So, in my opinion, unless the kind of data that can be potentially exposed is extremely sensitive, to an extent that no threat, no matter how unfeasible, can be tolerated, it isn't worth the investment.

Framework As Middle Tier

But suppose that the data I'm protecting is the kind of data that simply cannot be risked, at any cost. This could be the case, for example, if I'm required by law to keep certain data safeguarded, under heavy penalties, or if a leak would cause publicity damage, or in the case of industrial secrets, and so on. I would still argue against Paul's solution, because Paul was right to assert that security is a matter for the back end, and Framework Manager is not the back end.

The Cognos web portal and viewer are certainly the front end. They handle data visualization, navigation, and object-level security (which user is allowed to see which reports). As mentioned earlier, they should never handle data-level security. The back end is the database itself, where data is stored for safekeeping. Framework Manager is a middle tier between the front end and the back end, handling the query logic; it could be thought of as a logic engine. Data-level security isn't normally a part of the logic, as opposed to object-level security (which fields and query subjects will be available to whom), because the logic is applied to the available data. Having the same tier that manipulates the given data also decide which data to manipulate opens the door to a host of problems. Why? Because we're making security a part of the logic rather than part of the infrastructure, and that means we're tying security and logic together. Any change to one might invalidate the other. Translate tables via a parameter map for security reasons, and you're adding relationships which might affect existing ones. Change the relationship of a few query subjects, or add fields, and you may be opening a new security leak.

Which is why, if your data is very sensitive, you need to secure it at the database level. There are several ways of doing that: you could use data source command blocks to pass the user's details to the database on every session, for either logging or identification purposes. With SQL Server, you have built-in single sign-on abilities you can use. With Oracle you can implement single sign-on using a proxy user; if further security is required, that proxy user's credentials can be further secured by SEPS, and alter user grants can ensure that while logged in via proxy, only certain actions are allowed.

To conclude, I believe the token prompt loophole Paul found is, in most cases, not worth the effort of securing. When it is – because no risk is a small risk with certain types of data – security should be implemented at the database level, not the Framework level. And this isn't just about Paul's example: this is the proper way to tackle any security gap that comes up, whether from a survey or from experience. Evaluate the risk to begin with, then take the proper action at the proper level.

The author, Nimrod Avissar, is a Cognos Architect and Cognos Team Leader at Data-Mine Israel. 

Information Security in Cognos Reports

Cognos has an extremely intricate security system. It is well suited for secure reports and confidential information. Some clients expose portions of their platform to the public, safe in the knowledge that users can only see what they’re allowed to see. However, all of this security can be for nothing through careless report design.

When building any secure report, it is essential that security be handled in the back end. There have been countless reports built with JavaScript-enforced security on the prompt page, or with security handled by conditional blocks and parameters. Token prompts, while perfect for building dynamic reports, can create holes through which malicious users can attempt to retrieve data that would normally be hidden from them.

This document is not intended to be a guide to hacking Cognos reports. Indeed, many reports do not require such drastic measures; proper usage of macros and filters will solve the majority of the potential problems. Nor does every report need such stringent security. Many times filters are simply used as a means of speeding up the report, or of allowing users to narrow their focus.

In this post I will go over possible security flaws with the token prompt.

Consider the following request: a report which shows Employee and Manager, with a prompt to choose between product line, product type, or product name. This report will be available to all users, to show sales rankings throughout the company.

In order to fulfill the prompt requirement, a token prompt is created with the following static values:

While the data item in the report would be a simple:

When running the report, the user might see the following output:

But a malicious user may attempt to see each employee’s full address.

Nothing in the report was changed, and yet the user can now see confidential information. How was the user able to accomplish this?

Simply by changing the value option in the prompt with IE Developer Tools (each of the major browsers has a similar tool), the user can choose to see any item in the model. "But," I hear you say, "I can set object-level security inside Framework Manager to deny that user group, I'll even set it on the data layer!" That will certainly prevent users from selecting items from the model, but there is still a workaround.

Entering {"EMP_EMPLOYEE_DIM"."ADDRESS1" || ', ' || nvl("EMP_EMPLOYEE_DIM"."ADDRESS1",' ') || ', ' || "EMP_EMPLOYEE_DIM"."PROV_STATE" || ', ' || "EMP_EMPLOYEE_DIM"."POSTAL_ZONE"} into the value will work just as well, and because it doesn't reference the data item path, Cognos won't throw an error. The curly brackets tell Cognos to execute the code as is. Note that this SQL doesn't include the country; Cognos won't attempt to build a join based on custom SQL like this, so including "Branch_region_dimension"."COUNTRY_EN" would only result in an error.

Since adding custom SQL, like that concatenated string, works, would subqueries also work? In this example any attempt to use a subquery in the data item list would fail, as (at least in Oracle) subqueries are not supported in the Group By. But look what might happen when we remove the two measures from the list:

By adding {(SELECT rtrim(xmlagg(xmlelement(e, owner || '.' || table_name || ',')).extract('//text()'), ',') table_names FROM all_tables)} we can now see every single table available. Similar queries can be run to see the fields in each table. At this point the user has full read access to the database, limited only by whatever security is applied to the data source user.
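The failure mode is easy to reproduce outside of Cognos. Here is a minimal Python/SQLite sketch (the tables and column names are invented) of what happens when a prompt value is spliced into the statement verbatim, which is exactly what a bare token prompt does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, qty INTEGER);
    CREATE TABLE emp   (name TEXT, address TEXT);
    INSERT INTO sales VALUES ('widgets', 10);
    INSERT INTO emp   VALUES ('alice', '12 Main St');
""")

def run_report(token):
    # Like a token prompt, the value becomes part of the SQL text itself,
    # so the "prompt answer" is really arbitrary SQL.
    return conn.execute(f"SELECT {token} FROM sales").fetchall()

intended = run_report("product")                  # the report as designed
leaked = run_report("(SELECT address FROM emp)")  # a malicious prompt value
```

The second call never touches the sales data at all; the subquery pulls from a table the report was never meant to expose.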

The exploits are only possible because the data item contained only the token prompt. I’m not saying that token prompts should never be used; many of my reports are based on the functionality. Instead, use them wisely.

One workaround that I have found is a combination of a new table and a lookup. Build a new table called "TokenExpressions" with the fields "Key", "Expression", "Name", "Report", and "ReportGroup". Import that into your framework. Create a parameter map (tokenLookup) based on that table, with TokenExpressions.Report + '-' + TokenExpressions.Key as the Key and TokenExpressions.Expression as the Value. In the report, set up your value prompt with the source query using TokenExpressions.Key as the Use value and TokenExpressions.Name as the Display value. And finally, in the report queries, use the expression #$tokenLookup{'ReportName-' + prompt('Field','integer')}#. In addition to securing the report against unexpected inputs, this makes it significantly easier to add or remove options in the report.
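The heart of that workaround is that user input is only ever used as a key into a table of pre-approved expressions, never as SQL itself. A minimal Python sketch of the same idea (the keys and expression strings are invented):

```python
# Pre-approved expressions, keyed the same way as the parameter map:
# '<report name>-<prompt key>' -> expression text.
TOKEN_EXPRESSIONS = {
    "SalesRanking-1": '"PRODUCT_DIM"."PRODUCT_LINE"',
    "SalesRanking-2": '"PRODUCT_DIM"."PRODUCT_TYPE"',
    "SalesRanking-3": '"PRODUCT_DIM"."PRODUCT_NAME"',
}

def resolve_token(report, key):
    """Return the approved expression for this report/key pair, or raise.
    Tampered prompt values can never reach the generated SQL."""
    try:
        return TOKEN_EXPRESSIONS[f"{report}-{key}"]
    except KeyError:
        raise ValueError(f"unexpected prompt value: {key!r}")
```

A user who edits the prompt in their browser's developer tools can, at worst, produce a lookup failure rather than inject SQL.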

The lesson learned is that using prompts as a security measure is inherently risky. Malicious users with access to web development tools (IE has them by default) may override the existing values and see data that isn’t meant for their eyes.

Finally, it is worth mentioning that using the curly brackets for subqueries can have legitimate uses. I have not found any other way to handle correlated subqueries, for instance. Oracle analytic functions can usually only be accessed through the curly brackets.