Some time ago Paul and I were working together as Cognos Infrastructure specialists for a certain company. During that time, one day the Cognos development environment became unstable, and we were struggling to get it up and running again.
After the day’s events I wrote an email explaining these events to the company’s developers. At Paul’s request I am publishing the main points from this email here. I think they can be educating for a particular sort of Cognos instability, often seen in dev environments.
Mostly, I’m talking about crfashes which are a result of Cognos’ Content Manager reaching maximum memory, and then shutting down. Without the CM there can be no authentication, without authentication there’s no navigation. Also, when a server reaches maximum available memory, it will become slow and non-responsive.
There are several reasons why a CM Server would max out on memory. One of them is a bug in Cognos 10.1.1 to be fixed in the nearest fix pack (This is really a Java bug, and the JRE was updated in FP1). This bug causes a “Java.Lang.OutOfMemory” error on login. It’s a slow, steady memory leak that breaks out every now and again.
Another well documented reason for a CM server to crash and burn for lack of memory is when requests are piled up in the memory faster than the memory cleans itself. Imagine the following scenario: A report executes the same query 50 times. The query has an error, so the error is written to the log and also to the audit database. The CM is in charge of Audit, so the CM gets 50 requests for Audit update in less than a minute from this report. The CM keeps all these requests in memory, where they are kept not until they are performed, but until the memory is cleaned. This behavior is a result of the requests being stored the Cognos Java process, which is only cleaned once every n minutes.
Now, the developer running the report gets an error or false data, makes a slight change and runs it again. And so on.
If the CM’s memory get filled before a cleanup is made, the CM will max out and will pretty much play dead.
It is very difficult to say, just by analyzing Cognos’ logs, which report or group of reports are responsible for this. If you work with a few developers, it’s not tough to cooperate with them to find the culprits. But the case may be that you’re working with dozens of developers, some not even on site. However, not all hope is lost. You can scan the logs looking for reports that were run over and over and resulted in errors to the logs. To verify, try hiding suspected reports via security, and see what happens. If this works, you’ll have to have a talking-to with the report’s developer. Luckily, since you hid their reports, they are likely to contact you, so you don’t have to search for them.
And as for developers, as a general rule, if an environment is unstable and you are running reports that have been recently developed or that get stuck whenever you run them or that return error messages, the two things may be connected. Your report might, unintentionally, be causing the instability. Alert whoever it is you need to alert.
The author, Nimrod Avissar, is a Cognos Architect and Cognos Team Leader at Data-Mine Israel.
thanks for the tip Nimrod – do you believe that this bug may have been just identified, or do you think it goes back to over versions of CRN?