Problem Management / Alert Floods

A global financial institution uses a variety of systems management tools in its control center to identify and respond to problems across its mission-critical applications and infrastructure. The event management system, Netcool, was being flooded with events, and handling them consumed the time of the entire technical staff in the control center.

There were many types of events. False alarms occurred during expected change windows. Recurring events had well-known responses but still required time-consuming manual remediation. Still other events occurred on the mission-critical system and other complex applications, and these required an enormous amount of investigation before any remediation could begin.

On average, it took an engineer more than 15 minutes to close each event. This kept the entire monitoring team in constant firefighting mode. They were so overwhelmed that they often failed to detect service-impacting events on business-critical systems early.

Oasis™ automatically processed many events with no manual intervention by interfacing directly with command center systems (i.e., Netcool, Remedy for trouble ticketing, and CA CMDB for change management), the mission-critical business system, and the supporting infrastructure.

Integration with CA CMDB allowed Oasis to automatically identify false alarms due to known periods of change in the environment and to close the associated events in Netcool with no manual intervention. Automation of recurring events enabled Oasis to automatically take corrective action and update any necessary information in Netcool – again with no manual intervention. Automated diagnostics for complex problems also allowed Oasis to update events with rich diagnostic information and to create the appropriate tickets in Remedy. Thus, the command center team could focus on resolution from the start, rather than spending significant time on diagnostic triage.
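
The three handling paths described above can be illustrated with a minimal sketch. It is not Oasis code and makes no calls to Netcool, Remedy, or CA CMDB; every name in it (Event, ChangeWindow, triage, and the remediation and diagnostic callables) is a hypothetical stand-in chosen for illustration, and the real integrations are assumed to sit behind those callables.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for an event held in the event management system."""
    node: str
    kind: str
    raised_at: datetime
    notes: List[str] = field(default_factory=list)
    status: str = "open"


@dataclass
class ChangeWindow:
    """Hypothetical stand-in for a scheduled change record."""
    node: str
    start: datetime
    end: datetime

    def covers(self, event: Event) -> bool:
        # True when the event was raised on this node during the window.
        return self.node == event.node and self.start <= event.raised_at <= self.end


def triage(
    event: Event,
    windows: List[ChangeWindow],
    remediations: Dict[str, Callable[[Event], None]],
    diagnose: Callable[[Event], str],
) -> str:
    """Route one event down one of the three paths and return the path taken."""
    # Path 1: false alarm raised during a known change window; close it.
    if any(w.covers(event) for w in windows):
        event.status = "closed"
        event.notes.append("closed: raised during a scheduled change window")
        return "false_alarm"

    # Path 2: recurring event with a known response; run it and close.
    fix = remediations.get(event.kind)
    if fix is not None:
        fix(event)
        event.status = "closed"
        event.notes.append("closed: automated remediation applied")
        return "auto_remediated"

    # Path 3: complex problem; attach diagnostics and hand off as a ticket.
    event.notes.append(diagnose(event))
    event.status = "ticketed"
    return "ticketed"
```

In this sketch the order of the checks matters: change-window suppression runs first so that no corrective action is taken against a system that is deliberately being changed, and only events that match neither of the first two paths reach an engineer, already carrying diagnostic notes.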

Results:

  • Decreased command center staff by 30%
  • Ensured sufficient focus on the events associated with the mission-critical system, thereby achieving a measurable improvement in service levels

©2007 Optinuity, Inc. All rights reserved.