AlarmManagement July2012 CEP
Implement an Effective Alarm Management Program
Apply the ISA-18.2 Standard on Alarm Management to design, implement, and maintain an effective alarm system.
Todd Stauffer, P.E., exida
Alarms are used in chemical processing plants to draw the operator’s attention to an abnormal condition that, if disregarded, could lead to poor product quality, unplanned downtime, damaged assets, personnel injury, or a catastrophic accident. When employed appropriately, alarms help the operator to safely run the process within normal operating conditions. They are one of the first layers of protection to prevent the escalation of a hazard into an accident (Figure 1).
[Figure 1 (graphic not reproduced): layers of protection. Labels include Prevention, Trip, Safety Instrumented System, Incident, Loss of Containment, Mitigation, Active Protection (e.g., Relief Valve, Rupture Disk), Passive Protection (e.g., Bund/Dike), Plant Emergency Response, and Community Emergency Response. The graphic also shows a vessel (Chemical = A, Material = B, Pressure = X, Temperature = Y, Volume = Z) with instrument tags FIC, LAH, LSH, LSSH, and PRV.]
Alarm management has become increasingly important as chemical plants look for ways to reduce costs, increase productivity, and deal with the loss of experienced operators. It has also become more challenging due to the adoption of the modern distributed control system (DCS). Alarm systems of the past consisted of panelboard control rooms. In those rooms, the number of alarms was limited by the finite wall space, and there was an actual cost to hard-wire the system into the process (approximately $1,000 per alarm) (1). Today, alarms are considered free because …
… thought goes into deciding which points should be alarmed and why. This has led to an epidemic of alarm management issues including:
• nuisance alarms (chattering alarms and standing/stale alarms)
• alarms identified with the incorrect priority level
• alarms that require no operator response
• alarms that occur frequently (“bad actors”)
• alarm overload during normal conditions
• alarm floods during process upsets
• improper alarm suppression.
The philosophy stage also includes preparation of the alarm system requirements specification (ASRS), which identifies the alarm system’s functional requirements. The ASRS can be used to support vendor selection and serve as the basis for system testing. It can also help determine whether any advanced/enhanced alarming techniques, such as customization or third-party products, are needed.
Stage 2: Identification
Potential alarms are identified by reviewing plant and process documentation. This documentation includes:
• process (or piping) and instrumentation diagrams (P&IDs)
• process hazard analyses (PHAs)
• operating procedures
• product quality reviews
• layer-of-protection analyses
• safe operating limits
• failure modes and effects analyses
• environmental permits
• the existing control system configuration.
Candidate alarms should not be considered valid until they have successfully gone through the rationalization process (discussed next). Even alarms that have been identified as safeguards in a HAZOP analysis must be rationalized.
The criteria for determining whether an alarm is valid should be applied when the alarm is first identified (e.g., during a hazard analysis). At that time, its basis (e.g., purpose, cause, potential consequence, and time to respond) should be documented. These forward-thinking activities will improve the quality and amount of information available for this evaluation.
Stage 3: Rationalization
The modern DCS makes it easy to add alarms without significant effort, cost, or justification. To avoid unnecessary alarms, alarm rationalization aims to identify the minimum set of alarms needed to keep the process safe and within its normal operating range. It also aims to ensure that every alarm is valid and necessary. This is a multistep process. It includes documenting, in a master alarm database (MADB), the design attributes (e.g., priority, setpoint, type, and classification) as well as the cause, consequence, time to respond, and recommended operator response. It is a team activity (similar to a HAZOP study) involving production and process engineers, process control engineers, experienced operators, and other personnel as needed.
Alarm validity. The first step in the rationalization process is to verify the validity of the alarm based on the criteria set forth in the philosophy document. A candidate alarm can be removed from consideration if it does not meet the criteria, for example if:
• it does not represent an abnormal situation
• it is not unique
• it does not require a timely operator response.
Consequences. Next, the team identifies the consequences of inaction, that is, the direct and immediate consequences of failing to manage each individual alarm. This step is not concerned with what could happen if all protection layers fail (the ultimate consequence, as defined in a HAZOP). The alarm may not be needed if inaction does not generate significant consequences, for example if the only consequence is the generation of another alarm.
Operator response. Another important step in identifying and eliminating unnecessary alarms is documenting the steps the operator should take to correct the abnormal situation, such as closing a valve or starting a backup pump. If an operator response cannot be defined, then the alarm is not valid and can be removed from consideration. If multiple alarm conditions share the same operator action, this may indicate redundant alarms, and one or more can be eliminated.
Response time. After determining how the operator should respond, the team estimates the time available to take this action. Operator response time is the time between the activation of the alarm and the last moment the operator can act to prevent the consequence. It therefore represents the time available to the operator to fix the problem. If the available time is insufficient, the alarm will not be reliable. In that case it should be redesigned and replaced with an automated response (i.e., an interlock).
Alarm priority. Alarm priority is established based on operator response time and the severity of the consequences. Severity is assessed against predefined thresholds in areas such as safety, environmental impact, and cost. ISA-18.2 recommends a maximum of three or four different priorities. To help operators respond effectively to the most critical alarms, only a small fraction should be set to high priority (e.g., 5%). The remainder should be set to medium (15%) or low (80%) priority.
Alarm class. Alarm class is assigned based on the type of consequences and the method used to identify the hazard and consequences (e.g., a HAZOP analysis). Alarms can be assigned to more than one classification.
Setpoints. Alarm setpoints (limits) should be far enough from the consequence threshold to give the operator adequate time to respond. They should also not be so close to normal operating conditions that normal process variation triggers nuisance alarms. A common mistake is to configure setpoints based on rules of thumb relative to the range of a process variable. An example is configuring the setpoints for high-high, high, low, and low-low as 90%, 80%, 20%, and 10% of range, respectively.
Advanced alarm handling. Lastly, one should evaluate the need for advanced alarm handling. This means documenting the states, conditions, steps, phases, or products for which the alarm limit or priority should differ from steady state, or for which the alarm should be suppressed from the operator. This helps to ensure that an alarm is always relevant when it is presented to the operator.
…compressor trip) that would otherwise lead to an alarm flood; this is controlled by the logic that determines the relevance of the alarm
• shelving (manual) suppression: a mechanism, typically initiated by the operator, to temporarily suppress an alarm
• out of service: the state of an alarm during which the alarm indication is suppressed, typically manually, for reasons such as maintenance.
ISA-18.2 defines other types of advanced and enhanced alarming methods. These include time-varying alarm attributes, redirection of alarms (e.g., via pagers) to personnel outside the control room, and techniques for automatically determining the cause of abnormal situations.
Advanced alarming could be applied, for example, to a reactor and its associated temperature, pressure, level, and flow alarms. When the reactor is in operation, alarm limits could be set differently depending on the product being made or the step of the batch recipe that is underway. When the reactor is idle or offline for maintenance, most of the alarms will not be useful, and some might be triggered unnecessarily. Without suppression, these alarms would remain active until the equipment is put back into service, thus becoming stale alarms. Alarm suppression can hide them.
Before suppressing an alarm, it is important to consider whether it is needed to detect a hazardous condition even when the process or equipment is out of service. For example, the alarms for reactor high pressure and flow might be required to detect a leak, which would indicate a loss of isolation from the process. Thus, these alarms should not be suppressed, and their limits should be set to detect the abnormal condition.
Stage 5: Implementation
The alarms are put into service in the implementation stage. This stage includes commissioning, training, and testing. All of these are ongoing activities, driven by process design changes or the addition of new instrumentation.
For alarms to be effective, the operator must know how to respond to each alarm. An effective training program covers all realistic operational situations, including:
• system functionality and features such as sorting/filtering, navigation, and shelving
• principles of the process, so that the operator fully understands why the alarm is created as well as what could happen if the alarm is disregarded
• procedures that should be followed to shelve an alarm or take it out of service.
Training is particularly important for safety-related alarms. These include alarms identified as a safeguard, as an independent protection layer, or as part of an OSHA PSM mechanical integrity program. Such alarms do not occur often, typically only in periods of high operator stress such as during a major plant upset.
Stage 6: Operation
During the operation stage, alarms perform their function of notifying the operator of an abnormal situation. A useful system provides tools, such as shelving and alarm-response procedures, to help the operator handle alarms.
Shelving is critical to responding effectively during a plant upset, as it allows the operator to manually hide less-important alarms on a temporary basis. In some systems, shelved alarms reappear automatically after a preset time period so that they are not forgotten.
The alarm philosophy should specify:
• which alarms can be shelved and by whom
• which alarms cannot be shelved (e.g., those of the highest priority or related to personnel safety).
Systems that support shelving require that the operator be able to view a list of all shelved alarms for review at any time, such as during shift change.
A key best practice is providing operators with alarm-response procedures. Alarm-response procedures contain process knowledge captured during rationalization (e.g., cause, consequence, corrective action, and time to respond), typically based on input from senior operators. This information can be provided in context to the operator from within the HMI (Figure 4). It can be indispensable for helping operators, especially junior operators, respond to alarms more quickly and consistently.
Figure 4. The alarm-response procedure can be integrated into the HMI to give operators easy access to critical information. Image courtesy of Emerson Process Management.
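The shelving practices described for the operation stage can be sketched as a small in-memory store:
• a preset time after which a shelved alarm reappears
• a list of alarms that cannot be shelved
• restrictions on who may shelve
• a reviewable list of everything currently shelved.

This is a hypothetical illustration, not a DCS vendor's API. The class name, tag names, and operator IDs are assumptions. The current time is passed in explicitly (in use it might come from `time.monotonic()`) so that the expiry logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ShelvingManager:
    """Hypothetical shelving store; not a DCS vendor API."""
    max_shelve_s: float   # preset time after which a shelved alarm returns
    authorized: Set[str]  # operators allowed to shelve ("by whom")
    not_shelvable: Set[str] = field(default_factory=set)  # e.g., highest-priority or personnel-safety alarms
    _shelved: Dict[str, Tuple[str, float]] = field(default_factory=dict, init=False, repr=False)

    def shelve(self, tag: str, operator: str, now: float) -> bool:
        """Shelve an alarm; return False if the operator or the alarm is not permitted."""
        if operator not in self.authorized or tag in self.not_shelvable:
            return False
        self._shelved[tag] = (operator, now + self.max_shelve_s)
        return True

    def _expire(self, now: float) -> None:
        """Drop entries whose time is up, so those alarms reappear to the operator."""
        expired = [tag for tag, (_, expiry) in self._shelved.items() if now >= expiry]
        for tag in expired:
            del self._shelved[tag]

    def is_shelved(self, tag: str, now: float) -> bool:
        self._expire(now)
        return tag in self._shelved

    def shelved_list(self, now: float) -> List[Tuple[str, str, float]]:
        """All currently shelved alarms as (tag, operator, expiry), e.g., for shift-change review."""
        self._expire(now)
        return sorted((tag, op, expiry) for tag, (op, expiry) in self._shelved.items())
```

As a usage example, consider `ShelvingManager(max_shelve_s=3600.0, authorized={"op1"}, not_shelvable={"PAHH-201"})`:
• it refuses to shelve `PAHH-201`, which is on the not-shelvable list
• it refuses any request from an operator not in `authorized`
• it returns a shelved alarm to view automatically one hour after it was shelved.

Expiry is checked lazily whenever the store is queried. A real system would more likely raise the unshelve event on a timer.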
Table 3. ISA-18.2 recommends these targets for performance and diagnostic metrics.
Metric | Target value
Percentage of hours containing more than 30 alarms | <1%
Percentage of 10-min periods containing more than 10 alarms | <1%
Maximum number of alarms in a 10-min period | ≤10
Percentage of time the alarm system is in a flood condition | <1%
Percentage contribution of the top 10 most frequent alarms to the overall alarm load | <1% (target), with a maximum of 5%; action plans are required to address deficiencies
Number of chattering and fleeting alarms | 0; action plans are required to correct any that occur
Number of stale alarms | Less than 5 present on any day; action plans are required to address excess alarms
Distribution of priorities of annunciated alarms | 3 priorities: ~80% Low, ~15% Medium, ~5% High. 4 priorities: ~80% Low, ~15% Medium, ~5% High, <1% Highest. Other special-purpose priorities are excluded when calculating the value of this metric.
Number of unauthorized alarm suppressions (i.e., outside of controlled or approved methodologies) | 0
Number of unauthorized changes to alarm attributes (i.e., outside of approved methodologies or MOC) | 0
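Several of the Table 3 metrics can be computed directly from an alarm journal. This minimal sketch makes three assumptions:
• the log has already been reduced to a list of (timestamp in seconds from the start of the log, tag) pairs for annunciated alarms
• fixed, non-overlapping periods aligned to the start of the log are acceptable (a real analysis might use clock-aligned or rolling windows)
• the caller knows the total number of periods in the analysis span, because periods with no alarms never appear in the counts.

The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds from start of log, alarm tag)


def period_counts(events: List[Event], period_s: float) -> Counter:
    """Number of annunciated alarms in each fixed period, keyed by period index."""
    return Counter(int(t // period_s) for t, _ in events)


def pct_periods_over(events: List[Event], period_s: float, limit: int,
                     total_periods: int) -> float:
    """Percentage of periods with more than `limit` alarms (e.g., 10 per 10 min)."""
    counts = period_counts(events, period_s)
    return 100.0 * sum(1 for c in counts.values() if c > limit) / total_periods


def max_in_period(events: List[Event], period_s: float) -> int:
    """Largest number of alarms seen in any single period."""
    return max(period_counts(events, period_s).values(), default=0)


def top10_contribution(events: List[Event]) -> float:
    """Percentage of all alarms produced by the 10 most frequent tags (bad actors)."""
    by_tag = Counter(tag for _, tag in events)
    total = sum(by_tag.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(n for _, n in by_tag.most_common(10)) / total
```

Here is how each function maps to a Table 3 target:

| Function call | Table 3 target |
| --- | --- |
| `pct_periods_over(events, 600, 10, n_periods)` | <1% of 10-min periods with more than 10 alarms |
| `pct_periods_over(events, 3600, 30, n_hours)` | <1% of hours with more than 30 alarms |
| `max_in_period(events, 600)` | ≤10 alarms in any 10-min period |
| `top10_contribution(events)` | top-10 contribution of <1% (target), 5% maximum |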
…should be presented no more than one to two alarms every 10 min. A related metric is the percentage of 10-min intervals during which the operator receives more than 10 alarms, which indicates the presence of an alarm flood. The ISA-18.2 standard’s recommended targets for performance and diagnostic metrics are shown in Tables 2 and 3.
Performance targets are approximate and are based primarily on what an operator is capable of handling. Whether these targets are used as metrics at a particular plant, and what maximum numbers are acceptable, will depend on many factors, including:
• the type of process
• operator skill level
• HMI design
• degree of automation
• operating environment
• the types and significance of the alarms generated.
For example, acceptable rates for alarms related to safety or product quality in certain industries (e.g., nuclear, pharmaceutical) are likely to be close to zero.
One of the most beneficial analyses is to routinely review the top 10 or 20 most frequently occurring alarms. In the absence of an effective alarm management program, these bad actors may contribute 50%–80% of the overall alarm load on the operator. Fixing these alarms represents low-hanging fruit for improving performance.
Analyzing alarm system performance by class can provide valuable information. For example, it can identify whether any safety-critical alarms are being suppressed or behaving as nuisance alarms, both of which are indicators of a dangerous situation. One of the contributing causes to the accident at the DuPont Belle, WV, plant was the frequent false (nuisance) alarms generated by a burst disc sensor. This sensor had been designated as OSHA-PSM-critical equipment. Even so, operators ignored its alarms because they had become accustomed to it behaving as a nuisance alarm (5).
Alarm management is a continuous process that is never finished. Measuring alarm system performance and taking action on the findings is an important ongoing activity and is critical to continuous improvement. An effective alarm philosophy documents the KPIs in a format that clearly defines:
• target vs. unacceptable levels
• the frequency of measurement and review
• the personnel responsible for taking action based on the results.
[Figure 5 (graphic not reproduced): “Monthly Performance Review and Ongoing Alarm Rationalization.” A monthly review/update cycle (Measure, Analyze, Improve, Implement) covering:
• a monthly review of alarm system performance (identify bad actors and evaluate alarm load on the operator)
• tracking of rationalization status by unit (Boiler, Steam Turbine, Feedwater, Condenser, Stack, Ash Extraction), with operations feedback
• delta alarm rationalization (e.g., 1 week per month), focusing on bad actors and on the highest-priority unit that has not yet been rationalized
• updates to the master alarm database and the control system configuration.]
Figure 5. Ongoing alarm management should include a periodic review of alarm-system performance, followed by corrective actions when necessary.
Stage 9: Management of change
Even the most well-designed alarm system can experience problems if changes to it are not strictly controlled. Management of change (MOC) ensures that modifications to the alarm system are reviewed and approved prior to implementation. Examples of such modifications include changing a setpoint or adding/removing an alarm. An effective MOC process balances the need for rigor and traceability with the need to make changes promptly to avoid impacts on production. For example, changing the limit for a safety-critical alarm may require a different level of review and authorization than changing the deadband of a general process alarm. Once a change is approved, the master alarm database should be updated, and operators should be trained on the impact of the change.
The alarm philosophy should define the level of MOC that is required based on the type of change and the alarm’s classification or priority. A contributing factor to the Deepwater Horizon drilling rig accident was the practice of disabling the annunciation of the general master alarm. This alarm was designed to notify personnel of danger (fire or explosive/toxic gas). Its annunciation was disabled to prevent false alarms from waking personnel in the middle of the night (7). Perhaps if this