Danilo Sallustio, INAIL, Direzione Regionale Puglia, d.sallustio@inail.it

Francesco Paolo Nigri, INAIL, UOT Bari, Italy, f.nigri@inail.it

Risk assessment is one of the most important aspects of industrial safety management. Unfortunately, it is also one of the most difficult to implement, because it requires complex techniques. This work illustrates a methodology for calculating the probability of failure of safety systems. This calculation is crucial for functional safety, which is now among the techniques adopted in the chemical industry to reduce technological risks.

Reliability, probability of failure, functional safety, process safety, instrumentation.

I. Introduction

In the past, failures of safety systems were commonly attributed to human error rather than to design flaws or lack of maintenance. Functional safety, based on the international standard IEC 61508, takes a different view: to achieve process safety in high-risk industries, it is not enough to adopt standardised procedures that reduce human error; it is also necessary to use “reliable instruments”.

In dangerous atmospheres, a basic requirement is to use instrumentation that does not generate sparks. Useful requirements for this purpose are outlined in the ATEX Directive. On this point, functional safety raises a basic issue: it is no longer sufficient to equip safety systems with the most suitable instrumentation. Safety systems should also be equipped with reliable instrumentation, capable of performing its safety functions over time.

II. IEC 61511 contribution

A. Safety Instrumented Systems

According to IEC 61511, an international standard for the process industry, accidents are best prevented by introducing several Independent Protection Layers (IPL) to protect the equipment under control. In addition to Basic Process Control Systems (BPCS) and alarms, IEC 61511 strongly recommends the adoption of Safety Instrumented Systems (SIS), which bring the process to a safe condition if the other layers malfunction.

In its simplest architecture, a SIS includes at least one sensor, a logic unit and an actuated valve, which is its final element.

B. Safety by Design

To keep process plants under control, the process variables have to be constantly monitored. Most of the measurement techniques involve the use of components exposed to the process. It is, therefore, required to use components not only suitable for the process, but also capable of withstanding severe process conditions.

This concept, known as “Safety by Design”, leads to particularly accurate designs. Sensors suitable for Safety Instrumented Systems have to be designed to reduce dangerous failures, not immediately detectable, through an internal diagnostic coverage (failure avoidance).

Since instrumentation failures cannot be completely avoided, redundancy generally represents a valid solution to ensure process safety (failure tolerance). Redundancy is typically achieved by using multiple sensors in a parallel architecture.

III. Functional safety

However, safety based on Diagnostic Coverage (DC) and Hardware Fault Tolerance (HFT) is not sufficient according to functional safety. Components of Safety Instrumented Systems are required to be periodically tested by adopting fully tested operating procedures (failure detection).

Carrying out tests in operating conditions (manual proof tests) helps to ensure the reliability of the components by reducing their Probability of Failure on Demand (PFD).

One of the main purposes of IEC 61508 is the calculation of the PFD of Safety Instrumented Systems. In the following, the theoretical basis of the relationships provided by IEC 61508 is shown using Markov models.

A. Mean Down Time

The calculation of PFD is based on the Mean Down Time (MDT). The Mean Down Time is the portion of the mission time during which the SIS remains failed and is therefore unable to perform its safety function.

This result is important to analyse the PFD value in practice and thoroughly understand the content of the international standard IEC 61508.

Fig. 1 – Mean Down Time

B. Contributions to the Mean Down Time (MDT)

The Mean Down Time collects two different contributions, corresponding to:

failures detected by internal diagnostics;
failures that remain undetected until manual proof tests are carried out.

The Mean Down Time includes the time to detect the failure and then to repair it. The down time for ‘detected failures’, which are located by internal automatic diagnostics, is relatively short. The down time due to ‘undetected failures’ is much longer, because these failures may remain hidden until the next proof test. Above all, the Mean Down Time depends on the interval between two manual proof tests.

C. Calculation of the probability of failure

Dangerous undetectable failures occur at a constant failure rate (λ_DU). The accumulating random failures follow an exponential distribution, which can be described as follows.
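Assuming a constant rate λ_DU and no repair before the next proof test, the exponential law for the instantaneous probability of failure takes the standard form:

```latex
\mathrm{PFD}(t) = 1 - e^{-\lambda_{DU}\, t}
```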

Fig. 2 – PFD versus time

IEC 61508 does not treat PFD values greater than 0.1 as meaningful. Within the interval [0; 0.1] the PFD trend is linear to a good approximation. Therefore, in the interval [0; 0.1] the following expression of PFD can be considered valid:

To simplify the calculations, IEC 61508 refers to the average value (PFD_AVG) of the instantaneous probability of failure PFD(t). To calculate PFD_AVG, IEC 61508 considers a time interval whose endpoints are:
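The linear expression referred to here is presumably the first-order expansion of the exponential law, PFD(t) ≈ λ_DU·t. As a rough check of how well it holds up to 0.1, the sketch below compares the two forms. The failure rate value and the function names are illustrative assumptions, not values taken from the standard.

```python
import math


def pfd_exact(lambda_du: float, t: float) -> float:
    """Exponential law for a constant failure rate: 1 - exp(-lambda_du * t)."""
    # expm1 keeps precision when lambda_du * t is very small.
    return -math.expm1(-lambda_du * t)


def pfd_linear(lambda_du: float, t: float) -> float:
    """First-order approximation of the exponential law: lambda_du * t."""
    return lambda_du * t


def relative_error(lambda_du: float, t: float) -> float:
    """Relative overestimate of the linear form with respect to the exact one."""
    exact = pfd_exact(lambda_du, t)
    return (pfd_linear(lambda_du, t) - exact) / exact


LAMBDA_DU = 1e-5  # hypothetical dangerous undetected failure rate, per hour

if __name__ == "__main__":
    for product in (0.001, 0.01, 0.05, 0.1):
        t = product / LAMBDA_DU  # time at which lambda_du * t equals `product`
        print(f"lambda*t = {product:<5}  exact = {pfd_exact(LAMBDA_DU, t):.6f}  "
              f"linear = {pfd_linear(LAMBDA_DU, t):.6f}  "
              f"rel. error = {relative_error(LAMBDA_DU, t):.2%}")
```

The linear form always slightly overestimates the exponential one; at the upper end of the interval the overestimate is about 5%, which errs on the safe side.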

the initial time (t = 0);
the start time of the first manual proof test (t = T1).

D. PFD_AVG

In the time interval [0; T1], we are now going to calculate the average value (PFD_AVG) of the instantaneous probability of failure PFD(t).

Since λ_DU is constant over time:
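With the linear approximation PFD(t) ≈ λ_DU·t, averaging over the interval [0; T1] would give the familiar result:

```latex
\mathrm{PFD}_{AVG}
  = \frac{1}{T_1}\int_{0}^{T_1} \lambda_{DU}\, t \,\mathrm{d}t
  = \frac{\lambda_{DU}\, T_1}{2}
```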

The SIS remains failed until time T1, when a manual proof test is finally carried out. Dangerous failures therefore remain “undetected” for a period whose average length is half the interval T1.

Dangerous detectable failures are:

continuously located by automatic diagnostic functions;
repaired at once, within the Mean Time To Restoration (MTTR).

So, to represent the Mean Down Time (MDT) properly, we also need to consider the Mean Time To Restoration (MTTR). The MTTR is usually measured in hours and is much shorter than T1, which is generally expressed in months. Nevertheless, the MTTR cannot be neglected.

The proof test interval, usually denoted T1, is an important parameter because a periodic maintenance test of the whole safety system takes place at time T1. Manual proof tests are carried out to reveal undetected dangerous failures. Failures undetected by internal diagnostics accumulate as time progresses. Immediately after a manual proof test, the SIS can be regarded as new and its PFD is equal to zero once again.

Effect of periodical proof tests on PFD trend
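The sawtooth behaviour described above can be sketched numerically. The example below assumes perfect and instantaneous proof tests (the PFD returns exactly to zero at every multiple of T1) and uses illustrative values for λ_DU, T1 and the mission time; none of these figures comes from the standard.

```python
import math


def pfd_with_proof_tests(lambda_du: float, t: float, t1: float) -> float:
    """Instantaneous PFD when a perfect proof test resets the system every t1."""
    time_since_last_test = math.fmod(t, t1)
    return -math.expm1(-lambda_du * time_since_last_test)


def average_pfd(lambda_du: float, t1: float, mission_time: float,
                steps: int = 100_000) -> float:
    """Time average of the sawtooth PFD over the mission (midpoint rule)."""
    dt = mission_time / steps
    total = sum(pfd_with_proof_tests(lambda_du, (i + 0.5) * dt, t1)
                for i in range(steps))
    return total / steps


LAMBDA_DU = 2e-6         # hypothetical failure rate, per hour
T1 = 8760.0              # hypothetical proof test interval: one year, in hours
MISSION_TIME = 10 * T1   # hypothetical mission time: ten years

if __name__ == "__main__":
    print("PFD just before a proof test:",
          pfd_with_proof_tests(LAMBDA_DU, T1 - 1.0, T1))
    print("PFD right after a proof test:",
          pfd_with_proof_tests(LAMBDA_DU, T1, T1))
    print("Average PFD over the mission:",
          average_pfd(LAMBDA_DU, T1, MISSION_TIME))
    print("lambda_DU * T1 / 2:          ", LAMBDA_DU * T1 / 2)
```

Under these assumptions the time average stays close to λ_DU·T1/2, whatever the length of the mission.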

IV. PFD_AVG calculation of a 1oo1 system

A 1oo1 system has only one channel linking the sensor, the logic solver and the final element. The PFD_AVG depends on the proof test interval T1. The contribution provided by IEC 61508 to the calculation of the PFD_AVG value results in the following equation for the 1oo1 architecture:

where t_DE is the equivalent mean down time of the safety system:
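Written out in the notation of this section, with λ_DU and λ_DD the dangerous undetected and dangerous detected failure rates and λ_D their sum, the two 1oo1 relationships would take the following form. The restoration times used here are assumed to be those listed in the table of Section IV.B.

```latex
\mathrm{PFD}_{AVG} = (\lambda_{DU} + \lambda_{DD})\, t_{DE}

t_{DE} = \frac{\lambda_{DU}}{\lambda_{D}} \left( \frac{T_1}{2} + \mathrm{MTTR} \right)
       + \frac{\lambda_{DD}}{\lambda_{D}}\, \mathrm{MTTR}
```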

The failure probability is calculated taking both dangerous undetectable and dangerous detectable failures into account. These failure probabilities can be obtained for a 1oo1 system by introducing a few Markov models.

First, we define the Diagnostic Coverage (DC) as the ratio of the dangerous failures detected by the internal diagnostics to the total number of dangerous failures:

Since:

we can write:
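With λ_DD and λ_DU denoting the dangerous detected and undetected failure rates and λ_D the total dangerous failure rate, the definition of DC and the relationships that follow from it would read:

```latex
DC = \frac{\lambda_{DD}}{\lambda_{D}}, \qquad
\lambda_{D} = \lambda_{DD} + \lambda_{DU}, \qquad
\lambda_{DD} = DC\,\lambda_{D}, \qquad
\lambda_{DU} = (1 - DC)\,\lambda_{D}
```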

A. Markov models

The Markov model is a logical model useful to describe systems whose status changes in a random way. The model is particularly suitable for showing the behaviour of safety systems whose state suddenly changes over time due to accidental failures of their components. Attention is first focused on dangerous undetectable failures.

Markov model for DU type failures

The system status suddenly changes from 1 (healthy) to 0 (failed) due to a DU type failure. The instant at which this happens cannot be predicted, since DU type failures are random and can occur at any time. On the other hand, the time required to restore the system from 0 (failed) to 1 (healthy) is well known. Attention then turns to dangerous detectable failures.

Markov model for DD type failures
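The two-state model for DU type failures can be illustrated with a small Monte Carlo sketch. It assumes that a failure occurs at a random, exponentially distributed time, that it stays hidden until the next proof test, and that the test restores the system as new with negligible repair time. The numerical values and the function name are hypothetical.

```python
import random


def simulate_du_unavailability(lambda_du: float, t1: float,
                               n_intervals: int, seed: int = 0) -> float:
    """Estimate the fraction of time spent in the failed state (state 0).

    Each proof test interval starts in the healthy state (state 1); because
    the exponential law is memoryless, intervals can be sampled independently.
    """
    rng = random.Random(seed)
    down_time = 0.0
    for _ in range(n_intervals):
        time_to_failure = rng.expovariate(lambda_du)
        if time_to_failure < t1:
            # The failure stays undetected until the proof test at t1.
            down_time += t1 - time_to_failure
    return down_time / (n_intervals * t1)


LAMBDA_DU = 1e-5  # hypothetical failure rate, per hour
T1 = 8760.0       # hypothetical proof test interval, in hours

if __name__ == "__main__":
    estimate = simulate_du_unavailability(LAMBDA_DU, T1, n_intervals=200_000)
    print("Simulated unavailability:", estimate)
    print("lambda_DU * T1 / 2:      ", LAMBDA_DU * T1 / 2)
```

The simulated unavailability should fall slightly below λ_DU·T1/2, since the linear approximation overestimates the exponential law.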

B. IEC 61508 function block

One thing makes a difference between the probability of dangerous undetectable failures (DU type) and the probability of dangerous detectable failures (DD type): the Diagnostic Coverage (DC).

The higher the Diagnostic Coverage is:

the higher the likelihood of DD type failures will be;
the lower the likelihood of DU type failures will be.

Probability of dangerous failures

Failure type	Probability of occurrence	Time to restore
DD type	DC	MTTR
DU type	1 − DC	½ T1 + MTTR

Recalling equation (11), we can rewrite it as follows:
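Assuming the equation being recalled is the expression of t_DE in terms of the failure rates, weighting each restoration time by its probability of occurrence from the table above (DC for DD type failures, 1 − DC for DU type failures) would give:

```latex
t_{DE} = (1 - DC) \left( \frac{T_1}{2} + \mathrm{MTTR} \right) + DC \cdot \mathrm{MTTR}
```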


Now we are able to completely understand the function block provided by the IEC 61508 to calculate the equivalent mean down time (t_DE) of a safety related system.

Fig. 8 – Function block for equivalent mean down time

The function block is consistent with equation (10), which is now fully explained.
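As a worked illustration of the function block, the sketch below weights the two restoration times by DC and 1 − DC and then multiplies the result by the total dangerous failure rate to obtain PFD_AVG for a 1oo1 channel. The function names and all input values are hypothetical; they are chosen only to show the arithmetic.

```python
def equivalent_mean_down_time(dc: float, t1: float, mttr: float) -> float:
    """t_DE: DU failures wait T1/2 + MTTR on average, DD failures only MTTR."""
    return (1.0 - dc) * (t1 / 2.0 + mttr) + dc * mttr


def pfd_avg_1oo1(lambda_d: float, dc: float, t1: float, mttr: float) -> float:
    """PFD_AVG of a 1oo1 channel: total dangerous failure rate times t_DE."""
    return lambda_d * equivalent_mean_down_time(dc, t1, mttr)


# Hypothetical inputs, for illustration only.
LAMBDA_D = 5e-6  # total dangerous failure rate, per hour
DC = 0.9         # diagnostic coverage
T1 = 8760.0      # proof test interval, in hours (one year)
MTTR = 8.0       # mean time to restoration, in hours

if __name__ == "__main__":
    t_de = equivalent_mean_down_time(DC, T1, MTTR)
    print(f"t_DE    = {t_de:.1f} h")
    print(f"PFD_AVG = {pfd_avg_1oo1(LAMBDA_D, DC, T1, MTTR):.2e}")
```

With these inputs, t_DE is 0.1 × (4380 + 8) + 0.9 × 8 = 446 h, so PFD_AVG is 5·10⁻⁶ × 446 ≈ 2.2·10⁻³. Nearly all of t_DE comes from the undetected share, which is why the proof test interval dominates the result.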

C. Conclusions

IEC 61508 calculates the probability of failure in order to fulfil the SIL requirements of safety-related systems. The user of a safety-related system has to ensure at all times that the system stays within the required limits of the probability of failure. This paper has detailed the calculation of the probability of failure in the low demand mode of operation, considering only the failure rates of dangerous failures.

References

[1] International Standard IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 6. Geneva: International Electrotechnical Commission.

[2] Goble W. M. & Cheddie H., 2010. Safety Instrumented System verification: practical probabilistic calculations, Exida, Sellersville.

[3] Börcsök J., 2007. Calculation of PFD values for a safety related system. Risk, Reliability and Societal Safety – Aven & Vinnem.

[4] Generowicz M., 2014. An explanation of the principles behind failure rate equations. I&E Systems Pty Limited, Perth, Western Australia.

Phil Black - PII Editor 22/12/2022


The Calculation Of The Probability Of Failure According To The International Standard IEC 61508
