
Partial Stroke Tests: A way to increase the reliability of ESD valves – Part 1

By Francesco Paolo Nigri, Corrado Delle Site, Maria Rosaria Vallerotonda, Danilo Sallustio of INAIL DIT


(Part 2 will follow in our March/April edition)

Some dangerous failures of Safety Instrumented Systems (SIS) remain hidden until they are revealed by proof tests or by real demands from the process. The standards IEC 61508 and IEC 61511 call failures of this type dangerous undetected (DU).

The standards regard proof tests as a means of revealing dangerous undetected failures. For the final elements of a SIS, proof tests usually consist of full stroke tests of the valves. These tests normally require taking the process offline.

Recently, partial stroke tests (PST) of Emergency Shutdown (ESD) valves have been introduced. Carrying out a PST means proving the valve’s ability to perform its function (closing on demand, for instance) by moving the valve stem without fully closing the valve.

Partial valve stem movements have proved sufficient to reveal a portion of the dangerous undetected failures without stopping the process.

The fraction of the dangerous undetected failures, discovered by PST, among all the dangerous undetected failures, is called Test Coverage Factor (TCF).

If PSTs are performed at regular intervals, usually shorter than the proof test interval (T1), they can help to increase the availability of ESD valves.

Nevertheless, only by performing partial and full stroke tests in quick succession is it possible to obtain good values of the Test Coverage Factor (TCF) and, therefore, a satisfactory reduction in the number of dangerous undetected failures. The authors refer to these failures as λDU,NR.

1.0 Introduction

For a long time in the past, human observation was the only means available to detect accidents in many industrial operations. Consequently, human intervention was the only way to take a process to a complete stop.

Nowadays, automated control systems, made of sensors, PLCs and valves, make it possible to monitor and control several processes. To avoid accidents, both sensors and transmitters have to be accurately set before any process is started.

Nevertheless, at the facilities of several chemical groups, many accidents have lately occurred due to the malfunctions of sensors and valves. According to some reliability database records, most of the malfunctions were due to random failures.

Process plants contain instrumented systems that perform safety functions, such as the systems that perform the function of Emergency Shut Down (ESD) or Blow Down (BD). Since no single safety measure is usually sufficient to eliminate process risks, any effective safety system consists of several protection layers.

This way, if one protection layer fails, other layers will take the process to a safe state. As the number of protection layers increases, the safety level of the whole process rises.

Safety Instrumented Systems (SIS) represent one of the protection layers. SIS are specifically designed to detect dangerous process states as they develop and start appropriate countermeasures. To fulfill its role, a SIS always consists of at least one sensor, one logic device and one final element.

This final element is usually an electric or pneumatic actuator connected to a valve. Safety Instrumented Systems are nowadays installed in process plants to mitigate hazards by taking the process to a “safe state” when predetermined set points are exceeded and safe operating conditions can no longer be ensured.

If process risks are not within an acceptable range, safety instrumented systems are one possible way to reduce them to a tolerable value. The use of a SIS must always be related to a Safety Integrity Level (SIL), as shown in table 1.

Table 1 – PFD in relation to SIL

Probability of Failure on Demand (PFD) | SIL
0,01 < PFD ≤ 0,1 | 1
0,001 < PFD ≤ 0,01 | 2
0,0001 < PFD ≤ 0,001 | 3
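The bands of table 1 can be read as a simple lookup. The sketch below is only an illustration of that reading: the function name `sil_from_pfd` is hypothetical, and the band limits are copied from table 1 with the same inequalities.

```python
from typing import Optional


def sil_from_pfd(pfd: float) -> Optional[int]:
    """Return the SIL whose band in table 1 contains pfd, or None if no band does."""
    # (lower limit, upper limit, SIL), read as: lower < PFD <= upper
    bands = [
        (0.01, 0.1, 1),
        (0.001, 0.01, 2),
        (0.0001, 0.001, 3),
    ]
    for lower, upper, sil in bands:
        if lower < pfd <= upper:
            return sil
    return None


if __name__ == "__main__":
    for value in (0.05, 0.005, 0.0005, 0.5):
        print(value, sil_from_pfd(value))
```

A PFD of 0,5 lies outside every band of table 1, so the function returns `None` for it.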

We always hope that the SIS never has to be used, because such use means that something has gone wrong with the process and, therefore, the process has to be shut down with all its associated operations.

However, if the SIS is called upon, it has to work reliably because the consequences of its failure are much more serious than the simple interruption of process operations.

The Probability of Failure on Demand (PFD) is the probability that the SIS fails to perform its safety function when required. In their simplest form, SIS consist of sensors, logic devices and final elements. The final elements, such as actuators and valves, are exposed to influences that logic devices do not normally have to bear.

These include environmental conditions such as temperature, pressure, humidity, contaminations and vibrations. In addition, field components may be exposed to abrasive or corrosive elements.

When designing a SIS, it is crucial to select components able to withstand these conditions over the intended lifetime of the SIS, usually referred to as the “mission time”.

If the Hardware Fault Tolerance (HFT) is equal to zero, the failure of a single component during the mission time means that the whole SIS is no longer able to fulfil its safety function.

To improve the system availability, it is vital to carry out periodic tests. IEC 61511 considers manual tests a basic requirement for successfully employing actuators and valves in safety instrumented systems.

A few other technical questions have to be analysed before an actuator is selected for a SIS. The discussion below is far from exhaustive, but experience shows that neglecting some design aspects sometimes results in major accidents.

In the following, the attention of the reader is focused on the pharmaceutical sector, due to the large number of operating sites as well as the diversity of the involved processes. In this sector, processes are quite complex and hazardous materials are often employed.

Moreover, sometimes reaction parameters are unknown or capable of sudden variations over time. This can lead to runaway chemical reactions that may even cause the rupture of the reactor’s safety disc. This rupture may be dangerous since it can result in the discharge of toxic substances into the atmosphere.

Therefore, in order to avoid unexpected reaction deviations, written operating protocols always need to receive adequate attention from the workshop technicians.

Let’s pay attention to a chemical reactor provided with appropriate safety barriers: a pressure threshold set, a temperature threshold set and pressure/temperature alarm triggers.

At first, the workshop technicians heat the mix of reagents by means of steam. Operating protocols usually call for maintaining the chemical mix at low temperatures since runaway reactions may be caused by excessive heat.

Heating the mix is often a critical step since the temperature can reach unacceptable values within a few minutes. If this happens, the technicians are warned by the alarm and have time to stop heating and stirring the reactor.

At times, shutting off the stirrer may be an exacerbating factor since it limits heat transfer in the reactor. If the high pressure level is reached inside the reactor, the reactor can rely at first on the Basic Process Control System (BPCS).

The pressure switch signals the BPCS which, in turn, signals the actuator to close the outlet valve. If the BPCS does not intervene due to a failure of the valve, the SIS is immediately required to start the emergency shutdown routine (figure 1).

Fig. 1 – Basic Process Control System (BPCS) versus Safety Instrumented System (SIS)

The Emergency Shut Down Valve (ESDV) acts as a safeguard against the excess pressure caused by a BPCS malfunction. During normal operation, the valve remains open for an extended period of time, which amounts to months and sometimes even years.

The ESDV must intervene right away during an emergency and manage the situation by taking the process back to a safe state as soon as possible. In case the pressure set point is exceeded or the electrical power is missing, the ESDV will close right away in order to isolate the reactor. Since de-energising the solenoid leads to shutting the valve down regardless of the process, the valve works in a “Fail Safe” manner.
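The fail-safe behaviour described above can be summarised in a few lines of logic. This is a minimal sketch, assuming a de-energise-to-trip solenoid as the text describes. The function name, the parameter names and the pressure values in the example are hypothetical, not taken from the article.

```python
def esdv_commanded_closed(pressure: float,
                          pressure_set_point: float,
                          power_available: bool) -> bool:
    """Return True when the ESDV is driven to its safe (closed) position.

    The solenoid stays energised only while power is present and the
    pressure is at or below the set point. Losing either condition
    de-energises the solenoid, and the valve closes.
    """
    solenoid_energised = power_available and pressure <= pressure_set_point
    return not solenoid_energised


if __name__ == "__main__":
    # Hypothetical values, in any consistent pressure unit.
    print(esdv_commanded_closed(4.0, 6.0, True))   # normal operation
    print(esdv_commanded_closed(6.5, 6.0, True))   # set point exceeded
    print(esdv_commanded_closed(4.0, 6.0, False))  # power lost
```

The design point is that the closed position needs no energy and no decision: any missing condition leads to the safe state.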

At this point, a problem needs to be taken into account since long experience has shown that safety related valves can stick open if they are not periodically tested. The general perception is that sticking is one of the main failure modes of safety related valves, such as the ESD valves.

Sticking may be caused by several factors, but dirt and corrosion seem to be the most frequent causes. Sudden movements of the valve can reduce dirt build up and also indicate whether corrosion is present, since a corroded valve keeps showing a stroking time longer than the required time.

This study is based on the analysis of the failure modes catalogued in the database of a valve vendor, with enough information to develop a good understanding of the event, in terms of causes, circumstances and consequences.

All the valves were used in process shutdown applications starting from 2017. At the beginning, a population of 10000 final elements (valves with their own actuators) was taken into account.

Assuming that the “returns from the field” represent about 60% of all failures, the 479 listed malfunctions correspond to an estimated primary sample of 798 total cases, involving the facilities classified in the manufacturer’s database (table 2).

Number of final elements on the market | 10000
Number of failures (returns from the field) | 479
Time | 1 year (8760 h)
Percentage of reported failures | 60%
Estimated number of failures | 798
Failure rate | 9904 FIT
Reliability | 91,7%
Table 2 – Failure rate of the final elements (2017 sales campaign)

In table 2, the failure rate λ was obtained using the following equation, where the calendar year (2017) is assumed to be the operating time:

$$\lambda = \frac{N_g}{N_f \cdot t}$$

where:

  • Ng is the number of final elements affected by failure in one year (2017);
  • Nf is the number of the final elements still functioning at the end of 2017;
  • t is the operating time (one year, i.e. 8760 h).

The reliability was then estimated by supposing λ equal to a constant within the period of observation. In this case, the reliability function has the following analytical expression:

$$R(t) = e^{-\lambda t}$$
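The figures of table 2 can be checked with a short calculation. The sketch below assumes that the failure rate is the estimated number of failures divided by the product of the surviving population and the operating time, and that the reliability is exponential with constant λ. This form is inferred from the definitions of Ng and Nf and from the numbers in table 2. All variable names are illustrative.

```python
import math

N_MARKET = 10_000          # final elements on the market (table 2)
REPORTED_FAILURES = 479    # returns from the field (table 2)
REPORTED_FRACTION = 0.60   # share of failures assumed to be reported
HOURS = 8760               # one calendar year of operation

# Estimated total failures, Ng (not rounded before further use)
estimated_failures = REPORTED_FAILURES / REPORTED_FRACTION
# Elements still functioning at the end of the year, Nf
survivors = N_MARKET - estimated_failures

failure_rate = estimated_failures / (survivors * HOURS)  # failures per hour
failure_rate_fit = failure_rate * 1e9                    # failures per 10^9 h
reliability = math.exp(-failure_rate * HOURS)            # R(t) at t = 1 year

if __name__ == "__main__":
    print(f"Estimated failures: {estimated_failures:.0f}")
    print(f"Failure rate: {failure_rate_fit:.0f} FIT")
    print(f"Reliability: {reliability:.1%}")
```

With these assumptions the script reproduces the 798 failures, 9904 FIT and 91,7% of table 2.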

The Failures In Time (FIT) are the number of failures expected in one billion (10^9) hours of operation. In order to narrow the samples, the vendor decided to consider only the following recorded failure modes:

  1. Fail To Close (FTC);
  2. Delayed Operation (DOP);
  3. Leakage in Closed Position (LCP).

Any of these malfunctions could either trigger a serious accident or exacerbate it. Any valve found to be non-functional during testing needed quick maintenance, and the process had to be shut down right away for valve repair.

Emergency shutdown valves are tested at every turnaround by using a full stroke test to demonstrate their performance. Some years ago, the site turnarounds were set up every year. Due to increased material reliability and preventive maintenance programs, companies now tend to set up the site turnarounds every two years.

Extended turnarounds allow great economic returns through increased production. Extended turnarounds also mean that emergency shutdown valves are expected to work longer with the same performance.

Since it is not possible to achieve the same performance with longer proof test intervals, partial stroke testing is used to supplement full stroke testing in order to reduce the shutdown valve PFD. Chapter 4.0 of this paper will discuss the three stroke tests that are currently being used by industry:

  • Partial Stroke Test (PST);
  • Full Stroke Test (FST1);
  • Full Stroke Test in operating conditions (FST2).

As a consequence, three secondary samples were obtained in 2018. In order to get these secondary samples, a population of 9995 final elements was initially considered by the vendor: 10000 cases minus 5 critical failures discovered in 2017.

Readers are cautioned that attention is focused on rotary valves rather than rising stem valves. At the end of 2017, five rotary valves were withdrawn from the market since their actuator sizing proved to be insufficient to rotate the valve stem in emergency conditions. The final elements were characterised by the “classified failure rates” shown in table 3, where:

  • the failure rate λ comes from table 2 and, therefore, it is equal to 9904 FIT;
  • the Diagnostic Coverage (DC), provided by the manufacturer, is equal to 60%.

Table 3 – Classified failure rates of the final elements

λ – Failure rate | 9904 FIT
λS – Safe failure rate | 4952 FIT
λD – Dangerous failure rate | 4952 FIT
DC – Diagnostic Coverage | 60%
λDD – Dangerous Detected failure rate | 2971 FIT
λDU – Dangerous Undetected failure rate | 1981 FIT

In table 3:

  • λS, Safe failure rate, is equal to 0,5 times λ (assumption based upon CEI EN 61508-6, Annex B, Table B.1);
  • λD, Dangerous failure rate, is equal to λS;
  • λDD, Dangerous Detected failure rate, is equal to 60% of λD;
  • λDU, Dangerous Undetected failure rate, is equal to λD minus λDD.
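The split described in the list above can be reproduced as follows. This is a minimal sketch: the function name and the dictionary keys are illustrative, while the 50% safe fraction and the 60% Diagnostic Coverage are the values stated in the text.

```python
def classify_failure_rates(total_fit: float,
                           safe_fraction: float = 0.5,
                           diagnostic_coverage: float = 0.60) -> dict:
    """Split a total failure rate (in FIT) into the classes of table 3."""
    lambda_s = safe_fraction * total_fit       # safe failures
    lambda_d = total_fit - lambda_s            # dangerous failures
    lambda_dd = diagnostic_coverage * lambda_d  # dangerous detected
    lambda_du = lambda_d - lambda_dd            # dangerous undetected
    return {
        "lambda_S": lambda_s,
        "lambda_D": lambda_d,
        "lambda_DD": lambda_dd,
        "lambda_DU": lambda_du,
    }


rates = classify_failure_rates(9904)

if __name__ == "__main__":
    for name, value in rates.items():
        print(f"{name}: {value:.0f} FIT")
```

Rounded to whole FIT, the results match table 3: 4952, 4952, 2971 and 1981.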

A quick examination of the simplified equations, provided by IEC 61508 for predicting PFD, shows that the most influential variables are:

  • Dangerous Undetected failure rate (λDU);
  • Proof Test Interval (T1);
  • Diagnostic Coverage (DC).
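The influence of these variables can be illustrated with the widely used approximation PFDavg ≈ λDU · T1 / 2 for a single final element in low demand mode. The article does not quote this equation, so it is an assumption of this sketch. It neglects repair times and detected failures, and it covers the final element only, not the whole safety function. Diagnostic Coverage enters through λDU, which is the part of λD that the diagnostics do not detect.

```python
HOURS_PER_YEAR = 8760
LAMBDA_DU_FIT = 1981  # dangerous undetected failure rate from table 3


def pfd_avg_single_element(lambda_du_fit: float, proof_test_interval_h: float) -> float:
    """Approximate average PFD of one final element: lambda_DU * T1 / 2."""
    lambda_du_per_hour = lambda_du_fit * 1e-9  # FIT -> failures per hour
    return lambda_du_per_hour * proof_test_interval_h / 2


pfd_one_year = pfd_avg_single_element(LAMBDA_DU_FIT, 1 * HOURS_PER_YEAR)
pfd_two_years = pfd_avg_single_element(LAMBDA_DU_FIT, 2 * HOURS_PER_YEAR)

if __name__ == "__main__":
    print(f"T1 = 1 year:  PFDavg = {pfd_one_year:.2e}")
    print(f"T1 = 2 years: PFDavg = {pfd_two_years:.2e}")
```

Under these assumptions the one-year interval gives roughly 8,7 · 10^-3 and the two-year interval roughly 1,7 · 10^-2. Doubling the proof test interval therefore moves the final element from the SIL 2 band to the SIL 1 band of table 1.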

The secondary samples were ultimately narrowed down to pharmaceuticals, already identified as the industrial sector for which the degree of automation is considered higher than the average of other sectors.

Some sectors of activity, such as oil refining, which is not so heavily automated, were not selected for the present study. This is a limitation, since it tends to under-represent the other chemical sectors.

However, since the secondary samples were based on events subsequent to malfunctions reported to the manufacturer from the field, the data collected in 2018 prove useful for two main reasons:

  • they provide precise information regarding the final element reliability;
  • they help establish the criticality of final element malfunctions within facilities that need to be closely monitored, such as the Seveso-rated sites.

2.0 Proof testing

A more accurate analysis of the 479 failures, listed during 2017, showed that:

  • 287 failures were caused by leaks of various types;
  • 176 failures involved the valve failing to close because of sticking;
  • 16 failures were due to delayed operations.

If we rely on this observation, sticking is second only to leakage as the main failure mode of safety valves. Sticking is very significant since it represents about 37% of the recorded failures (176 out of 479). If safety valves are fully exercised, the probability of safe process shutdowns increases.
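The share of each failure mode follows directly from the counts listed above. The short sketch below only restates that arithmetic. The dictionary keys are illustrative labels.

```python
# Failure counts recorded in 2017, as listed in the text
failure_counts = {
    "leakage": 287,
    "sticking": 176,
    "delayed operation": 16,
}

total_failures = sum(failure_counts.values())
shares = {mode: 100 * count / total_failures
          for mode, count in failure_counts.items()}

if __name__ == "__main__":
    print(f"Total failures: {total_failures}")
    for mode, share in shares.items():
        print(f"{mode}: {share:.1f}%")
```

The counts add up to the 479 returns from the field, with sticking at about 37%, leakage at about 60% and delayed operation at about 3%.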

Unfortunately, it is only possible to fully test these valves at scheduled plant shutdowns. This may mean intervals of one, two or more years between valve tests.

Given the requirements of IEC 61511 to preserve Safety Integrity Levels (SIL), such long intervals between valve tests are not compatible with keeping the Probability of Failure on Demand (PFD) sufficiently low.

Partial stroke testing of the safety valves helps mitigate this problem. Its main advantage is that it provides, at short intervals, a measure of confidence that a valve is not stuck open and will not stick when it is suddenly required to shut down a process.

The test has both a preventive and a corrective aspect. The valve movement can dislodge any dirt build up to help prevent sticking. If the valve is already stuck open, the test will detect it and corrective measures can be taken.
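To give an idea of how much a partial stroke test could help, the sketch below uses a common approximation. The fraction TCF of the dangerous undetected failures is assumed to be revealed at the PST interval, and the remainder only at the full proof test. The equation is not taken from the article. The TCF of 0,7 and the monthly PST interval are hypothetical values chosen purely for illustration.

```python
LAMBDA_DU_FIT = 1981          # dangerous undetected failure rate (table 3)
PROOF_TEST_INTERVAL_H = 17520  # full stroke test every two years
PST_INTERVAL_H = 730           # hypothetical: roughly one PST per month
TCF = 0.7                      # hypothetical Test Coverage Factor


def pfd_avg_with_pst(lambda_du_fit: float, tcf: float,
                     pst_interval_h: float, proof_interval_h: float) -> float:
    """Approximate PFDavg when a PST reveals the fraction tcf of DU failures."""
    lambda_du = lambda_du_fit * 1e-9  # FIT -> failures per hour
    covered = tcf * lambda_du * pst_interval_h / 2            # found by PST
    uncovered = (1 - tcf) * lambda_du * proof_interval_h / 2  # found by proof test
    return covered + uncovered


pfd_without_pst = pfd_avg_with_pst(LAMBDA_DU_FIT, 0.0, PST_INTERVAL_H, PROOF_TEST_INTERVAL_H)
pfd_with_pst = pfd_avg_with_pst(LAMBDA_DU_FIT, TCF, PST_INTERVAL_H, PROOF_TEST_INTERVAL_H)

if __name__ == "__main__":
    print(f"Without PST: {pfd_without_pst:.2e}")
    print(f"With PST:    {pfd_with_pst:.2e}")
```

With these illustrative inputs the average PFD of the final element drops from about 1,7 · 10^-2 to about 5,7 · 10^-3. The actual benefit depends entirely on the real TCF and test intervals.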

Proof testing can also be automated (figure 2). PLC-based safety systems are quite capable of being programmed to perform the partial stroke tests as well as to record the number of failures. These numbers were used by the manufacturer to determine the three secondary samples.

Before showing these samples, the notion of Test Coverage Factor (TCF) has to be explained in detail.

Fig. 2 – Partial Stroke Test (PST) diagnostics by TopWorx™

END OF PART 1
Read part II here


    Francesco Paolo Nigri

    INAIL, Direzione Regionale della Puglia
