RCM – What the chuff* is that?

It doesn’t seem that long ago, really, when I had to research the topic in order to prepare for a job interview. Though I’d worked in an environment that had (unbeknownst to me) been created with a reliability-centered mindset, I had never heard of it, let alone understood what it meant.

Finding out exactly what RCM is was difficult. My assumption here is that, if you’re reading this, you’ve probably read other articles on the subject. But do you really know what it is?

Reliability-centered maintenance (RCM): what does the phrase suggest to you?

“RCM is a specific process used to identify the policies which must be implemented to manage the failure modes which could cause the functional failure of any physical asset in a given operational context.”
– SAE JA1011

One of the difficulties, I find, is that the phrase is used in common parlance to describe a process for the derivation of scheduled maintenance, but I’m not sure that I hold with that. It’s probably splitting hairs but, for me, there are two aspects of RCM:

The concept
The analytical process

The concept

The concept, for which the phrase was coined, was presented as a challenge, an affront to the preeminent philosophy of the time: that all complex things wear out and, therefore, must be maintained at a given age.

If maintenance were boxing (hence the slightly goofy image at the top of this page) then what I’ve glibly termed ‘indiscriminate maintenance’ would be the reigning champion, with reliability-centered maintenance the challenger.

F. Stanley Nowlan and Howard F. Heap published the paper ‘Reliability-centered maintenance’ in 1978. The first sentence of the first paragraph of the introduction states that ‘the term reliability-centered maintenance refers to a scheduled-maintenance program designed to realize the inherent reliability capabilities of equipment’. That’s it.
That’s what the name means, but nothing in that definition describes, or even implies, the nature of the analytical process.

The analytical process

The actual process that is followed, in order to implement an RCM programme, is the RCM analysis.

RCM did more than suggest that we should maintain our assets in the knowledge that they fail differently (though that would have been improvement enough). It also suggested that not all failures are equal and, further, that not all failures need to be mitigated.

How can the judgement, to maintain or not to maintain, be made?

In 1999, the Society of Automotive Engineers (SAE) International issued JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes (I’ve used it, in fact, to provide the first quote in this piece regarding the definition of RCM), which details a sequenced set of steps that the analytical process should follow:

a. Determine the operational context and the functions and associated desired standards of performance of the asset (operational context and functions)

b. Determine how an asset can fail to fulfil its functions (functional failures)

c. Determine the causes of each functional failure (failure modes)

d. Determine what happens when each failure occurs (failure effects)

In order to satisfy steps a – d, the RCM analysis uses a technique called failure modes and effects analysis (FMEA) (or a variation of it). FMEA is a discipline in its own right which allows a skilled user to understand credible points of failure (and their effects) in a design, process, product or service.

When used properly, FMEA is a great framework for understanding but, that’s it, understanding. There is no inherent decision-making mechanism in FMEA; it has to be used to inform decisions.
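To make that point concrete, here is a minimal sketch of what one line of an FMEA worksheet might hold after steps a to d. The class name, field names and the pump example are my own illustrative assumptions, not anything taken from SAE JA1011 or from a real worksheet. Notice what is missing: there is no field for a task or a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    """One line of a hypothetical FMEA worksheet (steps a to d only)."""
    function: str            # step a: what the asset is required to do
    functional_failure: str  # step b: how it can fail to do that
    failure_mode: str        # step c: one cause of the functional failure
    failure_effect: str      # step d: what happens when that cause occurs


# Invented example data, purely for illustration.
row = FmeaRow(
    function="Pump transfers water at the required flow rate",
    functional_failure="Pump transfers no water",
    failure_mode="Bearing seizes",
    failure_effect="Pump stops; downstream tank level falls",
)

print(row.failure_mode)
```

The record captures understanding and nothing more. Anything that says ‘so we should do X’ has to come from a later step.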

To maintain or not to maintain? That is the question.

(…and, if to maintain, how?)

Unlike the indiscriminate tendency to maintain everything complex, RCM analysis allows a choice. A choice which is based on a balance of failure consequences vs failure dynamics vs practicalities vs economics. That entire balancing act is played out during steps e to g.

The decision to maintain or not is not made by the results of steps e, f or g in isolation. It’s the combination of the answers that determines whether maintenance is appropriate.

e. Classify the consequences of failure (failure consequences)

Based on the effects of losing a function, as detailed in the FMEA (Step d), is it a problem? Will the operator know that the function has been lost? Is the loss of function contrary to safety or the environment or the bank balance or whatever else the organisation deems important?

f. Determine what should be performed to predict or prevent each failure (tasks and task intervals)

In the context of a specific failure mode, as detailed in the FMEA (Step c)…

OK, there’s a lot going on here at Step f (it’s two steps in one, really) but what’s important here is the ability to understand the dynamics of a failure. At Step c, the analysis will have identified a specific failure mode, and that failure mode will manifest itself in a specific way. In understanding that failure mechanism, a decision can be made whether (or not) to monitor the condition of the asset, looking for signs that it is moving toward failure, in the hope that early identification will give time to plan corrective maintenance before failure happens.

If, though, monitoring is not appropriate, can it be seen, or proved, that beyond a certain point in an asset’s operational life the failure mode will continue to increase the asset’s probability of failure? If so, then a maintenance action may be applicable, either to restore the asset’s resistance to failure or simply to replace the asset before it reaches that age.

g. Determine if other failure management strategies may be more effective (one-time changes)

So, if nothing can be done to predict or prevent a failure, then what can be done? Again, there’s a lot going on here but I’ll summarise it into three options. 1) In certain circumstances, a test could be introduced to try to identify a failure that has already happened. 2) If we can’t predict, prevent or identify the failure, it may be prudent (or necessary) to change the design in order to reduce or remove the effects of failure. 3) If the consequences allow, the option to do nothing (to run to failure) can be taken here.
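The balancing act across steps e to g can be sketched as a small decision function. This is a heavily simplified illustration under my own assumptions, not the decision logic of SAE JA1011 or of any published RCM decision diagram. Every name in it is hypothetical, and the practicality and economics of a task are collapsed into a single yes/no flag.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    ON_CONDITION = "monitor condition"
    SCHEDULED = "scheduled restoration or replacement"
    FAILURE_FINDING = "test for a failure that has already happened"
    REDESIGN = "change the design"
    RUN_TO_FAILURE = "run to failure"


@dataclass
class Answers:
    """Hypothetical yes/no answers gathered for one failure mode."""
    evident_to_operator: bool     # step e: will the operator know?
    consequences_tolerable: bool  # step e: can the organisation live with it?
    warning_detectable: bool      # step f: does condition give usable warning?
    age_related: bool             # step f: does failure probability rise with age?
    task_worth_doing: bool        # step f: practical and economic (simplified)
    test_possible: bool           # step g: could a test reveal the failure?


def choose_policy(a: Answers) -> Policy:
    # Step f: predict or prevent, but only if a task is worth doing.
    if a.task_worth_doing and a.warning_detectable:
        return Policy.ON_CONDITION
    if a.task_worth_doing and a.age_related:
        return Policy.SCHEDULED
    # Step g, option 1: look for a failure nobody would otherwise notice.
    if not a.evident_to_operator and a.test_possible:
        return Policy.FAILURE_FINDING
    # Step g, option 3: do nothing if the consequences allow.
    if a.consequences_tolerable:
        return Policy.RUN_TO_FAILURE
    # Step g, option 2: otherwise reduce or remove the effects by design.
    return Policy.REDESIGN


example = Answers(
    evident_to_operator=True, consequences_tolerable=True,
    warning_detectable=False, age_related=False,
    task_worth_doing=False, test_possible=False,
)
print(choose_policy(example).value)
```

The point of the sketch is that no single answer decides the outcome. Change one flag and the policy can move from ‘do nothing’ to ‘change the design’.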

What did I just read?

Good question. ‘RCM? What the chuff is that?’ (the title of this article) is a paraphrase of the question that I get asked most when I tell the uninitiated that that’s what I ‘do’. And that’s OK; that’s exactly the position that I was in when researching for the job interview (where this article started).

What I’ve found since working in this field, though, is that it can be difficult to digest exactly what it is – even if you work somewhere near it! You see, words matter.

What I’ve tried to do here is explain the origin of the term reliability-centered maintenance and, by stepping briefly through the analytical process, demonstrate that the term itself can be (is) misleading.

I’ve shown that the RCM analysis decision-making process is based on consequences first, then the dynamics of failure modes, and then the practicalities of doing any proposed activities. You’d be hard pushed to put reliability at the centre of that.

So, if RCM is about realising the inherent reliability of an asset but reliability is not the major feature of the decision process, how do they relate?

Remember, prevalent at the time (and still in places) was the assumption that older things must wear out and, therefore, that there must be a correct age at which all complex items need to be maintained. RCM, though, by inducing maintenance only where consequences dictate and failure dynamics allow, works to minimise the effects of maintenance itself on otherwise serviceable assets which, in turn, allows the asset to better realise its inherent reliability.

*Chuff – Polite euphemism used to replace a popular expletive, the exact meaning varies depending on context.

Required reading on the topic:

Reliability-centered maintenance, F. Stanley Nowlan and Howard F. Heap
SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes
RCM2, John Moubray