RVS-Bk-17-01
(draft Version 1.0 from 2017-12-10)
Faculty of Technology, Bielefeld University, ladkin@rvs.uni-bielefeld.de
Causalis Ingenieurgesellschaft mbH and Causalis Limited, ladkin@causalis.com
This is a draft manuscript. All rights are reserved by the author.
Many companies provide digital-computation-based products for use in safety-critical systems, and many of the more straightforward products have shown themselves to be dependable. How do you evaluate an operational history of a device, or of software, to show this? Current guidance in IEC 61508 is sparse, and judged by many to be inadequate.
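As a rough illustration of the arithmetic involved (the numbers are mine, chosen for illustration, not taken from any particular assessment): if failures of a continuously-operating item are modelled as a Poisson process with unknown rate lambda per hour, then t hours of failure-free operation support a claimed bound lambda_0 with confidence c only if

\[ e^{-\lambda_0 t} \le 1 - c , \qquad \text{that is} \qquad t \ge \frac{-\ln(1-c)}{\lambda_0} . \]

For example, a claim of better than 10^-4 failures per hour at 99% confidence needs roughly 46,000 failure-free operational hours.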
I establish the conceptual basics of the most common models for assessing the failure behaviour - or rather, the lack-of-failure behaviour - of software, in order to gain confidence in its operation.
IEC 61508 distinguishes various "modes" of operation - low-demand mode, high-demand mode and continuous mode. Associated with these are various concepts such as average probability of failure on demand and probability of failure per operational hour. If statistical modelling is to be used, these IEC 61508 concepts must be connected with the statistical-model parameters, which bear similar names. This chapter discusses the connection.
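In rough terms (my summary, not the standard's wording): the low-demand measure, average probability of failure on demand (PFDavg), is the one people try to connect with the parameter p of a Bernoulli process, in which each demand independently fails with probability p; the high-demand and continuous-mode measure, probability of failure per operational hour (PFH), is the one they try to connect with the rate parameter lambda of a Poisson process, for which

\[ \Pr(\text{no failure in } t \text{ hours}) = e^{-\lambda t} . \]

Whether these identifications are legitimate, and under what assumptions, is exactly what needs discussing.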
Statisticians model stochastic processes, and understand that you cannot necessarily convert one model deterministically into another. Nevertheless, there have been attempts to connect the parameters of a Bernoulli-Process model with those of a Poisson-Process model of the same stochastic process. I show briefly that, and how, this is misplaced.
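The attempted connection usually runs as follows (my paraphrase of the usual argument, not an endorsement of it): if demands arrive at an average rate of d per hour and each fails independently with probability p, then failures occur at an expected rate of

\[ \lambda \approx d \cdot p \quad \text{failures per hour,} \]

and when demands are frequent and p is small the failure count is treated as approximately Poisson with that rate. The chapter explains what goes wrong when the two models are identified in this way.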
Bertrand Ricque posed the question of how, and indeed whether, the IEC 61508 notion of average probability of failure on demand (PFDavg) is related to the Bernoulli-Process notion of probability of failure on demand (pfd). This chapter shows that they are two different concepts, with no useful relation between them.
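To see why the two differ in kind, set them side by side (the simplified formula below is the usual single-channel (1oo1) approximation of the sort found in IEC 61508-6, used here only for illustration). The Bernoulli-Process pfd is a per-demand quantity; PFDavg is the time-averaged unavailability of the safety function over the proof-test interval T1, driven by the dangerous-undetected failure rate lambda_DU:

\[ \mathit{pfd} = \Pr(\text{the function fails on a given demand}) , \qquad \mathrm{PFD}_{avg} \approx \frac{\lambda_{DU}\, T_1}{2} . \]

One is a parameter of a demand-indexed stochastic process; the other is an average, over time, of the probability of being in a failed state at the moment a demand arrives.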
There are some concepts, key to critical-system assurance, which I think could do with discussion and refinement. Here is some of that discussion.
The concept of system integrity is regarded as central to dependability, not only by engineers but by many others dependent on those systems. But there are at least three very different notions of integrity in IEC definitions alone, as well as a different one from IFIP. To make matters worse, I define here two facets of integrity, different from the others, which I argue better fit what system users expect from a system which retains its integrity. I suspect there are even more facets to be elucidated.
As an example, I consider a buffer-overflow exploit of a system. It turns out that, according to many of the definitions of "integrity" from the last chapter, such an exploited system retains its "integrity". It does not, according to my definitions.
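For readers who want the shape of such an exploit in front of them, here is a minimal hypothetical sketch in C (not taken from any real system) of the kind of defect involved:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical command handler with a classic stack-buffer overflow:
       strcpy() copies 'input' into a 64-byte buffer with no length check,
       so sufficiently long input overwrites adjacent stack memory -
       on many platforms including the saved return address. */
    static void handle_command(const char *input)
    {
        char buf[64];
        strcpy(buf, input);              /* the defect: no bounds check */
        printf("processing: %s\n", buf);
    }

    /* A bounded copy closes the hole, at the cost of truncating long input. */
    static void handle_command_bounded(const char *input)
    {
        char buf[64];
        strncpy(buf, input, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        printf("processing: %s\n", buf);
    }

    int main(void)
    {
        handle_command("STATUS");          /* benign input: both behave alike */
        handle_command_bounded("STATUS");
        return 0;
    }

The point to notice is that a crafted over-long input drives the first version off its specified behaviour entirely, while every data item the program itself set out to store may remain exactly as written.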
I think many would agree that the integrity of a nuclear-missile launch system is critical. There has been worry that new technologies (in particular, the advent of highly-capable deep-learning neural-network-simulation software, now known generally as "AI") highlight vulnerabilities in the US launch system. There have been thirty years of public discussion of such vulnerabilities. I suggest where we might want to look first.
The "CIA triad" of system cybersecurity properties are confidentiality, integrity and availability. It is not often realised how different these concepts are. Availability is arguably a pure system property, integrity may or may not be, depending on your preference, and confidentiality most certainly is not.
People like to say that verification is making sure you have got your system right, and validation is making sure you have got the right system. Useful, but that is not what the IEC definitions say. Time to change them?
Formal (mathematical) methods (FM) are loved by some people (such as myself), who see how, when properly used, they enhance the dependability of systems. But many systems-development companies think they are too "hard" and/or too cumbersome to use, and would bring little benefit to their operations. What can you use FM for? What, specifically, do they achieve? This chapter provides some guidance.
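As one small concrete illustration (my example, not one taken from this chapter): formal tools are particularly good at establishing, or refuting, precise low-level properties such as the absence of arithmetic overflow. The two midpoint calculations below agree on small inputs, but only the second can be proved overflow-free under its stated precondition:

    #include <assert.h>
    #include <limits.h>

    /* The obvious formula: lo + hi can exceed INT_MAX, which is undefined
       behaviour in C. A formal check for overflow-freedom rejects this. */
    static int midpoint_naive(int lo, int hi)
    {
        return (lo + hi) / 2;
    }

    /* Reformulated version. The asserts are run-time stand-ins for the
       precondition and postcondition a formal tool would work with. */
    static int midpoint_safe(int lo, int hi)
    {
        assert(0 <= lo && lo <= hi);     /* precondition */
        int mid = lo + (hi - lo) / 2;    /* cannot overflow given the precondition */
        assert(lo <= mid && mid <= hi);  /* the property to be proved */
        return mid;
    }

    int main(void)
    {
        int a = midpoint_naive(0, 100);              /* fine: 50 */
        int b = midpoint_safe(INT_MAX - 1, INT_MAX); /* fine: INT_MAX - 1 */
        /* midpoint_naive(INT_MAX - 1, INT_MAX) would overflow. */
        return (a == 50 && b == INT_MAX - 1) ? 0 : 1;
    }

Modest properties of this kind can be discharged mechanically across an entire code base, which is one concrete answer to the question of what FM achieve.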
In 2012, Bev Littlewood and John Rushby showed that a two-channel system can be demonstrated to be ultra-dependable if the second channel can justifiably be taken to be "possibly perfect". Many systems have such a "basic fall-back" architecture. Littlewood and Rushby elucidate the conditions for such an architecture to be ultra-dependable. I think this is highly significant work. This chapter gives a short overview for those who do not want to read the full paper.
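The core of the result, stated informally and from memory here (the paper should be consulted for the precise assumptions): for a 1-out-of-2 arrangement in which channel A is assessed probabilistically in the usual way and channel B is simple enough that it may actually be perfect, the probability of system failure on a randomly selected demand is bounded by

\[ \Pr(\text{system fails on a demand}) \;\le\; \mathit{pfd}_A \times P_{np}(B) , \]

where pfd_A is channel A's probability of failure on demand and P_np(B) is the assessor's probability that channel B is not perfect. The attraction is that the notoriously awkward question of statistical dependence between failures of the two channels is replaced by a judgement about B's possible perfection.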
Everybody's most pressing topic nowadays.
One often hears it said that safety and cybersecurity are two radically different topics, to be handled very differently from each other. Perhaps. I think the assessment processes for both are very similar and I have had success with cross-applying them. This high-level view shows why this may be expected.
Many engineers do not realise how sparse the provisions concerning cybersecurity are in the general functional-safety standard IEC 61508 and the IACS functional-safety standard IEC 61511. I list them all (in a couple of pages) and indicate why they are insufficient.
The official IEC notion of "risk" is the combination of the probability of an event with the severity of its outcome. This does not fit the way people talk about cybersecurity risks, largely because "probability" is not a helpful notion. I suggest what people are talking about when they speak of security risk.
It is critical to the cybersecurity of an IACS to maintain an inventory of its subsystems and components, and to know quickly and accurately when a "security patch" should be applied inside your system. But the guidance on this in existing standards is vague. Here is an attempt to be more precise.
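To make the suggestion concrete, here is a sketch (field names and logic are mine, illustrative rather than prescriptive) of the kind of per-component record such an inventory needs, and of the question one wants to be able to ask of it mechanically when a vendor advisory appears:

    #include <string.h>

    /* Illustrative inventory record for one IACS component. The fields are
       hypothetical, chosen to show what is needed to decide quickly whether
       a published security patch applies. */
    struct component_record {
        char vendor[64];            /* manufacturer */
        char model[64];             /* product / model designation */
        char hw_revision[32];       /* hardware revision installed */
        char fw_version[32];        /* firmware / software version running */
        char location[128];         /* where in the plant it sits */
        char network_zone[32];      /* security zone / conduit assignment */
        char last_patch_date[11];   /* ISO date of last applied update */
    };

    /* Crude matching sketch: does an advisory for (vendor, model, affected
       firmware version) hit this component? Real advisories name version
       ranges, so real matching is more involved than exact comparison. */
    static int advisory_applies(const struct component_record *c,
                                const char *vendor, const char *model,
                                const char *affected_fw)
    {
        return strcmp(c->vendor, vendor) == 0
            && strcmp(c->model, model) == 0
            && strcmp(c->fw_version, affected_fw) == 0;
    }

    int main(void)
    {
        struct component_record plc = {
            "ExampleVendor", "PLC-1000", "rev C", "2.4.1",
            "reactor building, cabinet 7", "zone 2", "2017-06-01"
        };
        return advisory_applies(&plc, "ExampleVendor", "PLC-1000", "2.4.1") ? 0 : 1;
    }

The hard organisational questions - who maintains the records, how they track what is actually installed, and how quickly advisories reach them - are what the chapter is about.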
There is guidance available (and some being written) on how to go about assuring the cybersecurity and safety of an IACS. I consider a specific example of a process plant and ask how well that guidance works. Some of it is better and some of it is worse.