Synopsis In their paper Byzantine Fault Tolerance, from Theory to Reality, which appeared in the volume in S. Anderson, M. Felici and B. Littlewood, eds, Computer Safety, Reliability, and Security, 22nd International Conference, SAFECOMP 2003, Edinburgh, UK, September 23-26, 2003, Lecture Notes in Computer Science volume 2788, Springer-Verlag, 2003., Kevin Driscoll, Brendan Hall, Hakan Sivencrona, and Phil Zumsteg recount a series of incidents to the digital control system of a major Fly-By-Wire commercial aircraft type, which almost led to the type being grounded. Driscoll et al. describe the incidents thus:
Byzantine faults are faults in which agents (sensors, computers) in a distributed system "lie" to their interlocutors: they do not fail silently but distribute erroneous data, or data which is read differently by different receivers. The name arose from a whimsical analogy by Lamport, Shostak and Pease to a group of Byzantine generals trying to reach agreement in a situation in which no one trusts anyone else to speak the truth. The classic papers from twenty years ago are [Refs], and I understand arose from SRI International's involvement in trying formally to verify the operating system of the first digital flight control computer, SIFT.
Dealing with Byzantine faults became an extremely active area of distributed computing theory, but practitioners did not take them so seriously at first, perhaps partially due to the very high resource consumption of the solutions: Lamport, Shostak and Pease showed that any correct algorithm to achieve consensus required a large number of processors (roughly speaking, at least 3n+1, where n is the number of "liars") and a lot of processor cycles. It follows that solutions judged to be practical are unlikely to be complete solutions, and therefore one must analyse the actual problem space more closely to find out where one can most profitably handle possible problems, and which areas one can ignore.