Annual Reliability and Maintainability Symposium, 2001 Proceedings, pages 246-251
2001 Ann. Reliability & Maintainability Symp. (RAMS2001), Philadelphia, PA, USA, January 22-25, 2001
Industry-oriented fault tolerance solutions for embedded distributed systems should be based on adaptable, reusable elements. Software-implemented fault tolerance can provide such flexibility through the framework approach presented here, which consists of 1) a library of fault tolerance functions, 2) a backbone coordinating these functions, and 3) a language expressing configuration and recovery. This language acts as a kind of ancillary application layer that separates recovery aspects from functional ones. The framework allows the available hardware redundancy to be combined flexibly with software-implemented fault tolerance. This increases the availability and reliability of the application at a justifiable cost, thanks to the reusability of the library elements across different target systems. It also improves maintainability: because the functional behavior is separated from the recovery strategies executed when an error is detected, modifications to functional and non-functional behavior are, to some extent, independent and hence less complex. Practical experience is reported from integrating this framework approach into an automation system for electricity distribution. This case study illustrates the power of software-based fault tolerance solutions, and of the configuration-and-recovery language ARIEL, in providing flexibility and adaptability to changes in the environment.
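The separation between functional behavior and recovery strategies can be illustrated with a minimal Python sketch. It is not ARIEL and not the paper's backbone: every name here (`Backbone`, `when`, `report_error`, `restart`, `switch_to`, `configure`) is hypothetical. The sketch assumes that error detection tools report a faulty task to a coordinating backbone, which then evaluates recovery rules that were configured separately from the task code, with the first matching rule applied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: not ARIEL syntax and not the paper's API.
Condition = Callable[["Backbone"], bool]
Action = Callable[["Backbone"], None]


@dataclass
class Backbone:
    """Coordinates tasks and runs recovery rules when an error is reported."""
    state: Dict[str, str] = field(default_factory=dict)
    restarts: Dict[str, int] = field(default_factory=dict)
    rules: List[Tuple[Condition, Action]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def register(self, task: str, initial: str = "running") -> None:
        self.state[task] = initial
        self.restarts[task] = 0

    def when(self, condition: Condition, action: Action) -> None:
        # Recovery rules live here, outside the functional task code.
        self.rules.append((condition, action))

    def report_error(self, task: str) -> None:
        # Called by (hypothetical) error detection tools.
        self.state[task] = "faulty"
        for condition, action in self.rules:
            if condition(self):
                action(self)
                break  # first matching rule wins


def is_faulty(task: str) -> Condition:
    return lambda bb: bb.state[task] == "faulty"


def faulty_with_retries_below(task: str, limit: int) -> Condition:
    return lambda bb: bb.state[task] == "faulty" and bb.restarts[task] < limit


def restart(task: str) -> Action:
    def act(bb: Backbone) -> None:
        bb.restarts[task] += 1
        bb.state[task] = "running"
        bb.log.append(f"restart {task}")
    return act


def switch_to(task: str, spare: str) -> Action:
    def act(bb: Backbone) -> None:
        bb.state[task] = "isolated"
        bb.state[spare] = "running"  # activate the standby replica
        bb.log.append(f"isolate {task}, activate {spare}")
    return act


def configure(bb: Backbone) -> None:
    # Recovery strategy: retry twice, then fall back to a spare.
    bb.register("controller")
    bb.register("spare", initial="standby")
    bb.when(faulty_with_retries_below("controller", 2), restart("controller"))
    bb.when(is_faulty("controller"), switch_to("controller", "spare"))


if __name__ == "__main__":
    bb = Backbone()
    configure(bb)
    for _ in range(3):
        bb.report_error("controller")
    print(bb.log)
```

Reporting three errors on `controller` yields two restarts followed by isolation and activation of the spare. Changing the retry limit or the fallback only touches `configure`, not the functional code, which mirrors (in a very simplified form) the maintainability argument made above.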