Title: Fault-tolerant routing, reconfiguration and backward error recovery for parallel systems
Authors: Bieker, B ×
Deconinck, Geert
Maehle, E
Vounckx, J #
Issue Date: Jul-1997
Publisher: CRL Publishing Ltd
Series Title: Computer systems science and engineering vol:12 issue:4 pages:245-253
Abstract: Despite improvements in hardware design, parallel systems still lack dependability because of the large number of components they consist of. One way to introduce fault tolerance into such systems is backward error recovery, in which failed modules are replaced by spares. This work describes an approach to building a fault-tolerant parallel system. To this end, system reconfiguration and recovery based on checkpointing and rollback are presented, together with a fault-tolerant routing algorithm. Acceptance of fault tolerance is improved by integrating user-transparent routing, reconfiguration, checkpointing and rollback protocols. Furthermore, the restriction to a fail-silent failure model (used in many approaches) is relaxed in our work towards fail-time-bounded behavior.
ISSN: 0267-6192
Publication status: published
KU Leuven publication type: IT
Appears in Collections: ESAT - ELECTA, Electrical Energy Computer Architectures
× corresponding author
# (joint) last author


