2014 IEEE International Conference on Cloud and Autonomic Computing pages:141-150
International Conference on Autonomic Computing edition:1 location:London date:8-12 September 2014
One aspect that permeates all large-scale systems is the occurrence of failures. In any data center, failures occur continually, caused by malfunctioning disks, memory modules, network connections, or software bugs. Large-scale failures, possibly caused by a ripple effect of smaller failures, are obviously even worse. The fact that failures are extremely hard or even impossible to predict makes them particularly challenging to cope with. A better alternative to predicting failures is to create systems that can cope with failures and adapt autonomously.
In this paper, we investigate a decentralized, self-adaptive approach to building a resilient system for service composition. Our approach is based on an agent coordination mechanism known as ‘delegateMAS’, which is particularly suited to large-scale coordination of systems. We thoroughly evaluate this approach through large- and huge-scale experiments on composite services. The results of these experiments show that it is possible to create service compositions that are resilient to large-scale failures.