Steinert, Rebecca and Gillblad, Daniel (2010) Towards Distributed and Adaptive Detection and Localisation of Network Faults. In: 2010 Sixth Advanced International Conference on Telecommunications, 9-15 May 2010, Barcelona, Spain.
PDF - Accepted Version
Official URL: http://doi.ieeecomputersociety.org/10.1109/AICT.20...
We present a statistical probing approach to distributed fault detection in networked systems, based on autonomous configuration of algorithm parameters. Statistical modelling is used for detection and localisation of network faults, and a detected fault is isolated to a node or link by collaborative fault localisation. From local measurements obtained through probing between nodes, probe response delay and packet drop are modelled via parameter estimation for each link. The estimated model parameters are then used to configure the algorithm parameters autonomously, namely those related to probe intervals and detection mechanisms. Expected fault-detection performance is specified as a cost rather than as specific parameter values, which significantly reduces the configuration effort in a distributed system. Compared with methods that do not take observed network conditions into account, the algorithm offers fault detection with increased certainty based on local measurements. We investigate the algorithm's performance for varying user parameters and failure conditions. The simulation results indicate that more than 95% of the generated faults can be detected with few false alarms. At least 80% of the link faults and 65% of the node faults are correctly localised. Performance can be improved by parameter adjustments and by using alternative paths for communicating algorithm control messages.
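The abstract's idea of deriving detection parameters from per-link estimates, given a tolerated cost rather than hand-set values, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state which delay model or detection rule the paper uses. The sketch swaps in a distribution-free Cantelli bound for the probe timeout and assumes independent probe losses. The function names and the parameters `late_prob` and `false_alarm` are hypothetical.

```python
import math
from statistics import mean, pstdev


def estimate_link_model(delays, sent):
    """Estimate per-link parameters from local probe measurements.

    delays: response delays (seconds) of the probes that were answered.
    sent:   total number of probes sent on the link.
    """
    received = len(delays)
    # Laplace-smoothed drop rate, so the estimate is never exactly 0 or 1.
    drop_rate = (sent - received + 1) / (sent + 2)
    return mean(delays), pstdev(delays), drop_rate


def configure_detection(mu, sigma, drop_rate, late_prob=0.01, false_alarm=1e-4):
    """Derive a probe timeout and a consecutive-miss threshold.

    late_prob and false_alarm are hypothetical user-level targets standing in
    for the "cost" mentioned in the abstract.
    """
    # Cantelli's inequality: P(X >= mu + k*sigma) <= 1 / (1 + k^2).
    # Choosing k this way bounds the chance of a late reply by late_prob
    # without assuming any particular delay distribution.
    k = math.sqrt(1.0 / late_prob - 1.0)
    timeout = mu + k * sigma
    # Upper bound on a healthy link missing one probe: dropped, or too late.
    miss_prob = drop_rate + (1.0 - drop_rate) * late_prob
    # Assumption: misses are independent, so n consecutive misses on a
    # healthy link occur with probability at most miss_prob ** n.
    misses_needed = math.ceil(math.log(false_alarm) / math.log(miss_prob))
    return timeout, misses_needed


if __name__ == "__main__":
    mu, sigma, drop = estimate_link_model([0.010, 0.012, 0.011, 0.013, 0.009], sent=5)
    timeout, misses_needed = configure_detection(mu, sigma, drop)
    print(f"timeout={timeout:.4f}s, consecutive misses before alarm={misses_needed}")
```

In this sketch a lossier or more variable link automatically gets a longer timeout and a higher miss threshold, which is the sense in which configuration adapts to observed network conditions.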
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: adaptive probing, distributed fault-detection, fault-localisation
Deposited By: Rebecca Steinert
Deposited On: 04 Feb 2011 14:43
Last Modified: 16 May 2012 12:42