A Method Supporting Monitoring And Repair Processes of Information Systems


The goal of the project is to propose a method for the automated identification and repair of selected classes of problems that occur in industrial IT systems.


PhD Thesis:

The research project has been summarized as part of a PhD thesis submitted to the Gdansk University of Technology in 2011.

Marek Kamiński:
"A Method Supporting Monitoring And Repair Processes of Information Systems"
(The Polish title: "Metoda wspomagania procesów monitorowania i naprawy systemów informatycznych")

Many IT systems need supervision and maintenance 24 hours a day, 365 days a year, and many IT companies offer remote monitoring, technical support and assistance in running such systems as a service.

Monitoring aims to give administrators of monitored systems a clear indication of what is wrong. The next step is to repair the system and solve the problem, so integrating aspects of monitoring and repair seems natural.

The monitoring problem has already been automated to a large extent, and existing monitoring systems usually accomplish their task satisfactorily. The repair problem, however, is harder, because repair often involves manual and time-consuming interventions by administrators.

The research takes a pragmatic view and is underpinned by careful observations of industrial reality. These observations have led to the conclusion that, in many cases, the interventions conducted manually by administrators of the monitored systems are repeatable.

Because these interventions are repeatable, they can be automated, and because they are triggered by unacceptable monitoring results, they can be integrated with the monitoring task. The automation of repairs may therefore incorporate existing and exchangeable monitoring components.


The approach taken in this project covers the following areas:

  • development of a general, and monitoring-independent, method allowing for identification of monitored objects and their states,

  • identification of representative classes of problems appearing in industrial IT systems,

  • development of a language framework that makes repair algorithms easy to describe,

  • using this framework to express repair procedures solving chosen problems,

  • development of a method supporting automated execution of the repair procedures,

  • development of a method integrating monitoring and repair processes,

  • design and prototype implementation of a monitoring-repair system utilizing the proposed ideas,

  • assessment of effectiveness and efficiency of the proposed solutions.


The conceptual part of this research resulted in the Repair Management Method (RMMethod), a formal method for automating the execution of repeatable repairs of IT systems that incorporates existing enterprise monitoring solutions into this process.

The method formulates all steps of the process leading to this automation, starting from a point where only monitoring of IT systems is implemented. It comprises the following three components, each with mathematically grounded foundations:

  • the Repair Management Model (RMM), expressed mainly in the Z notation: the part of the method consisting of two submodels: a submodel of monitoring processes, which gives an abstract representation of monitoring general enough to cover existing solutions to the monitoring problem, and a submodel of repair processes, which introduces an abstract definition of repair automation,

  • the Repair Management Framework (RMF), expressed in the Z notation: an extensible language framework consisting of a set of routines and ideas that make up the so-called repair library, which can be embedded and implemented in a high-level programming language to give programmers an abstraction layer that makes writing repair procedures easier,

  • the Repair Management System (RMS): a flexible architecture for an IT system that supports and integrates the ideas mentioned above, so that they work as a whole and can be used in practice.
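
The interplay of the three components can be illustrated with a minimal sketch. It is not the thesis's Z specification and not its Perl repair API; it is written in Python purely for illustration, and every name in it (Observation, Monitor, RepairManager, the state labels "OK", "DISK_FULL" and "LICENSE_EXPIRED", the host names) is a hypothetical assumption. It shows an exchangeable monitoring component reporting object states, a small repair library mapping unacceptable states to repair procedures, and a manager that triggers the matching procedure or escalates to an administrator when none is registered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class Observation:
    """State of one monitored object, as reported by a monitoring component."""
    object_id: str
    state: str  # hypothetical labels, e.g. "OK" or "DISK_FULL"


class Monitor(Protocol):
    """Any monitoring component that can report object states (exchangeable)."""
    def poll(self) -> List[Observation]: ...


# A repair procedure receives the observation and returns True on success.
RepairProcedure = Callable[[Observation], bool]


class RepairManager:
    """Maps unacceptable states to repair procedures and runs them."""

    def __init__(self) -> None:
        self._procedures: Dict[str, RepairProcedure] = {}

    def register(self, state: str, procedure: RepairProcedure) -> None:
        """Add a repair procedure for one unacceptable state."""
        self._procedures[state] = procedure

    def run_once(self, monitor: Monitor) -> List[str]:
        """Poll the monitor once and attempt a repair for each bad state."""
        log: List[str] = []
        for obs in monitor.poll():
            if obs.state == "OK":
                continue  # acceptable state, nothing to repair
            procedure = self._procedures.get(obs.state)
            if procedure is None:
                outcome = "escalated to administrator"
            elif procedure(obs):
                outcome = "repaired"
            else:
                outcome = "repair failed"
            log.append(f"{obs.object_id}: {obs.state} -> {outcome}")
        return log


class StaticMonitor:
    """Stand-in monitor that returns a fixed list of observations."""

    def __init__(self, observations: List[Observation]) -> None:
        self._observations = list(observations)

    def poll(self) -> List[Observation]:
        return list(self._observations)


def clean_temp_files(obs: Observation) -> bool:
    """Placeholder repair; a real procedure would act on the affected host."""
    return True


manager = RepairManager()
manager.register("DISK_FULL", clean_temp_files)
monitor = StaticMonitor([
    Observation("db01", "OK"),
    Observation("web01", "DISK_FULL"),
    Observation("app01", "LICENSE_EXPIRED"),
])
for line in manager.run_once(monitor):
    print(line)
```

Because the manager depends only on the poll() interface, the monitoring component can be swapped without touching the repair procedures, which mirrors the idea of incorporating existing and exchangeable monitoring components.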

The implementation part of this research resulted in a prototype of the RMS and in the repair API, a Perl implementation of the RMF.


