A Method Supporting Monitoring And Repair Processes of Information Systems

Objectives:

To propose a method for automatically identifying and repairing selected classes of problems that occur in industrial IT systems.

Contact:

The contact for this research project is here.

PhD Thesis:

The research project was summarized in a PhD thesis submitted to the Gdańsk University of Technology in 2011.

Marek Kamiński:
"A Method Supporting Monitoring And Repair Processes of Information Systems"
(The Polish title: "Metoda wspomagania procesów monitorowania i naprawy systemów informatycznych")
Download PDF (2.7 MB)

Rationale:

Many IT systems need supervision and maintenance 24 hours a day, 365 days a year, and many IT companies offer remote monitoring, technical support and assistance in running such systems.

Monitoring aims to give administrators of monitored systems a clear indication of what is wrong. The next step is to repair the system and solve the problem, so integrating aspects of monitoring and repair seems natural.

Monitoring has already been automated to a large extent, and existing monitoring systems usually accomplish their task satisfactorily. Repair, however, is harder, because it often involves manual and time-consuming interventions by administrators.

The research takes a pragmatic view and is underpinned by careful observations of industrial reality. Those observations have led to the conclusion that, in many cases, the interventions carried out manually by administrators of the monitored systems are repeatable.

Because these interventions are repeatable, they can be automated, and because they are triggered by unacceptable monitoring results, they can be integrated with the monitoring task. The automation of repairs can therefore incorporate existing, exchangeable monitoring components.

Approach:

The project covered the following areas:

  • development of a general, monitoring-independent method for identifying monitored objects and their states,

  • identification of representative classes of problems appearing in industrial IT systems,

  • development of a language framework that makes repair algorithms easy to describe,

  • using this framework to express repair procedures solving chosen problems,

  • development of a method supporting automated execution of the repair procedures,

  • development of a method integrating monitoring and repair processes,

  • design and prototype implementation of a monitoring-repair system utilizing the proposed ideas,

  • assessment of effectiveness and efficiency of the proposed solutions.

Results:

The conceptual part of this research resulted in the Repair Management Method (RMMethod), a formal method for automating the execution of repeatable repairs of IT systems that incorporates existing enterprise monitoring solutions into the process.

The method formulates all steps of the process leading to this automation, starting from the point where only monitoring of IT systems is implemented. It comprises the following three components, each with mathematically grounded foundations:

  • the Repair Management Model (RMM), expressed mainly in the Z notation: a part of the method consisting of two submodels, namely a submodel of monitoring processes, which gives an abstract representation of monitoring general enough to cover existing monitoring solutions, and a submodel of repair processes, which introduces an abstract definition of repair automation,

  • the Repair Management Framework (RMF), expressed in the Z notation: an extensible language framework consisting of a set of routines and ideas that make up the so-called repair library, which can be embedded and implemented in a high-level programming language to give programmers an abstraction layer that makes repair procedures easier to write,

  • the Repair Management System (RMS): a flexible architecture for an IT system that supports and integrates the ideas above, so that they work as a whole and can be used in practice.
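The interplay of the three components can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Perl repair API or the Z specification from the thesis. Every name in it (Observation, RepairManager, register_repair, the "disk_full" state and so on) is a hypothetical assumption made for illustration. The sketch only shows the general idea: exchangeable monitors report object states, and unacceptable states trigger registered repair procedures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# All names below are hypothetical; none come from the thesis or its Perl repair API.


@dataclass(frozen=True)
class Observation:
    """One monitoring result: which object was checked and the state it is in."""
    object_id: str
    state: str  # for example "ok" or "disk_full"


# A monitor is any callable returning observations, so monitoring
# components stay exchangeable.
Monitor = Callable[[], Iterable[Observation]]
# A repair procedure receives the affected object's id and returns
# True if it believes the repair succeeded.
Repair = Callable[[str], bool]


class RepairManager:
    """Runs monitors and dispatches a repair for each unacceptable state."""

    def __init__(self, acceptable_states: Iterable[str] = ("ok",)) -> None:
        self.acceptable = set(acceptable_states)
        self.monitors: List[Monitor] = []
        self.repairs: Dict[str, Repair] = {}

    def add_monitor(self, monitor: Monitor) -> None:
        self.monitors.append(monitor)

    def register_repair(self, state: str, repair: Repair) -> None:
        self.repairs[state] = repair

    def run_once(self) -> List[Tuple[str, str, str]]:
        """Poll every monitor once; log (object_id, state, outcome) per problem."""
        log = []
        for monitor in self.monitors:
            for obs in monitor():
                if obs.state in self.acceptable:
                    continue
                repair = self.repairs.get(obs.state)
                if repair is None:
                    outcome = "escalated"  # no automated procedure: left to an administrator
                elif repair(obs.object_id):
                    outcome = "repaired"
                else:
                    outcome = "failed"
                log.append((obs.object_id, obs.state, outcome))
        return log


# Toy monitored system: disk usage in percent per mount point (made-up data).
disk_usage = {"/var": 97, "/home": 40}


def disk_monitor() -> Iterable[Observation]:
    # Iterate over a snapshot so repairs may update disk_usage safely.
    for mount, pct in list(disk_usage.items()):
        yield Observation(mount, "disk_full" if pct > 90 else "ok")


def clean_tmp(mount: str) -> bool:
    disk_usage[mount] = 50  # stand-in for deleting temporary files
    return True


manager = RepairManager()
manager.add_monitor(disk_monitor)
manager.register_repair("disk_full", clean_tmp)

first = manager.run_once()
second = manager.run_once()
print(first)   # [('/var', 'disk_full', 'repaired')]
print(second)  # []
```

The second pass returns an empty log because the monitor now reports acceptable states. A state with no registered procedure is marked "escalated", which reflects the assumption that only repeatable interventions are automated and everything else remains with the administrators.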

The implementation part of this research resulted in a prototype of the RMS and in the repair API, a Perl implementation of the RMF.

Publications:

  1. Kamiński M.: System monitorowania prac organów przedstawicielskich - doświadczenie z zakresu e-demokracji, Proceedings of the 3rd National Conference on Information Technology, Gdańsk University of Technology, Poland (May 2005)
    Download draft version (164 kB)

  2. Kamiński M.: Monitorowanie stanu systemów informatycznych na przykładzie aplikacji opracowanej w firmie Lufthansa Systems, Proceedings of the 4th National Conference on Information Technology, Gdańsk University of Technology, Poland (May 2006)
    Download draft version (136 kB)

  3. Kamiński M.: Data replication and its monitoring, Proceedings of the 5th National Conference on Information Technology, Gdańsk University of Technology, Poland (May 2007)
    Download draft version (124 kB)

  4. Kamiński M.: XML-based monitoring and its implementation in Perl, Proceedings of the 2nd National TPD Conference, Politechnika Poznańska Press, Poland (2007)
    Download draft version (118 kB)

  5. Kamiński M.: HVRmonitor - data replication monitoring method, Proceedings of the 2nd AIS SIGSAND European Symposium on Systems Analysis and Design, University of Gdańsk Press, Poland (2007)
    Download draft version (189 kB)

  6. Kamiński M.: System Zarządzania Naprawami, Systemy czasu rzeczywistego. Postępy badań i zastosowania, WKŁ, Poland (2009) (The 16th National Conference on Real-Time Systems, Warsaw University of Technology)
    Download draft version (336 kB)

  7. Kamiński M.: Towards automating repairs of IT systems, Information Systems Architecture and Technology: advances in web-age information systems, Politechnika Wrocławska Press, Poland (2009) (The 30th International Conference on Information Systems, Architecture, and Technology, Wrocław University of Technology)
    Download draft version (212 kB)

  8. Kamiński M.: A System Automating Repairs of IT Systems, Elektronika Magazine (11/2009), Sigma-NOT Press, Poland (2009) (The 16th International Multi-Conference on Advanced Computer Systems, West Pomeranian University of Technology)
    Download draft version (223 kB)