Abstract:
One of the main tasks of IT business continuity planning (BCP) is to guarantee that incidents affecting the IT infrastructure do not affect the availability of IT-dependent business processes beyond a given acceptable extent. Carrying out BCP for information systems is particularly challenging, because it has to take into account the numerous interdependencies between the IT assets typically present in an organization. In this paper we present a model and a tool that support BCP auditing by allowing IT personnel to estimate and validate the Recovery Time Objectives (to be) set on the various processes of the organization. Our tool can be integrated into COBIT-based risk assessment applications. Finally, we argue that our tool can be particularly useful for the dynamic auditing of the BCP.
Introduction:
Business Continuity (BC) is the discipline supporting an organization in coping with the disruptive events that may affect its IT infrastructure. The goal of BC is to guarantee that, after an incident, the infrastructure will recover operations within a predefined time. This is achieved by carrying out a Business Continuity Plan (BCP), which is part of the Risk Mitigation phase of the Information Risk Management process. In general, Risk Mitigation (RM) consists of developing and implementing a strategy to manage potentially harmful threats to the information systems. Since risk cannot be completely avoided because of financial and practical limitations, RM (and BCP as well) includes the evaluation and the conscious acceptance of a residual risk.
BC is quickly becoming a best practice among enterprises and other organizations, owing in part to recent legislation that explicitly requires it, such as the Sarbanes-Oxley Act (SOX) of 2002 or the Basel II accord [2]. Until recently, no widely agreed methodology was available to carry out a BCP. The new standard BS25999 [7], published in 2006 by the British Standards Institution, has changed this situation by providing guidelines to understand, develop and implement a BCP, and it aims to become a standard methodology. Notably, BS25999 requires an organization to (1) identify the activities/processes supporting the core services used by the organization, (2) identify the relationships/dependencies between activities/processes, and (3) evaluate the impact of the disruption of the core services/processes previously identified (during the Business Impact Analysis, BIA).
One of the main goals of any BCP is to ensure that crucial business processes recover from disruption within a predefined Maximum Tolerable Period of Disruption (MTPD). The MTPD expresses the maximum acceptable downtime that still guarantees business continuity. As expected, the MTPD depends heavily on the organization's business goals; it is therefore defined on the business processes and determined by the business unit. Since business processes typically depend on a variety of underlying IT assets, the MTPD has a direct and indirect impact on the maximum downtime that these assets may exhibit in practice. Indeed, the standard technical means to realize a given MTPD is to define Recovery Time Objectives (RTOs) on all IT assets supporting business activities for which the BIA has determined that it is necessary to ensure continuity; RTOs strongly depend on the technical and organizational measures the IT department implements to deal with incidents.
Nowadays, the RTOs that apply to the IT assets are determined manually, a subjective task that depends heavily on the experience of the IT personnel. This is not only error-prone, but also scales poorly (to the point that RTOs are often not determined for all entities, despite this being required by the standard methodology). Moreover, it is inconvenient when the IT infrastructure or the business goals change. In particular, new contracts and agreements can have an impact on the quality of service a business process should deliver and ultimately on the MTPD associated with it. Likewise, changes in the IT infrastructure may affect dependencies and therefore the impact of the IT assets on the business MTPDs. In both cases, adapting the BCP to these changes usually requires a costly new analysis involving both the IT and business units of the organization.
We present a new model-based tool to support the analysis of temporal dependencies among IT assets and between IT assets and business processes. The primary goals of our model and tool are (1) to support the IT department in setting and validating the RTOs of the IT assets of the organization, and (2) to evaluate assigned RTOs w.r.t. the given MTPD in order to find critical points in the IT infrastructure. Ultimately, our model allows one to state explicitly the fine-grained set of premises and assumptions needed to infer that a given MTPD will be achieved, thereby obtaining a more objective assessment of the behaviour of the IT infrastructure.
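To make goal (2) concrete, the following minimal Python sketch checks assigned RTOs against process MTPDs over a small dependency graph. It is an illustration only and not the model presented in this paper: all asset names, process names and figures are hypothetical, and it assumes that dependencies are acyclic and that an asset can begin its own recovery only after every asset it depends on has recovered, so that worst-case recovery times add up along a dependency chain.

```python
from functools import lru_cache

# Hypothetical example data (all times in hours); none of it comes from the paper.
RTO = {"storage": 2.0, "database": 1.5, "app_server": 1.0, "mail": 3.0}

# asset -> assets it depends on (assumed to form an acyclic graph)
DEPENDS_ON = {
    "storage": [],
    "database": ["storage"],
    "app_server": ["database"],
    "mail": ["storage"],
}

# business process -> its MTPD and the IT assets that directly support it
PROCESSES = {
    "order_handling": {"mtpd": 4.0, "assets": ["app_server"]},
    "customer_support": {"mtpd": 6.0, "assets": ["app_server", "mail"]},
}


@lru_cache(maxsize=None)
def worst_case_recovery(asset):
    """Own RTO plus the slowest dependency chain (serial-recovery assumption)."""
    upstream = max((worst_case_recovery(d) for d in DEPENDS_ON[asset]), default=0.0)
    return RTO[asset] + upstream


def audit(processes):
    """Map each process to (worst-case downtime, whether it stays within the MTPD)."""
    report = {}
    for name, spec in processes.items():
        downtime = max(worst_case_recovery(a) for a in spec["assets"])
        report[name] = (downtime, downtime <= spec["mtpd"])
    return report


if __name__ == "__main__":
    for process, (downtime, ok) in audit(PROCESSES).items():
        print(f"{process}: {downtime} h, within MTPD: {ok}")
```

Under these assumptions, a process that fails the check points to a chain of assets whose RTOs are candidates for tightening. A different recovery assumption (for instance, assets recovering in parallel) would change the aggregation rule in `worst_case_recovery`.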
Beyond these goals, we argue that our model is particularly useful for dynamically auditing the BCP in two ways. First, the tool allows one to visualize immediately how changes in business goals or in the IT infrastructure affect compliance with given (or modified) MTPDs; in particular, it is possible to compute whether the measures already in place continue to provide sufficient guarantees after the changes. Second, it allows one to validate the actual response of the IT infrastructure against the expected behaviour, promoting a continuous refinement of the model, which can adapt to new external circumstances and allow for early detection of new threats to the business continuity targets.
Technically, our model is an improvement on the one we presented in [18] for the optimization of countermeasures. The essential difference from the previous model lies in the modelling of the recovery time after disruption, which in the present setting has to be much more accurate. Notably, as we mention in Section 5, the data our model requires is collected in any case when carrying out a BCP.