
Redundant Facility Facts

The best way to handle an unavoidable disaster is to anticipate potential problems and put measures in place to speed recovery. One way to decrease recovery time is through redundancy, which means providing duplicate or multiple components so that a spare can take over when one fails. Although redundancy is typically considered only for systems, redundant facilities can enable you to recover in the event of a catastrophic loss.

For obvious reasons, providing redundant sites can quickly become an expensive proposition. The following table lists the types of redundancy solutions with associated expenses:

Type Description
Mirror The mirror site has instant fail-over, provides for parallel processing, and is immediately available in the event of a disaster. This facility is necessary when an organization cannot tolerate any downtime. A mirror site is:
  • Fully configured with infrastructure (power, A/C, etc.), network systems, telephone connectivity, and internet connectivity in place.
  • Fully configured with functional servers and clients that are up-to-date mirroring the production system.
  • Expensive to maintain. It requires constant maintenance of the hardware, software, data, and applications, and it presents a security risk because a second location holds live systems and data that must be protected.
Hot The hot site is a redundant facility that can be activated within a few hours. This facility is necessary when an organization can tolerate only a short period of downtime. A hot site is:
  • Fully configured with infrastructure (power, A/C and so on) ready to be powered up.
  • Fully configured with functional servers and clients that need only backups to be up and running.
  • Expensive to maintain. It requires constant maintenance of the hardware, software, data, and applications, and it presents a security risk because a second location holds systems and data that must be protected.

A rolling hot site is a mobile facility, typically the back of an 18-wheel truck. It has all of the capabilities of a hot site and is very versatile, but expensive.

Warm The warm site is a redundant facility that takes a few days to a few weeks to activate. This facility may be adequate when an organization's maximum tolerable downtime (MTD) is on the order of a few days or longer. A warm site is:
  • Fully configured with infrastructure (power, A/C and so on) ready to be powered up.
  • Equipped with servers and clients but the applications may not be installed or configured.
  • Equipped with communications links and other data elements that commonly take a long time to order and install.
  • Considerably cheaper than a hot site. It consumes less administrative and maintenance resources than mirror or hot sites.
Cold The cold site takes a few weeks to a few months to activate. A cold site is:
  • The least ready of the site types (mirror, hot, warm, and cold), but it is probably the most common.
  • Ready for equipment to be brought in (there is no hardware on site).
  • Equipped with hookups for electrical power, HVAC, telephone, and Internet.
  • The least expensive of the redundant sites. It offers little readiness, but it may be better than nothing.

A cold site can be a prefabricated building such as those used by school districts. This type of building is transportable and relatively inexpensive.

Mutual aid (or reciprocal) agreement A mutual aid agreement is an arrangement with another company that may have similar computing needs. In a mutual aid agreement:
  • Both parties agree to support each other in the case of a disruptive event.
  • Both parties operate under the assumption that each organization will have the capacity to support the other's operations in a time of need. Unfortunately, this is a big assumption, and it is usually wrong.
Service bureau A service bureau is a contracted site that provides alternate backup processing services.
  • It provides quick response and availability.
  • Testing may be possible.
  • The major disadvantages are the expense and resource contention during a large emergency.
  • It is also common for the service provider to oversell its processing capabilities.
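The activation times in the table can be turned into a rough selection rule: pick the least expensive site type whose worst-case activation time still fits within the organization's MTD. The following is a minimal sketch of that idea, not a planning tool. The hour figures are illustrative assumptions chosen to fall inside the ranges in the table ("a few hours", "a few weeks", "a few months"), the mirror-above-hot cost ordering is also an assumption, and the names SiteType, SITE_TYPES, and cheapest_site_for are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteType:
    name: str
    # Assumed worst-case activation time in hours (illustrative only).
    worst_case_activation_hours: float


# Ordered from least to most expensive. The table states that cold is the
# least expensive and that warm is cheaper than hot; placing mirror above
# hot is an assumption.
SITE_TYPES = [
    SiteType("cold", 90 * 24),   # "a few weeks to a few months"
    SiteType("warm", 21 * 24),   # "a few days to a few weeks"
    SiteType("hot", 8),          # "a few hours"
    SiteType("mirror", 0),       # instant fail-over
]


def cheapest_site_for(mtd_hours: float) -> str:
    """Return the least expensive site type that activates within the MTD."""
    if mtd_hours < 0:
        raise ValueError("MTD must be non-negative")
    for site in SITE_TYPES:
        if site.worst_case_activation_hours <= mtd_hours:
            return site.name
    # Unreachable: the mirror site activates in 0 hours and always fits.
    raise AssertionError("no site type matched")
```

For example, under these assumed figures an MTD of 12 hours rules out warm and cold sites and selects a hot site, while an MTD of zero leaves only the mirror site.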

Important facts about redundant facilities are:

  • Locate redundant sites at least 25 miles from the primary site. This will help prevent both facilities from being destroyed by the same disaster.
  • Acquire free space before the disaster. During a disaster, free space may be difficult to find and may be available only at a premium price.
  • Keep systems and information at the redundant site up-to-date. Change control processes should verify that changes keep the sites compatible.
  • Ensure that contracts for redundant sites specify your requirements for the site and details for your taking possession of it.
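The 25-mile guideline above can be checked from site coordinates. The following sketch uses the haversine great-circle formula, which treats the Earth as a sphere and is accurate enough for this purpose. The function names and the sample coordinates in the usage note are hypothetical; only the 25-mile threshold comes from the text.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def far_enough(primary, backup, minimum_miles=25.0):
    """True if the backup site is at least minimum_miles from the primary site."""
    return great_circle_miles(*primary, *backup) >= minimum_miles
```

Calling far_enough((40.0, -75.0), (41.0, -75.0)) compares two made-up sites one degree of latitude apart, which is roughly 69 miles. Distance is only one factor: two sites farther apart than 25 miles can still share a flood plain, a fault line, or a power grid.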

The following are recovery terms you should be familiar with.

  • Service Level Agreement (SLA) refers to a contract with another internal group or an outside contractor that guarantees a specific service and a turnaround time for that service. A common SLA covers the repair or replacement of a server or some other critical system component within a short timeframe. A shorter turnaround time generally costs more. An SLA should define, in sufficient detail, any penalties incurred if the level of service is not maintained. In the information security realm, it is also vital that the provider's role in disaster recovery operations and continuity planning is clearly defined. Industry-standard templates are frequently used as a starting point for SLA design, but they must be tailored to the specific project or relationship to be effective.
  • The Mean Time Between Failures (MTBF) identifies the average time between failures and indicates how often the system is expected to fail. The more elements a system has, the shorter its MTBF typically is.
  • The Mean Time to Repair (MTTR) indicates how long it would typically take to get the system back online.
  • Maximum Tolerable Downtime (MTD) identifies the length of time an organization can survive with a specified service, asset, or process down.
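The relationships among these terms can be sketched with a little arithmetic. The example below rests on assumptions that are not in the text: components are arranged in series (any single failure takes the system down) and fail independently at constant rates, so their failure rates add. It also uses the standard steady-state availability ratio MTBF / (MTBF + MTTR). The function names are hypothetical.

```python
def series_mtbf(component_mtbfs_hours):
    """MTBF of a series system, assuming independent constant failure rates.

    Failure rates (1 / MTBF) add, so every extra component shortens the
    system MTBF.
    """
    return 1.0 / sum(1.0 / mtbf for mtbf in component_mtbfs_hours)


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def repair_fits_mtd(mttr_hours, mtd_hours):
    """True if a typical repair completes within the maximum tolerable downtime."""
    return mttr_hours <= mtd_hours
```

Under these assumptions, two components that each have an MTBF of 10,000 hours give a system MTBF of about 5,000 hours, which illustrates why adding elements shortens the MTBF. With an MTTR of 5 hours, that system would be available roughly 99.9% of the time.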