NDR (Near Disaster Recovery) & RPO-RTO

Once you understand DC, DR and WAN links, the next concept, NDR (Near Disaster Recovery, or Near DR), shouldn't be much of a problem. A Near DR is a site just like a DR. (By the way, a DR is also called a Far DR because it is located in a different city from the DC.) The fundamental difference between a DR and an NDR is the distance from the DC. The DR is located in a different seismic zone from the DC, often more than 100 km away, whereas the NDR has to be hosted within about 30 km. So why such a short distance? The answer is latency: the latency between the DC and the NDR should be less than 1 ms (millisecond). Latency here is the time it takes for a network packet to reach its destination and for the response to come back, i.e. the round-trip time.
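To see why distance drives latency, here is a back-of-the-envelope sketch. It assumes that light travels through optical fiber at roughly 200,000 km/s, which works out to about 5 microseconds per km. It counts only propagation delay, so switches, routers and cable routes that are longer than the straight-line distance will all push real round-trip times higher. The function name `round_trip_ms` is just an illustration. Under these assumptions, a 30 km NDR costs about 0.3 ms round trip, which leaves some headroom under the 1 ms target for equipment delays.

```python
# Rough propagation-delay estimate for a fiber link.
# Assumption: light in optical fiber travels at ~200,000 km/s (about two-thirds
# of its speed in a vacuum), i.e. ~200 km per millisecond.
# Only propagation is counted; equipment and routing delays are ignored,
# so these figures are lower bounds.

FIBER_SPEED_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip time (ms) over a fiber path of the given one-way length."""
    one_way_ms = distance_km / FIBER_SPEED_KM_PER_MS
    return 2 * one_way_ms


for label, km in [("NDR", 30), ("DR", 100), ("DR", 500)]:
    print(f"{label} at {km} km: ~{round_trip_ms(km):.2f} ms round trip (propagation only)")
```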

Ok, so an NDR is hosted within 30 km and has very low latency from the DC. But what is the point of having an NDR at such low latency, and how does it differ from a DR?

An NDR is used as a bunker site, which means it does not host any applications. Unlike the DR, there are no application servers running in the NDR. Instead, an NDR hosts only database copies. These copies are replicated in near real time from the primary databases in the DC to the NDR, so that if a catastrophe happens at the DC, you have minimal data loss.

Ok, I have to admit that I've taken for granted that you know what an application and a database are, and what application hosting means. Don't worry, these topics will come up in later sections. For now, just remember that the application is the instance that users/customers actually connect to; it's the front end. The data that is shown in the application or received from it is actually stored in another instance called a database. So NDR is really about protecting the data held in the databases.

NDR is expensive, so it hosts only the data that is most critical to the organization, such as the core banking database of a bank or the core insurance database of an insurance company. That way there is nearly zero loss of critical data if the DC goes down. To understand how this actually works, let's look at a few more concepts.

DC, DR and NDR connectivity

RPO/RTO

Here I would like to introduce 2 concepts called RPO and RTO

RPO (Recovery Point Objective): In simple words, it defines how much data, measured in time, is lost when a disaster happens. For example, take a scenario with a DC and a DR where replication from the DC to the DR lags by 30 minutes. In other words, the data in the DR is always 30 minutes behind the DC, waiting to be replicated. In this case the RPO is 30 minutes. If a disaster happens at the DC, you would lose 30 minutes of data and have to restart the business without the data that was written to the DC in those last 30 minutes.
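The 30-minute example can be worked through in a small Python sketch. The dates, the one-write-per-minute workload and the helper names (`lost_writes`, `data_loss_window`) are hypothetical and serve only as illustration. Strictly speaking, RPO is the objective, meaning the maximum data loss you are willing to tolerate. This sketch measures the actual loss window for a single incident, which is the figure you would compare against that objective.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replicated_at: datetime, disaster_at: datetime) -> timedelta:
    """How far behind the replica is at the moment of the disaster."""
    return max(disaster_at - last_replicated_at, timedelta(0))


def lost_writes(write_times, last_replicated_at):
    """Writes the primary committed after the replica's last applied point are lost."""
    return [t for t in write_times if t > last_replicated_at]


# Hypothetical scenario from the text: the DR is always 30 minutes behind the DC.
disaster_at = datetime(2024, 1, 1, 10, 0)
replication_lag = timedelta(minutes=30)
last_replicated_at = disaster_at - replication_lag

# Hypothetical workload: one write committed at the DC every minute from 09:00 to 09:59.
writes = [datetime(2024, 1, 1, 9, 0) + timedelta(minutes=m) for m in range(60)]

print("Data loss window:", data_loss_window(last_replicated_at, disaster_at))
print("Writes lost:", len(lost_writes(writes, last_replicated_at)))
```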

RTO (Recovery Time Objective): This is simpler to understand than RPO. It is just the time taken for the system to come up in the DR. If a disaster happens at the DC, you must switch the services in the DR from secondary to primary, point all customers and users to the DR site for application access, and much more. All these steps take time before the application is actually up at the DR site. The total time taken for the application to come up in the DR is called the RTO.
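Because RTO is the total time for all the failover steps, a rough estimate can be made by adding up a runbook, as in the sketch below. The step names and durations are entirely hypothetical, and the sketch assumes the steps run one after another. Any steps that can run in parallel would shorten the real figure.

```python
from datetime import timedelta

# Hypothetical failover runbook; step names and durations are illustrative only.
FAILOVER_STEPS = [
    ("Declare disaster and approve failover", timedelta(minutes=15)),
    ("Switch DR databases from secondary to primary", timedelta(minutes=20)),
    ("Start application services at the DR site", timedelta(minutes=25)),
    ("Point customers and users to the DR site", timedelta(minutes=10)),
    ("Smoke-test the application", timedelta(minutes=15)),
]


def estimated_rto(steps) -> timedelta:
    """With sequential steps, the RTO is the sum of the step durations."""
    return sum((duration for _, duration in steps), timedelta(0))


for name, duration in FAILOVER_STEPS:
    print(f"{name}: {duration}")
print("Estimated RTO:", estimated_rto(FAILOVER_STEPS))
```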

To reiterate, just remember that both RPO and RTO are measured in time. RPO is the amount of data lost, expressed as a span of time, and RTO is the time taken for the applications to come up in the DR.

Phew! Coming back to NDR: the data is replicated over faster links, typically a dark-fiber link between the DC and the NDR. Dark fiber is an unused, dedicated fiber link that carries no other customer's traffic. Since the two sites are in the same region, the latency is very low (less than 1 millisecond) and the RPO is only a few seconds, so the NDR data stays almost in real-time sync with the DC data.

Now imagine a disaster happening at the DC. You already have the real-time data synced in the NDR, and the NDR then replicates it to the DR over a normal link with around 5 ms latency. That replication may lag by around 15 to 30 minutes, but since the NDR already holds the DC's data, almost nothing written at the DC is lost. Now you can see why the NDR doesn't host anything. Its main purpose is to receive the DC data in real time so that no data is lost. If a disaster hits the DC, it passes that data on to the DR, which keeps data loss minimal and reduces the RPO drastically.
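The toy simulation below compares the two setups, DC + DR only and DC + NDR + DR. It rests on several assumptions: one write per second, an NDR that trails the DC by 2 writes (about 2 seconds), and a DR that trails by 1800 writes (30 minutes). It also assumes the DC is lost completely, and after the disaster the NDR drains whatever it holds to the DR. The `Site` class and the `replicate` and `simulate` helpers are hypothetical. The model does not reflect how any real replication product works. Its purpose is to show that the NDR caps the loss at its own small lag, even though the DR itself still trails far behind.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    log: list = field(default_factory=list)  # committed transaction IDs, in order


def replicate(source: Site, target: Site, upto: int) -> None:
    """Copy the entries the target is missing from source.log[:upto]."""
    upto = max(0, upto)  # clamp so a negative bound never slices from the end
    target.log.extend(source.log[len(target.log):upto])


def simulate(with_ndr: bool, total_writes: int = 3600,
             ndr_lag: int = 2, dr_lag: int = 1800) -> int:
    """Return how many DC writes never reach the DR (one write per simulated second)."""
    dc, ndr, dr = Site("DC"), Site("NDR"), Site("DR")
    for txn in range(total_writes):
        dc.log.append(txn)
        now = len(dc.log)
        if with_ndr:
            replicate(dc, ndr, now - ndr_lag)  # near-real-time link to the NDR
            replicate(ndr, dr, now - dr_lag)   # slower link; DR trails by dr_lag
        else:
            replicate(dc, dr, now - dr_lag)    # DR fed directly by the DC
    # Disaster: the DC is gone. The NDR (if present) drains its data to the DR.
    if with_ndr:
        replicate(ndr, dr, len(ndr.log))
    return total_writes - len(dr.log)


for with_ndr in (False, True):
    label = "DC + NDR + DR" if with_ndr else "DC + DR only"
    print(f"{label}: {simulate(with_ndr)} writes lost")
```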

An important point to highlight here: for a company's core databases, or tier-1 data, regulatory bodies such as the RBI and IRDA impose maximum limits on RPO/RTO, and a setup with only a DC and a DR would not be able to meet them. This is where the NDR comes to the rescue. It lets you meet those standards with minimal data loss and faster recovery.
