The "3-2-1" backup rule

Jul 08, 2020 · 9 mins read
Share this

It is considered that the backup rule “3-2-1” was first described in Peter Krogh’s book «The DAM Book: Digital Asset Management for Photographers».

And it does not surprise us as photographers depend on their private archive a lot, so they should choose the most reliable backup strategy. The “3-2-1” rule determines the following minimum for providing secure data storage:

THREE backup copies,

which are stored in TWO different physical formats; moreover, ONE of the copies should be stored outside the office. This scheme is based on supporting of fault tolerance through redundancy of data storage.

“Three different copies”

Means “three copies stored in one place”. And two different folders located on the one physical disk are considered to be located in one place. We won’t touch mathematic aspects deeply, but if you increase the number of copies, the probability of failure increases linearly, and the reliability of storage — according to a power function. Here we assume that physical characteristics of storage devices are identical, and the probability of damage to these devices is statistically independent. In other words, when you do three copies instead of one, you get trebling of probability of failure of this set of copies, but you also get cubic increase of reliability. In real life it makes data, stored in three copies, practically unbreakable, although you should change faulty disks more often as collectively their number has risen.

Unfortunately, correlation of the probabilities devices’ damage is rather high in real life. For example, an electromagnetic pulse appears when in a supply circuit of office, it equally influences all disks directly. And it’s more likely that if one disk fails, two others will fail, too. It will happen owing to the similar nature of impact of pulse on the standard serially issued disks, which impose identical requirements to the quality of power supply. Why do you need three copies not two? Because in the real life threats to two copies are correlated owing to the logical organization of backup procedure. In example, let’s take RAID1. If the virus infects the file on one disk of an array, at once there is an infection of the second copy on a mirror disk. Also if you switched on replication, replica would be damaged by virus, too. Even if you just perform a daily full backup, it will also be infected if the administrator will not notice the infection of source data during the work day.

Summing up all written above we can say that two copies will not be enough to restore the information in different situations when time of detection and administrator’s reaction in case of original data’s damage exceeds the period between adjacent tasks of copying/replication/mirroring data.

In order to get bigger threats’ correlation it is recommended to record data in two different formats at least. For example, if you store information on DVD (optical data record), it won’t suffer from the electromagnetic pulse. And after ODD’s breaking DVD still stores your data.

Other examples of correlated threats are increasing of the temperature due to the failed air conditioner in the server room or the fire at office which, certainly, absolutely homogeneously influences all copies which are stored in office.

So, storage of copies in different formats is aimed to the reducing of probability of simultaneous loss of all copies because of the homogeneous influence.

“one of the copies should be stored outside the office”

Genuinely, the third rule – “one of the copies should be stored outside the office” – solves the same issue as the second one, but it is realized through geographical distribution of storage. Theft or the fire in the office will lead to the loss of all copies stored there, while the fire or theft at one office won’t lead to the fire or theft at other geographically isolated office. It makes threats at different offices statistically independent.

And what about the cloud storage?

Can we consider it as an alternative for backup? Obviously, we cannot. It could be used in order to store data or backups, it could be used as the outside-office storage, but you should always remember that data from the cloud could be easily lost as well as from the other places.

However, one of the pros of cloud hosting is easing of the backup. Administrator needn’t to buy any SAN. Volume of cloud storage depends only on financial possibilities. Also, the system often works with the volume clear for the customer, so you can hardly lost some volume suddenly as it can happen with SAN.

In fact, backup cloud storage is an alternative to a streamer as data from it is extracted with a certain delay comparing with a local disk storage. This delay depends on the channel width and provider’s rate because, generally, the lower the rate costs, the slower the speed of data extraction is.

Should you always follow the “3-2-1” rule?

Our answer is “no”. On the one hand, everything depends on the cost of your data and the cost of potential damage. On the other hand, the possibility of data damage plays the huge role, too.

Remember the main rule: no secure system should exceed the cost of the securable object. So, if your data is not as valuable as the whole “3-2-1” backup system or threats are unlikely, you can realize this scheme partly. The most important things are:

  • to make a list of all possible threats and estimate their probability and criticality;
  • to divide all threatens into two groups – “deactualized by these steps” or “recognized as irrelevant for the company’s business plan”.

After these steps you will be able to estimate which parts of “3-2-1” scheme you need and how much it will cost.

What is the best way to organize a consistent backup for distributed applications consisting of multiple servers?

It is the same as for distributed servers. The main thing is to inform the application about the backing up. And you should understand that only transactional applications and file systems needs consistent backup.

Realisation of 3-2-1 backup rule is expensive, isn’t it?

As we have said, everything depends on the value of your data and on that how critical it is to lose your data. For example, the possibility to lose photo archive for professional photographer justifies the investments in backup system. Not in vain, that the photographer (Peter Korghe) was the first to describe 3-2-1 rule.

Are the streamers dying?

Gartner Company’s research “Organizations Leverage Hybrid Backup and Recovery” showed that big companies use combined backup strategy, which includes streamers as well as disks. It allows to reduce RTO and RPO (to speed the process and to reduce the time of backing up) in the annex to the most actual data and critical information. Such strategy also leads to the reducing of the cost of backup infrastructure possession in case of long-time storage of information, which is required by legislative requirements, for example.

According to statistics, 80% of all recovery operations are done for storing data having age of archiving equal to 4 weeks or less. Therefore, it is reasonable to store backups made within the last 4 weeks on the fast-available disk storages despite the fact that they are more expensive than streamers if we count the size of costs on one information storage unit.

Despite the reducing the unit cost of storage on disk drives, streamers are still leaders on this parameter. Streamers perfectly combine safety of storage, mobility of the data medium, and low unity cost of storage on disk drives. The major competitor of streamers is cloud storage optimized for backup storing, for example, Amazon Glacier.

Is data encryption supported by Veeam?

Veeam encrypts data sent via the Internet to the cloud, but it does not support encryption of backups on a disk or a streamer.

Do Appliance, Windows, and Linux support the Veeam? How does backup and recover Veeam’s server itself?

Veeam can be installed on the Windows only. It can be real or virtual server. Speaking about backup of Veeam’s server itself we should say that it does not contain any vital backup information but only infrastructure preferences. For example, the number and schedule of the tasks, the number of repositories, streamers, and so on. All these preferences are stored in a SQL database and in the file of preferences.

If you store SQL on the separate virtual machine (as we always recommend), you can backup it by Veeam. File of preferences is always stored in Veeam’s default repository, but you are able to set another direction. At the same time, Veeam backups are all-sufficient. All recovery information is stored in backup itself. So, you can recover backup even without Veeam server.

How can we recover infrastructure if failure “touched” backup infrastructure (Veeam servers) itself?

Here we have two variants.

First variant (if we did Veeam backups):

Recover virtual machine from SQL which stores Veeam database. You can do it even from backup itself by special utility;

Install new Veeam and set already recovered SQL as database;

Import the file of Veeam’s preferences.

Second variant (if we did not do Veeam backups):

  • Install new Veeam server;
  • Set up the infrastructure;
  • Import old backups;
  • VMs could be recovered, but you have to reset all preferences and schedule.

What is the optimal number of restore points?

There are 14 restore points by default. But we do not insist on the fact that it is the optimal number. Everything depends on the politic, procedure, and the regimen of storage defined in your company.

How does Veeam work with the program request from scripts and user application to a backup infrastructure?

All Veeam functions available from GUI are also officially available as PowerShell’s cmdlets.

Can I write or generate pre-backup scripts?

  • There are Pre-Freeze/Post-Thaw scripts inside the virtual machine, and Post-Backup scripts outside the VM. Also, you can run tasks by PowerShell, executed the necessary actions previously.

Does Veeam support DAG in Microsoft Exchange 2010/2013?

  • Yes, it does.


Best VPN
Join Newsletter
Get the latest post right in your inbox.