Disk to Disk Technology
The backup and disaster recovery industry is experiencing a renaissance with the advent of Disk to Disk backup technology. Disk to Disk backup has enabled a complete rethinking of the backup process and has provided the foundation for new features that make an administrator’s life much easier.
One bright example of this renaissance is Data Deduplication. Deduplication is getting a lot of press and attention lately, and rightly so: it is one of the most significant changes the industry has seen to date.
Benefits of Data Deduplication
In a nutshell, data deduplication is the process of identifying duplicate copies of data, recording where that data is stored, and finally storing a single copy of that data. This process can result in large space and cost savings for customers.
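The three steps in that definition (identify duplicates, record where the data lives, keep a single copy) can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the DedupStore class, the fixed chunk size, and the use of SHA-256 as the fingerprint are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy in-memory deduplicating store (hypothetical, for illustration)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, each stored once
        self.files = {}   # file name -> ordered list of fingerprints

    def put(self, name, data):
        """Split data into fixed-size chunks and store only unseen chunks."""
        fingerprints = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Identify duplicates: a known fingerprint is not stored again.
            self.chunks.setdefault(fingerprint, chunk)
            # Record where the data lives: the file keeps only references.
            fingerprints.append(fingerprint)
        self.files[name] = fingerprints

    def get(self, name):
        """Rebuild a file from its recorded chunk references."""
        return b"".join(self.chunks[f] for f in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Storing two files that share most of their chunks grows stored_bytes() only by the chunks that are new, while get() still returns each file intact.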
The typical space savings depend heavily on the data that the data deduplication agent is processing. For example, large encrypted files with little duplicate data will not benefit much from the data deduplication process. However, a team of developers working on the same code, or a team of graphic designers working on the same project, will generally have a lot of duplicate data.
A process such as Backup and Archiving, which generally has a lot of redundant data (how many copies of Windows or Microsoft Office does your organization have?), can benefit immensely from data deduplication. Some vendors claim space savings of up to a factor of 300x, but a factor of 20x is more realistic.
As mentioned earlier, data deduplication is a new technology born from the paradigm shift from tape to disk as a backup medium. Unlike disk, tape is a sequential, non-random-access medium; data can only be read or written in sequence.
This sequential characteristic of tape means that it generally takes longer to access specific files. Disk to Disk backup enables immediate access to specific files without having to read through the data that precedes them. This greatly speeds up both backup and recovery and can save valuable time. It also makes the system more efficient: it can back up the data and move on to the next task.
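To put those ratios in perspective, a deduplication ratio can be converted into the fraction of raw capacity saved, which is 1 - 1/ratio. The short sketch below does that arithmetic; the function name space_saved is ours, chosen for the example.

```python
def space_saved(ratio):
    """Fraction of raw capacity saved at a given deduplication ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


for ratio in (20, 300):
    print(f"{ratio}x -> {space_saved(ratio):.1%} saved")
```

A 20x ratio already saves about 95% of the raw capacity, and 300x saves about 99.7%, so the practical gap between the realistic figure and the headline claim is smaller than the raw factors suggest.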
Client Side Deduplication
In this scenario, the agent residing on the client handles the deduplication process. The client is the ultimate authority on what data resides on it and what data has changed. Especially for remote offices, where network bandwidth is at a premium, deduplicating the data before sending it to the backup appliance at the main office saves valuable bandwidth. The downside is that processing the data consumes client processor time. Depending on the circumstances, trading processor time for network bandwidth could be well worth it.
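One way to picture the bandwidth saving is a client that fingerprints its data locally, asks the server which fingerprints it lacks, and uploads only those chunks. The sketch below assumes that exchange; BackupServer, missing, upload, and client_backup are hypothetical names, and a real product would use its own protocol and chunking scheme.

```python
import hashlib


def chunk_digests(data, chunk_size=4096):
    """Return (fingerprint, chunk) pairs for fixed-size chunks of data."""
    pairs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


class BackupServer:
    """Stand-in for the backup appliance at the main office (hypothetical)."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        """Tell the client which fingerprints the server does not hold."""
        return {d for d in digests if d not in self.chunks}

    def upload(self, digest, chunk):
        self.chunks[digest] = chunk


def client_backup(server, data, chunk_size=4096):
    """Send only chunks the server lacks; return the bytes sent."""
    pairs = chunk_digests(data, chunk_size)  # costs client CPU time
    needed = server.missing([d for d, _ in pairs])
    sent = 0
    for digest, chunk in pairs:
        if digest in needed:
            server.upload(digest, chunk)
            needed.discard(digest)  # do not resend a repeat in the same run
            sent += len(chunk)
    return sent
```

The hashing in chunk_digests is the client processor cost described above; the return value of client_backup is the network traffic it buys back.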
In-Band Deduplication
For appliance-based products that use the in-band approach, data is deduplicated before it is written to disk. This has the advantage that the data is only touched once. However, the in-band approach adds processing overhead to the backup itself and can slow it down.
Out-of-Band Deduplication
For appliance-based products that use the out-of-band approach, backup data is first written to disk as is during the backup process. After the backup is finished, the data is processed and duplicate data is discarded. Since the data is not processed in-line, there is no overhead penalty during the backup. The trade-off is that extra storage is necessary while the backup data is being post-processed, but there is the assurance that the backup data is captured as quickly as possible.
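The difference between the in-band and out-of-band approaches can be sketched side by side. Both functions below end with the same deduplicated store; they differ in when the fingerprinting happens and in how much data lands on disk during the backup window. The function names and the SHA-256 fingerprint are assumptions for illustration only.

```python
import hashlib


def digest(chunk):
    return hashlib.sha256(chunk).hexdigest()


def in_band_backup(chunks):
    """Deduplicate on the write path.

    Returns (store, bytes written to disk during the backup window).
    """
    store = {}
    for chunk in chunks:
        # Fingerprinting happens before the write, so it slows the backup.
        store.setdefault(digest(chunk), chunk)
    written = sum(len(c) for c in store.values())
    return store, written


def out_of_band_backup(chunks):
    """Write everything first, deduplicate afterwards.

    Returns (store, bytes written to disk during the backup window).
    """
    staging = list(chunks)  # phase 1: raw data captured as fast as possible
    written = sum(len(c) for c in staging)
    store = {}
    for chunk in staging:   # phase 2: post-processing after the backup ends
        store.setdefault(digest(chunk), chunk)
    return store, written
```

With duplicate-heavy input, in_band_backup writes fewer bytes but does more work per write, while out_of_band_backup needs room for the full raw backup until the second phase completes.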
Best Approach
In our view, the biggest advantage of disk-to-disk technology and data deduplication is being able to combine all three approaches to leverage the strengths of each while mitigating their weaknesses. For example, Continuous Data Protection (CDP) is a client-side approach that makes a lot of sense. A CDP-based client knows which data has been modified and can keep track of it on the fly. Newly modified data is rarely a duplicate and can be confidently sent to the backup server appliance. When the back-end server is coordinated with smart clients, duplicated data can be identified on the fly, greatly reducing overall network congestion while minimizing client-side processing.
Data deduplication is a big win in any backup environment, as is replacing tape with disk as an archival medium. We recommend you consider incorporating Disk to Disk and Data Deduplication technologies into your backup processes.
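The idea of a client that tracks modifications on the fly can be illustrated with a simple dirty-block set: every write marks its block as changed, and the next transfer sends only the marked blocks. This is a simplified sketch under our own assumptions, not how any particular CDP product works; TrackedVolume and its methods are hypothetical.

```python
class TrackedVolume:
    """Toy block device that remembers which blocks changed (hypothetical)."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.dirty = set()  # indices of blocks modified since the last transfer

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.dirty.add(index)  # tracked at write time, no later scan needed

    def take_changes(self):
        """Return the modified blocks and reset the tracking set."""
        changes = {i: self.blocks[i] for i in sorted(self.dirty)}
        self.dirty.clear()
        return changes
```

Because the client already knows which blocks changed, it does not have to rescan or fingerprint the whole volume before each transfer, which is the reduction in client-side processing described above.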