A closer look at ‘data de-duplication’
One of this year's buzzwords may well describe a tool that eliminates duplicate data so businesses can optimize their backup storage capacity.
Dubbed de-duplication, the process focuses on data reduction, aiming to eliminate redundant data that uses up disk storage space and to increase backup capacity, Alvin Ow, senior director of systems engineering at Symantec Asia-Pacific and Japan, said in an e-mail interview.
Ow explained: “For example, an individual sends a PowerPoint file via e-mail to five recipients. Instead of five separate copies being sent out, backed up and archived, only one copy of the PowerPoint [file] exists and only this single copy is backed up and archived.” He added that all five recipients would be given access to that one PowerPoint file, which is stored in a central database.
“This significantly reduces the amount of disk storage required and also reduces the resources required for backup and restore,” he added.
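Ow's five-recipient example can be illustrated with a minimal Python sketch of a single-instance store. This is an illustration only, not Symantec's implementation: the class name SingleInstanceStore, its methods and the use of SHA-256 content hashes to detect identical files are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance store (hypothetical): identical payloads are kept once."""

    def __init__(self):
        self.blobs = {}       # content digest -> the one stored copy
        self.references = {}  # owner -> digests that owner may access

    def archive(self, owner, payload):
        # Identify the payload by its content, not by its name or sender.
        digest = hashlib.sha256(payload).hexdigest()
        # Keep the bytes only if this exact content has not been stored yet.
        self.blobs.setdefault(digest, payload)
        # Every owner still gets a reference to the shared copy.
        self.references.setdefault(owner, []).append(digest)
        return digest

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
attachment = b"slide deck contents" * 1000  # stand-in for the PowerPoint file
recipients = ["r1", "r2", "r3", "r4", "r5"]
for recipient in recipients:
    store.archive(recipient, attachment)

naive_bytes = len(attachment) * len(recipients)  # five separate copies
print(naive_bytes, store.stored_bytes())
```

In this sketch all five recipients hold a reference to the same stored copy, so the archive holds one-fifth of the bytes that five separate copies would need.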
As application and user data continue to explode, data de-duplication will allow IT departments to “significantly reduce” the amount of physical storage and data each administrator has to manage, Ow noted.
“De-duplication has the potential of saving [businesses] huge IT dollars in recouped storage space, [while] ongoing de-duping can reduce backend media requirements significantly–without sacrificing data protection,” he said. “These savings [will] result in freed up funds which companies can then reinvest in other areas of the business.”
Not a new concept in Asia
According to Ow, this technology is not new to the Asia-Pacific region. “At Symantec, we have been incorporating this technology into our Enterprise Vault product which is part of our Information Foundation solution, for many years,” Ow said.
“Enterprise Vault checks for redundancies in both e-mail and file storage before archival using single-instance store,” he said. “This ensures that data required for business and regulatory purposes is always available.”
“We also have similar technology embedded in our NetBackup PureDisk product, [for instance], to address the requirements of remote office data protection.” Before data is streamed remotely to the main data centre for backup, the NetBackup agent compares all new data residing on the remote site with existing data in the data centre, Ow explained. He added that only new changes and updates are sent over to the data centre, according to the block size defined by the administrator. “This eliminates the need for tape devices in remote offices,” he said.
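The remote-office comparison Ow describes can be sketched in Python as fixed-size block de-duplication. This is a simplified illustration under stated assumptions, not how NetBackup PureDisk actually works: the function names split_blocks and blocks_to_send, the SHA-256 fingerprints and the tiny 4-byte block size are all hypothetical choices made to keep the example readable.

```python
import hashlib


def split_blocks(data, block_size):
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_to_send(remote_data, known_digests, block_size):
    """Return (digest, block) pairs the data centre does not already hold."""
    seen = set(known_digests)
    to_send = []
    for block in split_blocks(remote_data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen:
            seen.add(digest)  # also skip repeats within the same transfer
            to_send.append((digest, block))
    return to_send


block_size = 4  # administrator-defined; unrealistically small for readability
yesterday = b"AAAABBBBCCCC"      # data already backed up in the data centre
today = b"AAAABBBBDDDDCCCC"      # data now on the remote site

known = {hashlib.sha256(b).hexdigest() for b in split_blocks(yesterday, block_size)}
new_blocks = blocks_to_send(today, known, block_size)
print([block for _, block in new_blocks])
```

Only the one block that the data centre has not seen crosses the network; the three unchanged blocks are recognised by their fingerprints and skipped.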
“De-duplication software holds promise and is definitely one of the more intriguing technologies emerging in the storage space,” noted Ow. “While data de-duplication has been available for the past year, it has further potential to radically change the way we store and retain information.”
According to Lim Beng Lay, Asean regional product manager at HDS, while the technology has been around for two years, it did not take off due to concerns over data integrity. However, Lim said in a phone interview, the evolving nature of today’s business requirements is now driving enterprises to take a closer look at data de-duplication.
“Today, customers are looking at historical data for [deployment in] customer relationship management (CRM) [and] data warehousing,” he said. “Not only [do] they need the historical data, they need it fast.”
The changing business climate and application of historical customer data have led to a tremendous change in the amount of data generated, Lim noted.
“Ten years ago, to recall historical data, everything probably fits into a floppy disk,” he said. “But today, to recall historical data, you may need two truckloads of [storage] tape.”
Lim added that the business environment is changing and becoming costlier to manage. This, he said, is probably the reason why enterprise customers are beginning to take notice of de-duplication.