12-23-2013 12:30 AM - edited 12-23-2013 12:57 AM
I am updating our disaster recovery plan, and for that I am doing some test restores using EADR. We use the method of creating a bootable ISO. It works fine; so far I have been able to restore every single machine with it. However, the restore of a 20GB machine (15GB OS, 5GB data) takes about 45 minutes. During the actual restore it uses about 85% of a Gbit connection for the full 45 minutes. That is visible on the backup machine itself as well as on the client I am doing the restore on. 45 minutes at 850Mbps is about 230GB in total. That's over 10 times as much data as needed. The EADR log screen shows it ran at 5.99 MB/sec.
The data comes from a file library on which the drives have a concurrency of 4. Unless I have completely misunderstood it, that means that theoretically (if all 4 sources were writing at equal speed) my 20GB of data is mingled into about 80GB of actual 'virtual tape space'. Usually the backup host filters the data and only sends the data that is actually required. Maybe for EADR that works a bit differently.
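For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function names are made up for illustration, and the 10-bits-per-byte rule of thumb (a rough allowance for protocol overhead) is only an assumption about how a figure of about 230GB could be reached; at a strict 8 bits per byte the same link rate works out higher. The second function shows what the 5.99 MB/sec from the EADR log adds up to over the same 45 minutes.

```python
def transferred_gb(rate_mbps, minutes, bits_per_byte=8):
    """Data volume in GB (10^9 bytes) moved at a given link rate in Mbit/s."""
    seconds = minutes * 60
    total_bits = rate_mbps * 1e6 * seconds
    return total_bits / bits_per_byte / 1e9


def logged_gb(rate_mb_per_s, minutes):
    """Data volume in GB implied by a throughput reported in MB/s."""
    return rate_mb_per_s * minutes * 60 / 1000


if __name__ == "__main__":
    # 850 Mbit/s for 45 minutes, strict 8 bits per byte
    print(round(transferred_gb(850, 45), 1))                    # 286.9
    # same rate with the assumed 10-bits-per-byte rule of thumb
    print(round(transferred_gb(850, 45, bits_per_byte=10), 1))  # 229.5
    # 5.99 MB/s (EADR log figure) for 45 minutes
    print(round(logged_gb(5.99, 45), 1))                        # 16.2
```

The rate from the EADR log corresponds to roughly the size of the machine being restored, while the rate seen on the wire corresponds to more than ten times that, which is the discrepancy in question.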
In short, why does my 20GB EADR restore use 230GB of network traffic?
12-23-2013 07:17 AM
EADR is, essentially, a Microsoft product that Data Protector makes use of.
I checked our lab cases and Knowledgebase to see if I could find any clue, and I found nothing that relates to this.
I would like to get some clarification about this statement "... the restore of a 20GB machine (15GB OS, 5GB data) takes about 45 minutes....". I assume that this means that the actual EADR part is done and you are actually trying to restore data.
When doing a backup to a tape device, the default concurrency is 4, meaning that 4 data streams are written to the device and the data is interleaved. Restores are done with a default concurrency of 1, which cannot be changed, so, theoretically, a restore could take 4 times as long as the backup did. Again, theoretically, the best way to ensure maximum restore performance is to back up with a concurrency of 1.
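The effect of interleaving on a restore can be illustrated with a toy model in Python. This is only a sketch under simple assumptions (equal-sized streams written round-robin, one block at a time); it is not how Data Protector lays out media internally, and all names in it are hypothetical.

```python
from itertools import zip_longest


def interleave(streams):
    """Write blocks from several streams round-robin onto one medium."""
    medium = []
    for group in zip_longest(*streams):
        medium.extend(block for block in group if block is not None)
    return medium


def restore(medium, wanted_stream):
    """Read the medium sequentially, keeping only blocks of one stream.

    Returns (blocks_read, blocks_kept). Reading stops after the last
    wanted block has been seen.
    """
    last = max(i for i, (sid, _) in enumerate(medium) if sid == wanted_stream)
    read = medium[: last + 1]
    kept = [block for block in read if block[0] == wanted_stream]
    return len(read), len(kept)


if __name__ == "__main__":
    # 4 concurrent streams of 5 blocks each; a block is (stream_id, sequence_no)
    streams = [[(s, n) for n in range(5)] for s in range(4)]
    medium = interleave(streams)
    blocks_read, blocks_kept = restore(medium, wanted_stream=3)
    print(blocks_read, blocks_kept, blocks_read / blocks_kept)  # 20 5 4.0
```

In this model the reader has to pass over about 4 blocks for every block it keeps, which is where the "4 times" figure comes from. Whether the unwanted blocks are dropped before or after they cross the network is a separate question that the model does not answer.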
Most people, to ensure tape drive streaming, will set concurrency quite a bit higher, up to a concurrency of 12-15, which, again, will slow down the restore.
Since you are backing up to a disk device, and tape drive streaming is not an issue, you should set the backup concurrency to 1, whether you have a file library or have the destination configured as a Virtual Library System (VLS). This should speed up your restores.
12-23-2013 08:30 AM - edited 12-23-2013 08:34 AM
Yeah, I know the configuration of my file library is far from ideal, but the issue is that I am severely low on IOPS on my backup storage. The result is that if a phase 2 job (disk to tape) is kicked off while another library drive is still writing, the tape drive can no longer stream because my file library is not delivering data fast enough. Therefore I have only three virtual tape drives and mainly use only one for both reading and writing; that way the running job always has the full 'power' of my file library. The only jobs where that really matters are the ones writing to tape. We are only a small company with limited storage requirements and even less money to invest these days. Therefore I want to stick to this method for now.
However, I still think something's wrong. If I do a default restore of all data in the machine to a temp location, it finishes in a few minutes, sending only the actual data over the network. Even if the 20GB of data I need is stacked inside 200+GB of other data, which is very unlikely in the first place, I'd expect to see only the 20GB on my network, while the rest is 'dropped' locally on my backup server.
In addition, I also have a B2D (software-based) store, which doesn't do concurrency anyway, and it behaves the same: a huge amount of network data, a factor of 10+ compared to the actual data that is to be restored.