This is Scott Waterhouse's TypePad Profile.
Scott Waterhouse
Interests: music, travel, motorcycles, cycling, movies, backup, recovery, and gadgets.
Recent Activity
Mr. Appliance;
Data integrity on the Data Domain Archiver is provided by the same exceptionally sound architecture as a regular Data Domain appliance: the Data Invulnerability Architecture. All data is protected in multiple ways, including RAID and data-at-rest consistency checking; it is self-healing, does not propagate errors, and is protected by multiple levels of hashing. The data integrity features of the Data Domain architecture are amongst the strongest and most secure of any storage system of any kind.
Data Domain Archiving
Today is a big day. Today something different happened: it is not all just about bigger, better, faster (although there is some of that too, just see my other post!). Today is the day that EMC is introducing the industry's first disk-based long-term retention system for backup and archive data: ...
Tina; the deduplication ratios should be roughly the same between the two methods. Certainly there is nothing so significant that you would ever choose one over the other for this reason.
VMware Backup with Avamar and vSphere
Edit: I am taking the unusual step of adding this after the original post was finished. As of the moment, it may be best to regard the conclusions in the post as tentative only. The post has generated a flurry of questions and controversy, with several people saying that I am wrong. Unfortunatel...
Scott Waterhouse is now following The Typepad Team
Mar 15, 2010
Daniel;
The situation you describe may well be the case, and it is an ideal use case for target deduplication. Avamar may still be appropriate, but there are a host of issues to consider.
As an interesting aside, most database backups with deduplication default to fixed-block deduplication with an 8 KB block, because that is the size of a database block in most cases anyway. So it turns out to be more efficient to do this. On the other hand, we still achieve net deduplication ratios similar to those of the variable-length dedup that I discussed above (in part due to how well databases compress, and assuming that we are talking about a database with an average change rate).
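To make the fixed versus variable block distinction concrete, here is a minimal sketch in Python. It assumes 8 KB fixed blocks and, for the variable case, a simple polynomial rolling hash that declares a chunk boundary wherever the low 13 bits of the hash are zero (roughly 8 KB on average, plus a 2 KB minimum). This is a generic content-defined chunking technique, not the algorithm used by Avamar or Data Domain; the function names and parameters are illustrative only.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 8192) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 48, mask: int = 0x1FFF,
                    min_size: int = 2048, max_size: int = 65536) -> list[bytes]:
    """Content-defined chunking with a polynomial rolling hash (illustrative).

    The hash always covers the last `window` bytes, so boundaries depend on
    content rather than on byte offsets.
    """
    base, mod = 257, (1 << 61) - 1
    drop = pow(base, window, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * base + b) % mod
        if i >= window:
            h = (h - data[i - window] * drop) % mod
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def reused_fraction(original: list[bytes], edited: list[bytes]) -> float:
    """Fraction of the edited data whose chunks already exist in the original."""
    seen = {hashlib.sha256(c).digest() for c in original}
    reused = sum(len(c) for c in edited if hashlib.sha256(c).digest() in seen)
    return reused / sum(len(c) for c in edited)


random.seed(0)
original = random.randbytes(400_000)
edited = original[:1000] + b"inserted!" + original[1000:]  # 9-byte insert near the start

fixed_reuse = reused_fraction(fixed_chunks(original), fixed_chunks(edited))
variable_reuse = reused_fraction(variable_chunks(original), variable_chunks(edited))
print(f"fixed: {fixed_reuse:.3f}  variable: {variable_reuse:.3f}")
```

Inserting a few bytes near the start shifts every fixed block, so almost nothing matches; the content-defined boundaries resynchronize shortly after the edit, so most chunks are reused. Databases tend to update data in place within aligned pages rather than inserting bytes, which is one reason fixed blocks work well for them.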
Variable Block and Fixed Block Deduplication
I saw this little quote on Techtarget today: "File-level deduplication will save a relatively small amount of space on your disk/tape archive. Block-level deduplication will save more space on your disk/tape archive, and variable block-level deduplication will save even more space on your disk/t...
Peter;
Well, I am not sure how you could have a missing file and not even know where it came from.
Having said that, I can search for it in Avamar (assuming I know the name... if you don't know that, and you don't know where it came from, how do you know it is missing?).
With Avamar it would take about 2 seconds to recover. There is no penalty for an "incremental", because there is no such thing as an incremental with Avamar.
It is possible, and no, conceptually, there is no difference (just the seek/load time that makes tape so awful to deal with in the first place).
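The point about there being no incremental penalty can be illustrated with a toy model, assuming each deduplicated backup records a complete manifest of content hashes while the underlying data is stored only once. The classes and functions below are hypothetical, treat whole files as chunks, and ignore deletions; they are not how Avamar is implemented internally.

```python
import hashlib


def restore_chain(backups: list[dict[str, str]], point: int) -> tuple[dict[str, str], int]:
    """Traditional model: backups[0] is a full, backups[1:] are incrementals.

    Rebuilds the file set at `point` by replaying the full plus every
    incremental up to it. Returns the file set and the number of images read.
    """
    state: dict[str, str] = {}
    for image in backups[: point + 1]:
        state.update(image)
    return state, point + 1


class DedupStore:
    """Toy dedup store: every backup is a complete manifest; data is shared."""

    def __init__(self) -> None:
        self.chunks: dict[str, str] = {}            # digest -> content, stored once
        self.manifests: list[dict[str, str]] = []   # one full manifest per backup

    def backup(self, files: dict[str, str]) -> None:
        manifest = {}
        for name, content in files.items():
            digest = hashlib.sha256(content.encode()).hexdigest()
            self.chunks.setdefault(digest, content)
            manifest[name] = digest
        self.manifests.append(manifest)

    def restore(self, point: int) -> tuple[dict[str, str], int]:
        """Any point in time is one manifest lookup away."""
        manifest = self.manifests[point]
        return {name: self.chunks[d] for name, d in manifest.items()}, 1


# Thirty days of backups; a.txt changes every day, b.txt never does.
days = [{"a.txt": "v0", "b.txt": "v0"}]
for day in range(1, 30):
    days.append({**days[-1], "a.txt": f"v{day}"})

chain = [days[0]] + [{"a.txt": d["a.txt"]} for d in days[1:]]
store = DedupStore()
for snapshot in days:
    store.backup(snapshot)

chain_state, chain_reads = restore_chain(chain, 29)
dedup_state, dedup_reads = store.restore(29)
print(chain_state == dedup_state, chain_reads, dedup_reads)
```

Both approaches restore the same data, but the traditional chain has to read the full plus every incremental since, while the deduplicated store reads one manifest. On tape, each of those extra images can mean another mount and seek.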
VMware Backup with Avamar and vSphere Part 2
On the previous post I had a chance to review the major backup and recovery options for VMware. I put the discussion in terms of Avamar, but a lot of that is just for convenience. If you want, you can substitute your favorite backup application in place of Avamar, and you will get more or less t...
Paul;
This is on my (medium-sized) to-do list for the blog. I will do my best to cover it in the next few weeks.
Scott
Avamar v5.0
Today EMC formally announced the availability of Avamar 5.0, which is loaded with new features and functionality. I am going to briefly review some of them here, focusing on the changes that I think will be most significant from a customer's point of view. Having said that, please don't hesitate...
Steve;
I agree with everything you said. However, my (very) unscientific survey says: backup admins like guest-level backup, VMware admins like image-level backup. And as a backup guy, I often have to explain to VMware admins why image-level backup has some issues.
VMware Backup with Avamar and vSphere
Madhav;
It is an excellent question. A couple of other smart people have asked me this as well, and I haven't posted anything yet because I don't have a definitive answer. There seems to be a discrepancy between what I get from some sources and what I get from others. I am actively pursuing this and will post something when I think I have a definitive answer. Stay tuned!
VMware Backup with Avamar and vSphere Part 2
You can find me at scott dot waterhouse at gmail dot com or at my LinkedIn profile: http://ca.linkedin.com/in/sjwaterhouse Send me a note at gmail if you want my EMC address (or you can figure it out easily if you know our standard format of first name underscore last name at emc dot com).
With that kind of change rate your options are limited, unfortunately. It would be interesting to explore hosting a DD box at your DR provider site to see if that would be any less expensive.
Avamar 101
I have been getting lots and lots of questions about Avamar lately. And there seems to be lots of confusion about what Avamar is, what it does, where it fits, and so on. And because I have to wait 5 days to talk about all the really cool stuff that I have been alluding to with my countdown, I t...
I would definitely advise going through a new sizing exercise. Bear in mind that the new Avamar nodes are of a different physical size, and you want to work with your sizer to ensure they are sizing based on the older 2 TB nodes.
As far as databases go, it depends on what you consider large! ;)
Generically, anything under 1 TB or so is fine (with the possible exception of Domino servers, which seem to generate exceptionally high change rates). Anything between 1 and 2 TB should be carefully considered: What is the change rate? What is the tolerance of the host to a backup process? Can you run a proxy server? Is it a VM or a physical system? Anything above 2 TB may be OK, but would almost certainly require a proxy. Another huge generalization: you are probably only going to do this if it is the last thing you have that you want to put on Avamar, i.e., doing this means you can turn off your traditional backup.
You said you had NetWorker, so your other strategy might be to run NW + Data Domain systems for databases and high change rate large size datasets, and Avamar for the remainder (remote, smaller, VMware, etc.).
If you have other questions just post them up, and if they seem common to me I will address them in a separate post.
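The database size guidance above can be written down as a rule of thumb. This is a minimal sketch; the function name and the three return labels are hypothetical, and the thresholds are the approximate ones from the comment ("1 TB or so"), not hard limits from any product documentation.

```python
def source_dedup_fit(size_tb: float, is_domino: bool = False) -> str:
    """Rough triage of a database for Avamar-style source dedup (hypothetical helper).

    "fine"   -> under ~1 TB, generally OK
    "review" -> 1-2 TB (or a Domino server): check change rate, host tolerance,
                proxy availability, and VM vs physical
    "proxy"  -> above ~2 TB: may be OK, but almost certainly needs a proxy
    """
    if size_tb < 1:
        # Domino servers tend to have exceptionally high change rates.
        return "review" if is_domino else "fine"
    if size_tb <= 2:
        return "review"
    return "proxy"


for name, size, domino in [("sql01", 0.4, False), ("notes01", 0.6, True),
                           ("ora01", 1.5, False), ("dw01", 3.0, False)]:
    print(name, source_dedup_fit(size, is_domino=domino))
```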
Avamar 101
Frank;
Sizing can be a bit of an art. EMC has a tool that can accurately size an environment, and I would advise you to get your Avamar provider or EMC to use it for your environment. To size accurately you need to account for commonality across platforms, change rate, retention times, amount of source data, and so on.
You can get an estimate by using a dedup calculator (like the one I link to), but that doesn't take into account commonality across Avamar clients, and doesn't size for a grid. The sizing tool really is the best way.
Assuming it is not grossly more than you require, I usually recommend starting with a DS5 or DS6 (5 or 6 node grid) as upgrades from those configurations follow an easier, less disruptive path than upgrades from single/dual node configurations.
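For a rough first pass before the real sizing exercise, a back-of-the-envelope estimate for a single dataset might look like the sketch below. The function and its reduction factors are assumptions for illustration: it ignores commonality across Avamar clients and grid overhead, which is exactly why the EMC sizing tool remains the better option.

```python
def estimate_stored_tb(source_tb: float, daily_change: float, retention_days: int,
                       *, first_full_reduction: float, change_reduction: float) -> float:
    """Hypothetical steady-state capacity estimate for one dataset.

    first_full_reduction: assumed dedup/compression factor on the initial backup.
    change_reduction: assumed factor on each day's changed data.
    Each day's new unique data is kept for `retention_days`.
    """
    first_full = source_tb / first_full_reduction
    daily_new = source_tb * daily_change / change_reduction
    return first_full + daily_new * retention_days


# 10 TB source, 0.5% daily change, 30 days retention, assumed 2x and 1.5x reduction.
estimate = estimate_stored_tb(10, 0.005, 30, first_full_reduction=2.0, change_reduction=1.5)
print(f"{estimate:.2f} TB")
```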
Avamar 101
Jesper;
I haven't seen any good numbers for a few years now. For what it's worth, the consensus seems to be: Symantec at 45-50% (including NBU and BE); EMC at 15-20% with NW and Avamar; IBM at 15% with TSM; others at 15-25%. Yes, those are pretty big ranges, but trying to be more accurate than that doesn't seem valid to me without data to back it up. Anecdotally, I think CA, CommVault, and HP DP probably have about 3-5% each. That leaves about 5% of the market for the current niche players, like Veeam.
Avamar v5.0
Agreed totally. In fact, as I considered the issues while writing this, and the absence of independent, objective tools, standards, and metrics for evaluating the risk and cost of data loss and recovery, I wondered whether the insurance industry had something that could be ported or translated to be useful for us in backup, and that would be intelligible to a person of average mathematical ability. (Here in Canada, actuaries are very highly trained in mathematics, well beyond my level!)
Do You Need Backup?
Do you need backup? And yes that is a serious question. (Given the name of this blog, and my chosen specialization for the last 20 years, it is understandable if you think I was asking it in jest!) A recent real world situation makes me wonder. It also makes me wonder at what point something bec...
Paul;
Please don't take offense! Veeam may be a truly fantastic product. But tracking everybody in this space is well-nigh impossible, and Veeam appears to have ~1% market share. It just hadn't hit my personal radar yet.
But honestly, I wish you folks all the success. There is no malice in my comments. It seems like you have a great team, and happy customers, and that is great to see.
And by the way EMC NetWorker Fast Start has been a huge success for us, and is focused on the 1-20 server (backup client) segment. EMC has solutions for everybody from home/small office remote backup (Mozy), small business (NetWorker Fast Start), cloud backup, as well as the number one source and target deduplication solutions.
Avamar v5.0
Veeam;
Welcome to the conversation. In the spirit of being welcoming, I have posted your comment, although in the future it would be nice if comments felt less like advertising/spam. :)
Having said that, my claim was "first major backup product." With appropriate respect, I am not sure Veeam falls into the major category! This is the first I have heard of your product. (Not that I am saying it is bad... and thanks for the comment because we can now follow your link and form some opinions.) But by market share, Veeam seems to be a non-entity?
Sorry if that seems unfair, but by market share, there are a limited number of contenders: Symantec (NBU and BE), EMC (NetWorker/Avamar), IBM (TSM), and, at a fair distance behind these, HP (DP), CommVault, and CA (ArcServe).
Avamar v5.0
Full points for Fx. Dead Vlei, on the eastern edge of the Namib, near Sossusvlei. A spectacular place to visit: not easy to get to, but unforgettable.
The Backup Blog Returns
Just a quick note to say that I am back, and once again obsessing about all things backup. Here is a small photo of some of the scenery I saw on my trip. Bonus points for anybody that knows where this is! I will say one thing: when the landscape looks like this, you know you are a long way from ...
Curtis... "It's the system that makes it a backup..." Isn't that pretty much what I said? :)
A copy is a necessary but not sufficient component of a backup system.
Copies and Backups Revisited
Mark (aka Storagezilla) Twomey wrote in his blog last week about copies and backups, and concluded that point in time copies are backups. Hrmm. As I have said before, I am not sure that I agree with this. I have discussed the issue here with W. Curtis Preston too, and I think there is a disconn...
Tim;
The auto media management point is tangential. Sorry. At least it ensures that new virtual media are added to the pool automatically.
Ideally one script would do it all...!
Relabeling Tapes in NetWorker
Just like with TSM, with NetWorker it is necessary to relabel virtual tapes in a VTL with deduplication in order for the device to reclaim the associated capacity. Without doing this, the capacity will not be reclaimed until the tape is re-used (which, if you have a large scratch pool, might tak...
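Why relabeling matters can be shown with a toy reference-counted chunk store. This sketch assumes a deduplicating VTL that only learns a virtual tape's data is gone when the tape is relabeled (or overwritten); expiring save sets in the backup application's catalog changes nothing on the VTL. The class, method names, and barcodes are hypothetical and do not correspond to any NetWorker or VTL command.

```python
import hashlib
from collections import Counter


class DedupVTL:
    """Toy deduplicating VTL: each virtual tape is a list of chunk digests."""

    def __init__(self) -> None:
        self.refcount = Counter()               # digest -> references across tapes
        self.chunk_size: dict[str, int] = {}    # digest -> bytes stored once
        self.tapes: dict[str, list[str]] = {}   # barcode -> digests on that tape

    def write(self, barcode: str, chunks: list[bytes]) -> None:
        digests = self.tapes.setdefault(barcode, [])
        for chunk in chunks:
            d = hashlib.sha256(chunk).hexdigest()
            self.refcount[d] += 1
            self.chunk_size[d] = len(chunk)
            digests.append(d)

    def relabel(self, barcode: str) -> None:
        """Relabeling empties the tape, so its chunk references are dropped."""
        for d in self.tapes.get(barcode, []):
            self.refcount[d] -= 1
        self.tapes[barcode] = []

    def garbage_collect(self) -> None:
        """Free chunks that no tape references any more."""
        for d in [d for d, n in self.refcount.items() if n <= 0]:
            del self.refcount[d]
            del self.chunk_size[d]

    def used_bytes(self) -> int:
        return sum(self.chunk_size.values())


vtl = DedupVTL()
vtl.write("VT0001", [b"A" * 1000, b"B" * 1000])
vtl.write("VT0002", [b"B" * 1000, b"C" * 1000])  # chunk B is shared
before = vtl.used_bytes()

# Expiring VT0001's save sets in the backup catalog does not touch the VTL,
# so garbage collection has nothing to free.
vtl.garbage_collect()
after_expiry = vtl.used_bytes()

vtl.relabel("VT0001")
vtl.garbage_collect()
after_relabel = vtl.used_bytes()  # only A is freed; B is still on VT0002
print(before, after_expiry, after_relabel)
```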
Chuck;
I like the idea of a bar code on the vApp a lot. Now extend the idea to data structures: you could have a data protection bar code (that describes the data protection policy to be applied to the container); I spoke about that here: http://thebackupblog.typepad.com/thebackupblog/2009/06/a-data-protection-taxonomy.html But why not also have a bar code for data replication and availability?
Why not make these bar codes objects with inheritable characteristics? Why not make some of the service providers that can act upon the policies contained in the bar code a part of vSphere?
There is a lot of mileage in this approach, in my opinion.
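The inheritable bar code idea could be sketched as policy objects that carry their own settings and inherit the rest from a parent. Everything here (the `PolicyTag` class, its setting names, and the values) is hypothetical and not part of any VMware or EMC API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyTag:
    """Hypothetical 'bar code': named policy settings plus an optional parent."""
    name: str
    settings: dict[str, object] = field(default_factory=dict)
    parent: Optional["PolicyTag"] = None

    def resolve(self) -> dict[str, object]:
        """Merge from the root down, so a child's settings override its parent's."""
        inherited = self.parent.resolve() if self.parent else {}
        return {**inherited, **self.settings}


gold = PolicyTag("gold-protection",
                 {"backup_frequency": "daily", "retention_days": 90, "replicate": True})
finance = PolicyTag("finance-app", {"retention_days": 2555}, parent=gold)
print(finance.resolve())
```

A service that acts on the policy would only need to read the resolved settings of whatever container it is handed, which is what makes inheritance attractive here.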
Why I Really Like VMware's SpringSource Acquisition
Sorry I wasn't able to weigh in when this happened last week. The good news: the more I think about it, the more I find this particular acquisition absolutely fascinating. The usual disclaimer: I'm offering my own opinions, and not speaking officially on behalf of VMware, EMC, SpringSource or ...
While that is true, what we have done is lowered our marginal cost of backup, we have not achieved economies of scale.
What I mean is this: say I have 50 TB to back up. I might do this with tape for $300k in initial infrastructure. With deduplication, I might get by with $300k in initial infrastructure too (albeit with much higher performance and service levels than the tape solution).
But assume I haven't sized the solution for growth.
With tape, if I add a TB of data to my source, I now need to buy another tape drive. And another one for every 5 (?) TB after the initial 50.
With disk, if I add another TB of data, same thing.
For every TB I add to my source, I have to add capacity to my target in a linear fashion. That factor might be 0.5x, but it is not as if it drops to 0.2x at 100 TB.
If anything, I see a jump in costs again as I need to acquire an additional robot or additional dedup head.
So we are winning the battle (reducing costs) but losing the war, because those costs continue to maintain a linear relationship with the cost/capacity of the source data.
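The economies-of-scale argument can be checked with a simple cost model: a linear cost per source TB plus a fixed step cost (say, a new dedup head or tape robot) each time the source crosses another capacity increment. The split of the $300k into $4,000 per TB plus one $100,000 step per 50 TB is an assumption chosen only so that 50 TB costs $300k; the function name is hypothetical.

```python
import math


def target_cost(source_tb: float, cost_per_tb: float, step_tb: float, step_cost: float) -> float:
    """Linear capacity cost plus one step purchase per `step_tb` of source data."""
    steps = math.ceil(source_tb / step_tb)
    return source_tb * cost_per_tb + steps * step_cost


for tb in (50, 51, 100, 200):
    cost = target_cost(tb, cost_per_tb=4_000, step_tb=50, step_cost=100_000)
    print(f"{tb:>4} TB  total ${cost:>11,.0f}  per TB ${cost / tb:>7,.0f}")
```

Under these assumptions the cost per TB is the same $6,000 at 50, 100, and 200 TB and jumps just past each step, so it never falls as the source grows.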
Backup Sucks: Reason #38
"Any customer can have a car painted any color that he wants so long as it is black." And that was the type of thinking that let Henry Ford achieve enormous economies of scale and sell a lot of cars. It is also the sort of thinking that will drive the adoption of cloud infrastructure, software,...
Thanks Stephen.
I have made a modification so that the poll now reads "business data". I know there is a lot of other ambiguity, and I am going to leave things that way (including the archive stuff; my intuition is that cloud archive is going to happen more easily and faster than cloud backup, btw).
Unfortunately I lost all the votes when I did that--anybody who has already voted please feel free to do so again.
New Poll: Cloud Backup?
I have added a poll in the right-hand column: would you back up business data to the cloud? The assumption here is: the cloud in this case is a public cloud or a service provider cloud--not your own private cloud. I haven't qualified this further with any questions about the size of organization...
I would add that my anecdotal evidence shows DD folks are pretty excited to join too. The few people I have talked to are very excited to be joining EMC, and very much looking forward to the next few months. I will admit that my sample size is small, so take this for what it is worth, but I have only positives to report from DD employees, no negatives so far.
As far as selling less rather than more: well, EMC has been doing it successfully for years. Archiving is all about efficiency. Even at the most superficial level, Centera is cheaper than DMX/V-Max. EDLs with deduplication are less expensive than storage without. We offer dedup on Celerra. I have never had a hint of resistance internally on any of these strategies.
Scott
Data Domain, NetApp And The IT Industry
As I think about this latest acquisition, there are three major themes worth exploring. The first theme has been covered widely already -- the impact of data deduplication, why it's hot, the value of differing approaches, why it needs to go everywhere in the stack, etc. No need to cover that...
Thanks for providing the references. I should have made it clear that you were citing 3rd party data to substantiate your claims, not just making stuff up as you went!
I think we can both agree the numbers are a little out of date, and in the case of the value of lost data, perhaps a little suspect.
And we can definitely agree that tape alone is not the most cost effective, reliable or secure way to protect data.
The Cost of Backup: Soft and Hard Costs
Continuing a theme from the previous post, I want to briefly discuss the notion of hard and soft costs in a business case or Total Cost of Ownership (TCO) study. Hard costs are those costs associated with hardware, software, maintenance, operational expenses, capital costs, and so on. They are ...
EMC continues to have an important and substantial business relationship with Quantum. We will also continue to sell and support the Quantum-based DL systems as per standard EMC policy. (Meaning that you will still be able to get EMC service and support; they will not be immediately end-of-lifed or anything like that!)
Welcome Aboard Data Domain
As of this morning, it is official: EMC has acquired a majority interest in Data Domain, and from here forward it will be run as a product division within EMC, with the DD structure remaining intact, and Frank Slootman will head the new division. So first things first: welcome aboard to all Da...
Preston;
The first two sets of numbers are from Double Take (for tape and their software). There is not much I can say in defense of them, as their white paper does a poor job of outlining the assumptions used to generate them.
As I commented in the previous post to Curtis Preston, it is possible to imagine a "new" Avamar system for $5k in some circumstances.
As for the tape being way out--maybe. I think the math comes to 40 tapes per site, with a robot, and a server at each site too. That is only $18,000 per year per site. That doesn't strike me as wildly unrealistic.
The Cost of Backup: Soft and Hard Costs