Worst Day
I learned a hard lesson when I first started as a DBA, although, to be honest, it’s not a lesson I should have had to learn (I’ll keep you in suspense). I work for a large Fortune 100 company, and as with many companies our size, there are a lot of processes in IT.

I was in charge of a database for an instance of Microsoft Operations Manager 2000 (MOM). If you have ever supported that database, you already feel my pain, but trust me, there are much worse out there. I digress. I needed a backup plan, and with MOM, backing up the database is not enough. You also have to back up your management packs if you changed anything from the defaults, and any custom management packs have to be backed up as well. I wrote a script to export those and copy them to another server daily. I’m very glad I had this in place, because it saved my bacon in the end.

So what about the database? We had a piece of backup software that the server folks put on all servers to take care of backups. It’s been long enough now (~12 years) that I don’t even remember what the software was at the time. They asked whether we had databases on the server so they could make sure those got backed up, and they assured me that the entire OS drive would be backed up as well. All was well with the world. We went through the typical deployment phases: procuring the hardware, getting it racked, getting it connected and configured on the network, and installing the OS. Once the platform was there and ready to go, we began working on the middleware pieces. I won’t get into the MOM component architecture, but we did have two separate servers, one each for the MOM Management database and the MOM Reporting database.

Part of the middleware work was making sure every third-party piece was installed and working correctly, and the backup software was obviously one of those. It installed without issue and the backup folks got successful backups. Off to the races we went, and the deployment went smoothly.

Fast forward about a year and I get an alert that the MOM Management database went into suspect mode. Not something I wanted to see, especially as a new DBA, and on SQL 2000 at that. I did some searching on the internet, and at the time the results were sparse. I attempted a DBCC CHECKDB repair without luck and even resorted to DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS, which also got me nowhere. Keep in mind that this was a MOM alerting system, so losing some data was not a big deal and would have been faster than a restore; otherwise, I would not have entertained that option at this point.
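
For anyone who has never walked a database back from suspect, the attempt looked roughly like the sketch below. This is a hedged illustration only: OnePoint is the default MOM database name, and the EMERGENCY syntax shown is the modern (SQL Server 2005 and later) form rather than what actually existed on SQL 2000, where the equivalent dance involved sp_resetstatus and a lot more sweat.

    -- Hedged sketch of a last-ditch repair on a suspect database.
    -- OnePoint is the default MOM database name; this is SQL 2005+ syntax,
    -- not the exact commands available on SQL 2000 at the time.
    ALTER DATABASE OnePoint SET EMERGENCY;
    ALTER DATABASE OnePoint SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- REPAIR_ALLOW_DATA_LOSS is the last resort: it can discard damaged
    -- pages, which was acceptable here because this was only alerting data.
    DBCC CHECKDB (OnePoint, REPAIR_ALLOW_DATA_LOSS);

    ALTER DATABASE OnePoint SET MULTI_USER;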

So what was next? Restore the most recent backup, of course! We called the backup guys and told them to restore the most recent backup. Their response: “What backup?” Very funny, guys; just let me know when it’s done. “No, we don’t have a backup of that at all. In fact, we have no record of ever backing it up.”

That will make any day a bad day. It’s also why I keep emails, like the one where they confirmed successful backups. I was now completely up the creek without a paddle and had to rebuild the entire MOM environment. It took me all night, but I managed to rebuild everything. I was so thankful that I had scripted out exports of all the management packs; otherwise, there was no way I could have done all that in one night.

Ever since then, I have always done my own backups, whether there is a backup team doing it as well or not. I tend to back things up to my DR server, and I often set up a round-robin scenario where server 1 backs up to server 2, server 2 to server 3, and server 3 back to server 1.
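
As a rough illustration of that round-robin setup, each server runs a scheduled native backup pointed at a share on the next server in the chain. The sketch below uses made-up server, share, and database names; in practice this lives in a SQL Agent job on every instance.

    -- Hedged sketch: \\SERVER2\SQLBackups is a hypothetical share on the
    -- next server in the round robin; server 2's job points at server 3,
    -- and server 3's job points back at server 1.
    BACKUP DATABASE OnePoint
        TO DISK = N'\\SERVER2\SQLBackups\SERVER1\OnePoint_FULL.bak'
        WITH INIT, CHECKSUM, STATS = 10;

    -- Cheap sanity check that the file on the other side is readable.
    RESTORE VERIFYONLY
        FROM DISK = N'\\SERVER2\SQLBackups\SERVER1\OnePoint_FULL.bak'
        WITH CHECKSUM;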

A tough lesson in trust to learn, but it has not happened to me again, and it never will!

3 thoughts on “A Bad Day with Backups”

  1. Yeah, Ryan, tough lesson to learn. I had a job at a small startup in 2002 where I was a software developer but also their DBA, on SQL Server 2000. I’d never been a DBA before, but they gave me that position anyway. I bought a DBA book that said the highest-priority thing a DBA does is back up and restore databases. I thus skipped directly to the chapter that taught me how to create backups and then implemented them in production. Within a week or two, we actually needed one of the SQL Server backups. I had also been told that the entire server was being “backed up”. The problem with that is that the SQL Server data file was considered an “open” file, so it was never being backed up by the server backup. We were really fortunate that we had just started doing the SQL Server backups.

    At another place I also ran into the “we’re backing up the database” scenario and found out it was not true. Fortunately, I dug deep enough to discover this before anyone actually needed the database restored. In this case it was a little scarier because I wasn’t the DBA. The DBA seemed to be taking the server guys at their word on the “we’re backing up the db” statement. I was a BI developer who was a primary IT user of the data.

  2. Guys, the one thing I tell everyone is that it’s not the backup that’s the problem. I don’t know anyone who ever had trouble because they weren’t taking a backup; the problem is always the restore. So the first thing I always do when I’m told “yes, we back that up” is get a restore happening somewhere (a sketch of this follows the comments). Then you can feel comfortable that in a crisis you can get something back. I get the whole backup chain checked to make sure it all works, and I also now recommend getting the tapes checked. I had a great one where I’d checked that all existing backups could restore; fortunately, we had an offline database for user queries, so it was as simple as checking the restore there daily. Anyway, we needed some data from four or five months previously for a billing enquiry. Sure enough, the tape from that month was corrupt. OK, some data loss was acceptable, so let’s go back another month, then the quarterlies, then the half-yearlies, then the annual, then the latest monthlies. All of the tapes were corrupted.

  3. That’s right. You’re only as good as your last restore! That is one large side benefit of log shipping. Redgate’s SQL Backup Pro is also a good tool to automate both the backup and the restore. It’s excellent for pre-staging cutovers for mirroring or availability groups.
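
The commenters above are right: a backup you have never restored is only a hope. A minimal version of the “get a restore happening somewhere” habit is to restore the latest full backup to a scratch database on another server on a schedule and then run CHECKDB against the copy. The sketch below uses hypothetical paths and logical file names; check the real ones with RESTORE FILELISTONLY first.

    -- Hedged sketch of a scheduled restore test on a scratch server.
    -- The logical names OnePoint_Data and OnePoint_Log are assumptions;
    -- run RESTORE FILELISTONLY against the backup to see the real ones.
    RESTORE DATABASE OnePoint_RestoreTest
        FROM DISK = N'\\SERVER2\SQLBackups\SERVER1\OnePoint_FULL.bak'
        WITH MOVE N'OnePoint_Data' TO N'D:\RestoreTest\OnePoint_RestoreTest.mdf',
             MOVE N'OnePoint_Log'  TO N'D:\RestoreTest\OnePoint_RestoreTest.ldf',
             REPLACE, STATS = 10;

    -- Exercise the restored copy, not just the backup file.
    DBCC CHECKDB (OnePoint_RestoreTest) WITH NO_INFOMSGS;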
