Backup and recovery is the number one problem that I discuss with clients, and the conversation is started by them as often as it is by me. It constantly surprises me that businesses of all sizes have inadequate backup processes for their business systems. Many businesses have implemented a backup strategy but fail to monitor and test it, and often only find out that it does not perform adequately when they need it most. In this article I will cover data protection through backup and recovery to meet objectives that are defined by the business rather than by the technology.
Data protection means different things to different people, which only adds to the confusion for customers when talking to IT advisers about data backup and recovery. Data is a key, and often critical, part of most modern businesses, so businesses need to protect it adequately. Be aware, though, that there is no panacea that covers every scenario: what works for one situation may not work for another. Having said that, for most business situations it should be a simple process to implement backups, monitoring and testing so that the data can be recovered when required.
First of all, I will try to simplify the reasons why backups should be carried out. In business terms, a backup is carried out so that data and the associated application services can be recovered, allowing the business to continue functioning under two clear conditions:
Data Corruption is caused by programs or users modifying or deleting files so that the data contained within them is not as expected. This includes malware such as Cryptolockers, programs that are unintentionally run by users and then corrupt files by encrypting them. The purpose of the backup is to allow the simple recovery of individual files to a given point in time that meets a business defined Recovery Point Objective.
Data Loss is typically seen by clients as the loss of a complete computer system through failure or theft, where the whole system has to be recovered. To IT practitioners, however, it also covers data corruption. Recovering a complete system may require some specific additional planning because of the volume and complexity of the recovery process, whilst still needing to meet a business defined Recovery Time Objective.
I have used two key business defined objectives that will have a major impact on the strategy chosen for the backup and recovery of data. The Recovery Point Objective (RPO) defines how much data a business can afford to lose and is the driver for the backup frequency. The Recovery Time Objective (RTO) is the amount of downtime that is permitted before the system must be recovered to allow the business to function. Together, the RPO and RTO provide business defined requirements for how the backup and recovery processes should perform. When I speak with clients about their target RTO and RPO, they typically suggest that both should be zero. Whilst this is achievable, meeting such a Service Level Agreement (SLA) comes at a high price. The conversation then changes direction to understand what the actual risks facing the data are and how these can be appropriately reduced, how much data loss and downtime can be tolerated, and how big the budget is to mitigate the resulting risks.
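To make the link between the RPO and the backup frequency concrete, here is a minimal sketch in Python. It assumes that each backup captures the state of the data at the moment the backup starts, so data written just after a backup starts is unprotected until the next backup finishes. The function names and the example figures are illustrative assumptions, not taken from any particular backup product.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    """Largest window of data that could be lost.

    Assumption: a backup captures the data as it was when the backup
    started. A failure just before the next backup completes forces a
    restore from the previous one, so the exposure is one full interval
    plus the time the backup takes to run.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              backup_duration: timedelta,
              rpo: timedelta) -> bool:
    """True if the worst-case data loss fits within the business RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hypothetical example: a nightly backup that takes two hours to run
# cannot meet a 24 hour RPO, because up to 26 hours of data is exposed.
nightly_ok = meets_rpo(timedelta(hours=24), timedelta(hours=2),
                       rpo=timedelta(hours=24))
```

Under these assumptions a business that states a 24 hour RPO needs backups more often than once a day, which is exactly the sort of gap that only appears once the objective is written down as a number.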
Most businesses operate data and application services that demand different SLAs, and this can heavily impact the design of the data protection solution. I suggest that businesses with many or complex data systems create a simple data dictionary in which each data system they operate is mapped out with its own RTO and RPO. When every RPO and RTO is defined, the business can design its backup solution because there is a clear set of business requirements to be met. The backup solution design should allow for all the computer and software architectures used by the business. This may require multiple strategies and levels of complexity to meet the business goals, but it should not make the management and recovery of the protected data more complex. I will cover more about the strategies in a future article on backups, including how Cloud and on-premises backup solutions can be used, the implications of using virtualised server environments, and how the micro business can deliver simple and effective backup strategies.
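A data dictionary of this kind does not need to be elaborate. The sketch below shows one possible shape for it in Python, along with a check of which systems a proposed backup design would fail to cover. The system names and the RPO and RTO figures are invented for illustration, and `systems_not_covered` is a hypothetical helper rather than part of any tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataSystem:
    """One entry in the data dictionary."""
    name: str
    rpo: timedelta  # how much data the business can afford to lose
    rto: timedelta  # how long the business can be without the system


# Hypothetical entries; a real dictionary comes from talking to the business.
DATA_DICTIONARY = [
    DataSystem("accounts", rpo=timedelta(hours=1), rto=timedelta(hours=4)),
    DataSystem("email", rpo=timedelta(hours=4), rto=timedelta(hours=8)),
    DataSystem("file_share", rpo=timedelta(hours=24), rto=timedelta(hours=48)),
]


def systems_not_covered(dictionary, max_data_loss, recovery_time):
    """Names of systems whose RPO or RTO a proposed design would miss.

    max_data_loss and recovery_time describe what the proposed backup
    design can deliver in the worst case.
    """
    return [
        system.name
        for system in dictionary
        if max_data_loss > system.rpo or recovery_time > system.rto
    ]


# A design that can lose up to 4 hours of data and restore within 8 hours
# covers email and the file share but not the accounts system.
gaps = systems_not_covered(DATA_DICTIONARY,
                           max_data_loss=timedelta(hours=4),
                           recovery_time=timedelta(hours=8))
```

A result such as this shows at a glance where a single strategy is enough and where a system with tighter objectives needs a strategy of its own.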
Now that you have a backup strategy designed to meet your own RTO and RPO, you will monitor that the backup is running as expected, won’t you? I would suggest that more than 60% of the infrastructures that I have looked at do not have their backups monitored, to the point that days can go by without a successful backup being carried out. In many cases, the implemented backup processes were found not to protect the business data and systems well enough to meet even basic business requirements and, in the worst case, certain failures would have meant the total loss of the business data with no recovery point available. If you are responsible for the business systems being protected, I would suggest that you receive backup status notifications, regardless of whether the backup is managed by an internal or external IT resource. Whilst there may be an SLA in place that offers you some compensation for a failure to recover data, it is your staff and customers that will be affected by the inability to restore data, so I would suggest that you keep a close eye on your backups.
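As a rough illustration of what a backup status notification can be based on, the following Python sketch looks for the most recent successful backup in a list of job results and raises an alert when it is too old. It assumes the job history is available as pairs of finish time and success flag; how that history is obtained depends entirely on the backup product, and the function names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

BackupRun = Tuple[datetime, bool]  # (time the job finished, did it succeed)


def last_successful_backup(runs: Iterable[BackupRun]) -> Optional[datetime]:
    """Finish time of the most recent successful run, or None if there is none."""
    successes = [finished for finished, succeeded in runs if succeeded]
    return max(successes) if successes else None


def backup_status(runs: Iterable[BackupRun],
                  now: datetime,
                  max_age: timedelta) -> str:
    """Return "OK" or an alert message for the person responsible."""
    last = last_successful_backup(runs)
    if last is None:
        return "ALERT: no successful backup on record"
    age = now - last
    if age > max_age:
        return f"ALERT: last successful backup finished {age} ago"
    return "OK"
```

The important design choice is that the check is driven by the age of the last success rather than by failure messages, so a backup job that silently stops running is caught as well as one that reports an error.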
A critical part of the backup process is checking that the backups are working properly, and this means restoring the data! I would suggest that there is no test for a restore other than actually carrying out a full restore, and that this should be done regularly. Many clients believe that ‘Verify after backup’ and a ‘Successful’ state prove that the backup works. In some situations and on some media this may be acceptable, but I would always suggest carrying out a level of testing that proves that the RPO and RTO can be met from any backup solution.
I did some work with a client a few years ago as part of their disaster recovery testing, in which we were testing recovery onto foreign hardware as part of a simulated flood event. Whilst the backup media was recoverable on their own tape drives, they were unable to restore data on the large array of tape drives that was available to them through their workplace recovery provider. The client changed their backup hardware and strategy following this failure and then repeated the test to demonstrate that they could recover onto foreign hardware and meet the RTO and RPO. In a perfect situation you should test the recovery of the data and services onto foreign hardware, as under some scenarios your own hardware will not be available. I recognise that it is not always possible for clients to complete a total recovery onto new hardware, so suitable processes need to be implemented to test specific scenarios, chosen to prove that the backup strategy mitigates the given risks.
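Part of a restore test can be automated by checking that the restored files are the files that were backed up. The Python sketch below assumes that a manifest of SHA-256 checksums is recorded when the backup is taken and is later compared with the files restored to a test location. `build_manifest` and `verify_restore` are hypothetical helpers, and a check like this complements a real restore onto real hardware rather than replacing it.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """SHA-256 checksum of a file, read in chunks to cope with large files."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(manifest: Dict[str, str], restored_root: Path) -> List[str]:
    """List every file that is missing or different after a test restore."""
    problems = []
    for relative_path, expected in manifest.items():
        candidate = restored_root / relative_path
        if not candidate.is_file():
            problems.append(f"missing: {relative_path}")
        elif sha256_of(candidate) != expected:
            problems.append(f"changed: {relative_path}")
    return problems
```

An empty list from `verify_restore` shows that the restored data matches what was backed up. Timing the restore itself, from the start of the test until the services are usable again, is what shows whether the RTO can be met.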
Hang on a second: ‘I use the Cloud so we do not need to do backups’ or ‘my service provider does my backup so we need not worry as they have it covered’. These two common statements rear their heads more frequently than you might think. My response is usually: ‘You may be right, but I would still suggest that you define your own requirements and then check whether your current agreement meets them, as your business needs may have changed since you signed up and the agreement may not have been reviewed. When was the last time that you checked that your backup provider was able to recover your data services?’ This may seem obvious, but how many times have you made small incremental changes to your IT systems and assumed that your backup strategy is still sound? I will cover these two topics in a future piece but, needless to say, check your contracts if you outsource application or support services to a third party.