Avoiding Storage System Failures

There are many ways to reduce or eliminate the impact of storage system failures. You may not be able to prevent a disaster from happening, but you may be able to minimize the disruption of service to your clients.

Redundancy can be added to primary storage systems in several ways. Some options, such as duplicate storage systems or identical servers (known as ‘mirror sites’), are costly enough that only large business organizations can afford the investment. Elaborate backup processes or file-system ‘snapshots’ that always provide a checkpoint to restore to add another layer of data protection.

Experience has shown there are usually multiple or rolling failures that happen when an organization has a data disaster. Therefore, to rely on just one restoration protocol is shortsighted. A successful storage organization will have multiple layers of restoration pathways.

We have heard thousands of IT horror stories in which an initial storage failure turned into a complete data calamity. In the rush to bring a system back, some choices can permanently corrupt the data. Here are several risk-mitigation policies that storage administrators can adopt to help minimize data loss when a disaster happens:

Offline storage system: Avoid forcing an array or drive back online. There is usually a valid reason for a controller card to disable a drive or array, and forcing an array back online may expose the volume to file system corruption.

Rebuilding a failed drive: When rebuilding a single failed drive, it is important to allow the controller card to finish the process. If a second drive fails or goes offline during this process, stop and get professional data recovery services involved. During a rebuild, replacing a second failed drive will change the data on the other drives.

Storage system architecture: Plan the storage system’s configuration carefully. We have seen many cases in which multiple configurations were used on a single storage array. For example, three RAID 5 arrays (each holding six drives) are striped in a RAID 0 configuration and then spanned. Keep the storage configuration simple and document each aspect of it.
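To make the arithmetic concrete, here is a minimal Python sketch (drive counts and sizes are hypothetical, not taken from any particular system) that works out the raw capacity, usable capacity, and fault tolerance of a nested layout like the one described above: three six-drive RAID 5 sets striped together, often called RAID 50.

    # Capacity and fault tolerance of a nested RAID 50 layout (illustrative only).
    def raid5_usable(drives: int, drive_tb: float) -> float:
        """RAID 5 keeps one drive's worth of parity per set."""
        return (drives - 1) * drive_tb

    def raid50_usable(sets: int, drives_per_set: int, drive_tb: float) -> float:
        """RAID 0 striping across RAID 5 sets adds capacity but no extra parity."""
        return sets * raid5_usable(drives_per_set, drive_tb)

    if __name__ == "__main__":
        sets, drives_per_set, drive_tb = 3, 6, 2.0   # hypothetical 2 TB drives
        print(f"Raw capacity:    {sets * drives_per_set * drive_tb:.1f} TB")              # 36.0 TB
        print(f"Usable capacity: {raid50_usable(sets, drives_per_set, drive_tb):.1f} TB")  # 30.0 TB
        # Each RAID 5 set tolerates one failed drive; a second failure in the
        # same set during a rebuild takes down the entire striped volume.

The point is not the numbers themselves but how quickly the failure behavior becomes non-obvious once arrays are nested, which is exactly why the configuration should be kept simple and written down.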

During an outage: If the problem escalates up to the OEM technical support, always ask “Is the data integrity at risk?” or, “Will this damage my data in any way?” If the technician says that there may be a risk to the data, stop and get professional data recovery services involved.


Test: How Secure Is Your Data?

With businesses increasingly relying on computer systems and networks for day-to-day operations, any interruption poses an immediate threat to business continuity. Computer systems can be affected by a variety of sources: power outages, water leaks, system failures, and so on. Most companies have some sort of backup system in place (for example, a UPS for power failures) but fail to take other hidden factors into account. It is no longer a question of if you will experience system or environment failures, but when. The 10-question quiz that follows can help assess your company’s risk of experiencing downtime due to system or environment failures.

1. How many hours of continual data processing does your business do over a 24 hour period?
Threat: The average company’s hourly downtime accounts for $78,000 in lost revenue.
8 hours or less (10 points)
8 to 16 hours (75 points)
16 to 24 hours (100 points)

2. How much downtime can your business afford?
Threat: Computer downtime costs US businesses $4 billion a year, primarily through lost revenue.
1 week to 1 month (10 points)
2 days to 1 week (75 points)
1 day or less (100 points)

3. What is your business system or data worth?
Threat: 43% of U.S. businesses never re-open after a disaster, and 29% close within 2 years.
$10,000 or less (10 points)
$10,000 to $100,000 (75 points)
$100,000 or more (100 points)

4. How many users does your computer system support?
Threat: The manufacturing industry lost an average of $421,000 per incident of online computer system downtime.
1 to 10 users (10 points)
10 to 100 users (75 points)
100 or more users (100 points)

5. How much downtime have you experienced over the last year?
Threat: The average company’s computer system was down 9 times per year for an average of 4 hours each time.
20 hours or less (10 points)
20 to 150 hours (75 points)
150 or more hours (100 points)

6. How many hours is your data center unattended?
Threat: The average company’s downtime costs an average of $330,000 per outage.
1 hour or less (10 points)
1 hour to 8 hours (75 points)
8 hours or more (100 points)

7. What percentage of your systems and environmental conditions (temperature, water, and smoke) are you monitoring with an early detection system?
Threat: Environmental incidents accounted for 10.3% of business interruptions in the past 5 years.
90% or more (10 points)
70 to 90% (75 points)
70% or less (100 points)

8. How many hours has your UPS had to back up your system this year?
Threat: Power problems accounted for 29.48% of U.S. computer outages.
3 or less hours (10 points)
3 to 8 hours (75 points)
8 or more hours (100 points)

9. If your system went down on Friday at midnight, how long would it be before you are notified?
Threat: A 1993 Gallup/GRN survey reported that Fortune 1000 companies average 1.6 hours of LAN downtime per week (more than two 40-hour work weeks per year).
3 or less hours (10 points)
3 to 8 hours (75 points)
8 or more hours (100 points)

10. How many people have access to your main computer room?
Threat: Human error accounted for 34.4% of business interruptions in the past 5 years.
3 or less (10 points)
3 to 10 (75 points)
10 or more (100 points)

Scoring:

165 and under: Your computer room is either very well protected, or computer room downtime will not affect your business.
166-799: You have trouble spots in your computer room; proactive steps taken now will help you avoid trouble in the future.
800 and over: Your computer room, and quite possibly your job, are in serious jeopardy. Look into ways of securing your computer room before disaster strikes; time is ticking.
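The scoring above is simple enough to express in a few lines of Python; the sketch below (a convenience only, not part of the original quiz material) totals the ten answers and maps the result to the three bands.

    # Total the ten quiz answers (each worth 10, 75, or 100 points) and
    # map the score to the risk bands described above.
    def risk_band(points: list[int]) -> str:
        assert len(points) == 10 and all(p in (10, 75, 100) for p in points)
        total = sum(points)
        if total <= 165:
            return f"{total}: well protected (or downtime does not affect you)"
        if total <= 799:
            return f"{total}: trouble spots; act now to avoid problems later"
        return f"{total}: serious jeopardy; secure your computer room"

    # Example: mostly low-risk answers with a few weak spots.
    print(risk_band([10, 10, 75, 10, 10, 75, 10, 10, 10, 75]))   # 295: trouble spots; ...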


Why Does Data Loss Happen?

Physical damage

A wide variety of failures can cause physical damage to storage media. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer any of several mechanical failures, such as head crashes and failed motors; and tapes can simply break. Physical damage always causes at least some data loss, and in many cases the logical structures of the file system are damaged as well. This causes logical damage that must be dealt with before any files can be recovered.

Most physical damage cannot be repaired by end users. For example, opening a hard disk in a normal environment can allow dust to settle on the surface, causing further damage to the platters. Furthermore, end users generally do not have the hardware or technical expertise required to make these sorts of repairs; therefore, data recovery companies are consulted. These firms use Class 100 clean room facilities to protect the media while repairs are made, and tools such as magnetometers to manually read the bits off failed magnetic media. The extracted raw bits can be used to reconstruct a disk image, which can then be mounted to have its logical damage repaired. Once that is complete, the files can be extracted from the image.
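The imaging step can be pictured with a short Python sketch: read the failing device block by block, pad unreadable regions with zeroes so offsets stay aligned, and keep going. This is a simplified stand-in for purpose-built tools such as GNU ddrescue and hardware imagers; the device and image paths are hypothetical.

    import os

    BLOCK = 64 * 1024  # read size in bytes

    def image_device(device: str, image_path: str) -> int:
        """Copy a raw device to an image file, skipping unreadable blocks."""
        bad_blocks = 0
        with open(device, "rb", buffering=0) as src, open(image_path, "wb") as dst:
            while True:
                try:
                    chunk = src.read(BLOCK)
                except OSError:                      # unreadable region
                    bad_blocks += 1
                    chunk = b"\x00" * BLOCK          # pad so offsets stay aligned
                    src.seek(BLOCK, os.SEEK_CUR)     # skip past the bad area
                if not chunk:
                    break
                dst.write(chunk)
        return bad_blocks

    # e.g. image_device("/dev/sdb", "failed_drive.img")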

Logical damage

Far more common than physical damage is logical damage to a file system. Logical damage is primarily caused by power outages that prevent file system structures from being completely written to the storage medium, but problems with hardware (especially RAID controllers) and drivers, as well as system crashes, can have the same effect. The result is that the file system is left in an inconsistent state. This can cause a variety of problems, such as strange behavior (e.g., infinitely recursive directories, drives reporting negative amounts of free space), system crashes, or an actual loss of data. Various programs exist to correct these inconsistencies, and most operating systems come with at least a rudimentary repair tool for their native file systems. Linux, for instance, comes with the fsck utility, and Microsoft Windows provides chkdsk. Third-party utilities are also available, and some can produce superior results by recovering data even when the disk cannot be recognized by the operating system’s repair utility.

Two main techniques are used by these repair programs. The first, consistency checking, involves scanning the logical structure of the disk and checking to make sure that it is consistent with its specification. For instance, in most file systems, a directory must have at least two entries: a dot (.) entry that points to itself, and a dot-dot (..) entry that points to its parent. A file system repair program can read each directory and make sure that these entries exist and point to the correct directories. If they do not, an error message can be printed and the problem corrected. Both chkdsk and fsck work in this fashion. This strategy suffers from a major problem, however; if the file system is sufficiently damaged, the consistency check can fail completely. In this case, the repair program may crash trying to deal with the mangled input, or it may not recognize the drive as having a valid file system at all.
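A toy version of this kind of check, over a hypothetical in-memory directory table rather than real on-disk structures, might look like the following; it only illustrates the idea that fsck and chkdsk verify structures against the file system’s rules.

    # Each directory must have '.' pointing to itself and '..' pointing to its parent.
    # directory id -> {entry name: target directory id}
    directory_table = {
        1: {".": 1, "..": 1, "home": 2},   # 1 is the root; its parent is itself
        2: {".": 2, "..": 1, "user": 3},
        3: {".": 3, "..": 7},              # damaged: '..' points to the wrong parent
    }

    def parent_of(dir_id: int) -> int:
        """Find the real parent: whichever directory lists dir_id as a child."""
        for pid, entries in directory_table.items():
            if any(name not in (".", "..") and target == dir_id
                   for name, target in entries.items()):
                return pid
        return dir_id  # the root (or an orphan)

    def check_consistency() -> list[str]:
        problems = []
        for dir_id, entries in directory_table.items():
            if entries.get(".") != dir_id:
                problems.append(f"dir {dir_id}: '.' does not point to itself")
            if entries.get("..") != parent_of(dir_id):
                problems.append(f"dir {dir_id}: '..' does not point to its parent")
        return problems

    print(check_consistency())   # ["dir 3: '..' does not point to its parent"]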

The second technique for file system repair is to assume very little about the state of the file system and to rebuild it from scratch, using whatever hints any undamaged file system structures might provide. This strategy involves scanning the entire drive, making note of all file system structures and possible file boundaries, and then trying to match what was found against the specification of a working file system. Some third-party programs use this technique, which is notably slower than consistency checking. It can, however, recover data even when the logical structures are almost completely destroyed. This technique generally does not repair the underlying file system, but merely allows the data to be extracted from it to another storage device.
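The "assume very little" approach is easiest to picture as file carving: scanning the raw bytes for known signatures and noting where files might begin. The sketch below (image path hypothetical) only records candidate offsets; real tools also locate end-of-file markers, handle fragmentation, and understand many more formats.

    # Scan a raw disk image for well-known file signatures ("magic numbers").
    SIGNATURES = {
        b"\xff\xd8\xff": "jpeg",
        b"\x89PNG\r\n\x1a\n": "png",
        b"%PDF-": "pdf",
    }

    def carve_offsets(image_path: str):
        hits = []
        with open(image_path, "rb") as f:
            data = f.read()            # fine for a sketch; real tools stream
        for magic, kind in SIGNATURES.items():
            start = 0
            while (pos := data.find(magic, start)) != -1:
                hits.append((pos, kind))
                start = pos + 1
        return sorted(hits)

    # e.g. for offset, kind in carve_offsets("failed_drive.img"):
    #          print(f"possible {kind} file at byte offset {offset}")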

While most logical damage can be either repaired or worked around using these two techniques, data recovery software can never guarantee that no data loss will occur. For instance, in the FAT file system, when two files claim to share the same allocation unit (“cross-linked”), data loss for one of the files is essentially guaranteed.

The increased use of journaling file systems, such as NTFS 5.0, ext3, and XFS, is likely to reduce the incidence of logical damage. These file systems can always be “rolled back” to a consistent state, which means that the only data likely to be lost is what was in the drive’s cache at the time of the system failure. However, regular system maintenance should still include the use of a consistency checker, in case the file system software has a bug that causes data corruption. Also, in certain situations even journaling file systems cannot guarantee consistency. For instance, if the disk delays writing data back or reorders writes in ways invisible to the file system (some disks report that changes have been flushed when they actually have not), a power loss can still cause such errors to occur (this is usually not a problem if the delay or reordering is done by the file system software’s own caching mechanisms). The solution is to use hardware that does not report data as written until it actually is, or to use disk controllers equipped with a battery backup so that waiting data can be written when power is restored. Alternatively, the entire system can be equipped with a battery backup (UPS) that may make it possible to keep the system running in such situations, or at least give it enough time to shut down properly.
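From the application side, the same idea appears as refusing to treat data as written until it has been pushed to stable storage. The Python sketch below (POSIX-specific, file names hypothetical) flushes the write, fsyncs the file, renames it into place, and then fsyncs the directory entry; it still depends on the drive honoring the flush, which is exactly the caveat described above.

    import os

    def durable_write(path: str, payload: bytes) -> None:
        """Write a file and only return once the data should be on stable storage."""
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())          # push file contents to the device
        os.replace(tmp, path)             # atomically rename into place
        dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)   # POSIX only
        try:
            os.fsync(dir_fd)              # persist the directory entry as well
        finally:
            os.close(dir_fd)

    # e.g. durable_write("journal_test.dat", b"critical record\n")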

And, of course, backing up your data is a good way to protect it.

But backup technology and practices have failed to adequately protect data. Most computer users rely on backups and redundant storage technologies as their safety net in the event of data loss. For many users, these backup and storage strategies work as planned. Others, however, are not so lucky. Many people back up their data, only to find their backups useless at the crucial moment when they need to restore from them. These systems are designed for, and rely upon, a combination of technology and human intervention for success. For example, backup systems assume that the hardware is in working order. They assume that the user has the time and the technical expertise necessary to perform the backup properly. They also assume that the backup tape or CD-RW is in working order, and that the backup software is not corrupted. In reality, hardware can fail. Tapes and CD-RWs do not always work properly. Backup software can become corrupted. Users accidentally back up corrupted or incorrect information. Backups are not infallible and should not be relied upon absolutely.


Reasons and Costs of Data Loss

Computer data may be one of your company’s most valuable and vulnerable assets. According to our experience, the primary threats to your data include:

  • Hardware or System Problems
  • Human Error
  • Software Corruption or Program Problems
  • Computer Viruses
  • Natural Disasters

These five major threats to your computer data share two things in common: they are unpredictable and, in many cases, uncontrollable. Therefore, the precautions taken by IT professionals to safeguard company data cannot always prevent a data loss.

Computer users and many experts often consider lost data permanently destroyed, with no hope of recovery. Information about lost data can be complex, inconsistent or inaccurate, so it’s not surprising that data loss and data recovery are some of the most confusing and misunderstood concepts.

In addition to being a vulnerable asset, computer data is also a valuable asset.

Based on the information below it is easy to see how significant the costs of lost or inaccessible data can be. The following is a summary of the average hourly impact of lost data on a selection of different businesses.

Table: Type of Business and Average Hourly Impact of Lost Data

Costs of Data Loss

When time is crucial and data is mission-critical, data recovery may be the most practical option available. Data recovery professionals recover data from the damaged media itself, providing several advantages over alternative methods of data retrieval.

1) Complete – Data recovery professionals can safely enter the system or media to achieve a comprehensive data recovery.

2) Current – Although many people revert to backups following a data loss, those backups typically contain outdated information or could be corrupt themselves. Data recovery can help you access the most recent version of the lost data.

3) Fast – Every second that passes following a data disaster means time and money lost to your company. Data recovery reduces this downtime by quickly recovering and returning your data.

4) Cost-effective – The expense in time, money, and effort of rebuilding or re-keying lost data can be overwhelming to your company. Professional recovery is often the quickest and most complete way to get that data back.


Avoid the Backup Tape Graveyard

Many businesses find, just when they need a data recovery, that their data is not retrievable because it is stored on old tape formats.

Many organizations have electronic data dating back decades, and the chances of it degrading or becoming inaccessible over time are fairly high. Furthermore, retrieving data that is stored on out-of-date tapes can be costly and may require special equipment.

It is important for a company to look at its past as well as its future. Information that may need to be accessed must be transferred to modern media formats in order to be compliant with current legislation and recoverable in the event of data loss. By maintaining up-to-date records and data on modern media formats, extraction can be quick and painless. Furthermore, storage costs will decrease and the organization will be better aligned with compliance regulations.

In addition to ensuring backups are stored on modern media formats, the following tips may help minimize the chance of backup failure/data disaster.

1. Verify your backups. Backups, regardless of age, are known to fail or simply not work, and this often goes undiscovered until they are needed. (A minimal verification sketch follows this list.)

2. Store backup tapes off site. This will ensure your files are preserved if your site experiences a fire, flood or other disaster.

3. Create a safe “home” for your backup tapes. Keep backup tapes stored in a stable environment, without extreme temperatures, humidity or electromagnetism.

4. Track the “expiration date.” Backup tapes are typically rated for between 5,000 and 500,000 uses, depending on the type of tape. Tape backup software will typically keep track of the tapes, regardless of the rotation system.

5. Maintain your equipment. Clean your tape backup drive periodically, following the directions in its manual regarding frequency. Most businesses simply send a drive back to the manufacturer when it begins to have problems, but if a drive has problems, so can the backup tapes.
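As a sketch of tip 1, the Python below (paths and manifest format are hypothetical) records a SHA-256 checksum for every file at backup time and later re-hashes the restored copies; an empty result means every file came back intact.

    import hashlib
    import json
    import os

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(root: str, manifest_path: str) -> None:
        """Record a checksum for every file under root at backup time."""
        manifest = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                manifest[os.path.relpath(full, root)] = sha256_of(full)
        with open(manifest_path, "w") as f:
            json.dump(manifest, f, indent=2)

    def verify_restore(root: str, manifest_path: str) -> list[str]:
        """Return the files that are missing or altered in a test restore."""
        with open(manifest_path) as f:
            manifest = json.load(f)
        return [rel for rel, digest in manifest.items()
                if not os.path.exists(os.path.join(root, rel))
                or sha256_of(os.path.join(root, rel)) != digest]

    # e.g. build_manifest("/data", "backup_manifest.json") at backup time, then
    #      verify_restore("/restore_test", "backup_manifest.json") after a test restore.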


Lost Data: To Recover or Not to Recover

A data loss has occurred – now what? Deciding whether to recover lost data can be difficult. There are several things to take into consideration when determining if data recovery is required.

Backup, Backup, Backup
Everyone knows the importance of a good backup system, so your first step should be to determine whether the data is actually backed up. Lost data is often stored on a backup tape, a backup hard drive, the network, or various other locations throughout an organization.

Unfortunately, locating and reloading the lost information can be time consuming and deplete resources. If a backup is located, it is important to check that the most recent copy of the data is available. Backups often run on a set schedule, and if changes to the data were saved after the backup occurred, that information will not be accessible.

Re-Creation
Another important option to consider is whether the data can or should be re-created. Two items to take into account when considering this option include the type of data lost and the amount lost:

  • Type of Data – Different data may have different perceived value. Recovering a customer database is (probably) more important than recovering a file containing possible names for a pet goldfish. Is the missing data a high-volume transaction database, such as a banking record? It would be nearly impossible to recreate the thousands of transactions that were happening in real time. Other types of data, such as digital photos, may be impossible to re-create at all. Understanding the type of data that was lost is imperative to determining your next steps.
  • Amount of Data – Understanding how much data was lost can help you understand how much time and resources would be required to re-create the data. The more data lost, the more time and resources required to re-create it – if re-creation is even possible.

An additional point to consider is that with strict regulatory and legal requirements, many companies need access to their lost data in order to comply with these requirements. Accessibility to data and the legal requirements surrounding that data are essential to understand when considering if data recovery is necessary or not.

Data recovery costs can be difficult to plan for because they are unexpected. No one wants to lose data, just as no one wants their car to break down or to have to call a plumber for a broken pipe. To put it into perspective against other business-related costs, however: vending services and that morning cup of coffee can run between $500 and $1,000 every month for a small business office, while an average recovery fee for a typical desktop, Windows-based system is around $1,000. Comparing those figures, the true value of data recovery becomes clear.


Protecting Data from Severe Weather

You can protect your data by following some simple precautions. With that said, even the most well-protected hard drives can crash, fail, quit, click, die… you get the picture. So here are a few tips for how to respond when extreme weather does damage your computer equipment.

1. Summer heat can be a significant problem, as overheating can lead to drive failures. Keep your computer in a cool, dry area to prevent overheating.

2. Make sure your servers have adequate air conditioning. Increases in computer processor speed have resulted in more power requirements, which in turn require better cooling – especially important during the summer months.

3. To prevent damage caused by lightning strikes, install a surge protector between the power source and the computer’s power cable to handle any power spikes or surges.

4. Invest in some form of Uninterruptible Power Supply (UPS), which uses batteries to keep computers running during power outages. UPS systems also help manage an orderly shutdown of the computer – unexpected shutdowns from power surge problems can cause data loss.

5. Check protection devices regularly: At least once a year you should inspect your power protection devices to make sure that they are functioning properly.

Responding to Data Loss Caused by Severe Weather

1. Do not attempt to operate visibly damaged computers or hard drives.

2. Do not shake, disassemble or attempt to clean any hard drive or server that has been damaged – improper handling can make recovery operations more difficult, which can lead to valuable information being lost.

3. Never attempt to dry water-damaged media by opening it or exposing it to heat – such as that from a hairdryer. In fact, keeping a water-damaged drive damp can improve your chances for recovery.

4. Do not use data recovery software to attempt recovery on a physically damaged hard drive. Data recovery software is only designed for use on a drive that is fully functioning mechanically.

5. Choose a professional data recovery company to help you.


Data Recovery Vendor Considerations

When looking for a data recovery provider, it’s important to ensure that the one selected can handle not only the various types of media, but also understands the data security regulations of today’s organizations. For example, encrypted data requires special data handling processes — from the clean room to the technically-advanced recovery lab. This isolation ensures no one person has complete access to the media throughout the recovery process, thereby providing security while maintaining recovery continuity and quality.

Additionally, it is important to note that some data recovery companies have been cleared for security projects and services for U.S. government agencies. As a result, these companies implement data privacy controls that are based on the U.S. government’s Electronic Defense Security Services requirements for civilian companies that are under contract for security clearance projects or services.

Unfortunately, most data loss victims only consider data recovery right after they have experienced a data loss and are scrambling for a solution. Emotions run high at this point. The fallout from a data disaster and corresponding data loss is sometimes crippling, with the IT staff working around the clock to get the computer systems back to normal. These distressed circumstances are not the time to think about what makes a good data recovery vendor. Incorporating this important decision into your business continuity planning is best done in advance. Some key questions to ask as part of this proactive exercise include:

  • Do you have a relationship with a preferred data recovery vendor?
  • What should you look for when reviewing data recovery companies?
  • Do you include data recovery in your disaster and business continuity planning?
  • Do you have a plan for how to handle data loss of encrypted data?
  • Do appropriate people have access to the encryption keys to speed up the recovery process?

Sometimes planning for these procedures can become involved and tedious, especially if you are planning for something you have never experienced. Do some investigating by calling data recovery service companies and presenting data loss scenarios such as email server recoveries, RAID storage recoveries, or physically damaged hard disk drives from mobile users. Ask about data protection and the policies in place to protect your company’s files.

Additionally, find out which techniques and recovery tools the providers use. Ask the companies how large their software development staff is. Inquire about how they handle custom development for unique data files. For example, will they be able to repair or rebuild your users’ unique files? Does the data recovery service company have any patents or special OEM certifications?

While these details may not seem important at first, they can be the decisive factors that determine whether your data recovery experience is a positive and successful endeavor.

Following is a checklist of factors to consider when searching for a data recovery vendor for encrypted data or ensuring your data recovery partner is able to comply with your data security policies:

  • Solid Reputation – Experienced data recovery company with a strong background.
  • Customer Service – Dedicated and knowledgeable staff.
  • Secure Protocols – Expert knowledge of encryption products with privacy protocols in place.
  • Technical Expertise – Capable of recovering from virtually all operating systems and types of storage devices.
  • Scalable Volume Operations – Equipped with full-service labs and personnel that can handle all size jobs on any media type.
  • Research & Development – Invested in technology for superior recoveries; not just purchasing solutions.

It is important to understand that data loss can occur at any time on any scale. It’s especially crucial to be prepared with a plan that adheres to your company’s security policy. The more prepared one is, the better the chance for a quick and successful recovery when a problem arises.


Data Security & Data Loss

Encryption continues to be the topic on every CIO and IT person’s lips nowadays. No one wants to end up in the news as the next victim of a privacy breach or the next company that didn’t protect its customers’ information. If you conduct a news search using the words “personal data breach,” you’ll be alarmed at the number of instances where personal information such as social security and credit-card numbers have been exposed to possible theft. In a recent breach, a state government site allowed access to hundreds of thousands of records, including names, addresses, social security numbers and documents with signatures.

Whether it’s government agencies, research facilities, banking institutions, credit card processing companies, hospitals, or your company’s computers, the risk of compromising private information is very high. At the recent “CEO-CIO Symposium,” speaker Erik Phelps from the law firm Michael Best & Friedrich described the relationship business has with technology. In his presentation, he stated that because “business relies so heavily on technology today, business risk becomes technology dependent.” The possibility of litigation has always been a risk of doing business, but because technology and today’s business are so intertwined, that risk now carries a higher threat level. This has prompted many to encrypt workstations and mobile computers in order to protect critical business data.

If you have rolled out encryption, how do you maintain your IT service quality when a hard disk drive fails? How do you plan and prepare for a data loss when the user’s computer is encrypted? These are all issues that should be considered when putting together a data disaster plan. In addition, data recovery, one of the more commonly missing elements of a disaster recovery plan, should also be factored in, because it can serve as the “Hail Mary” attempt when all other options have been exhausted.


Data Loss: From PCs to Suit Pockets

Data is everywhere. No longer confined to desktop computers, data is always with us – at the gym in the form of an iPod®, in the car via your cell phone, and of course surrounding you at work – notebooks, desktops, servers, etc. With the increased portability of data comes the increased risk for data to be lost, misplaced, damaged or destroyed.

How can you protect mobile devices from data loss? Here are some simple preventive steps that will help create good habits for the use of USB sticks and hopefully prevent any data disasters.

Minimize misplacement – Try to prevent ‘wandering’ USB sticks. The device is easily lost when you don’t know exactly where it is kept. A dedicated USB spot prevents loss of data from a portable storage device.

Carry with care – Make sure your USB stick is stored safely when traveling to minimize the risk of losing data.

No backups, please – A USB stick is too vulnerable to store precious information. These sticks should therefore never be used as a backup device.

Put a lid on it – If not in use, ensure that the connector of your USB stick is protected. By using the protective cap provided with any USB stick, a possible data disaster can be averted.

Unplug before you leave – Before you embark on a journey that requires a laptop and a USB stick, make sure the devices are separated. This way, both the laptop and the USB stick will run less risk of damage.
