Preventive recovery action in hard disk drives

1. A method in a data processing system for minimizing read/write errors caused by impaired performance of a hard disk drive during runtime operation of said hard disk drive, said runtime operation including an active mode during which read/write operations are performed and a standby mode during which no read/write operation is underway, said method comprising the steps of: monitoring at least one performance parameter of a hard disk drive during said standby mode of operation; and in response to detecting a degraded value of said at least one performance parameter during said monitoring, performing preventive recovery action only during said standby mode of operation, wherein said preventive recovery action includes restoring said performance parameter to an acceptable value without interfering with hard disk drive operation during an active mode.

2. The method of claim 1 wherein said performance parameter is signal resolution, and wherein said step of performing preventive recovery action comprises the step of adjusting a fly height of a read/write head within said hard disk drive, such that said signal resolution is maintained at an acceptable level.

3. The method of claim 1, wherein said data processing system includes a disk drive controller associated with said disk drive, said method further comprising the steps of: during said step of monitoring at least one performance parameter, detecting a degradation of said performance parameter beyond a pre-determined value; and in response to detecting a degradation of said performance parameter, performing preventive recovery action during said standby mode, wherein said preventive recovery action instructs said disk drive controller to undertake corrective action to rectify the degraded performance parameter.

4. The method of claim 1, further comprising the steps of: detecting a read/write error during said active mode of operation, said error having a cause that is correlated to said performance parameter; and in response to detecting a read/write error during said active mode of operation, examining said performance parameter during said standby mode, such that said cause may be diagnosed and further read/write errors prevented.

5. The method of claim 4, further comprising the step of correlating said preventive recovery action to said cause of said read/write error, such that said cause may be corrected.

6. The method of claim 4, wherein said step of examining said at least one performance parameter is preceded by the steps of: initiating a data recovery procedure during said active mode; and upon completion of said data recovery procedure, initiating preventive recovery action during said standby mode, such that a subsequent read/write error may be prevented.

7. The method of claim 6, wherein the step of initiating preventive recovery action during said standby mode is followed by the steps of: determining whether said cause has been corrected by said preventive recovery action; in response to said cause having been corrected, continuing said runtime operation of said hard disk drive; and in response to said cause having not been corrected, utilizing predictive failure analysis to issue a warning, such that said hard disk drive may be taken off-line.

8. A system for preventing read/write failures within a hard disk drive during runtime operation of said hard disk drive, said runtime operation including an active mode during which read/write operations are performed and a standby mode during which no read/write operation is underway, said hard disk drive including a controller for providing electromechanical control of said hard disk drive, said system comprising: means within a disk controller for monitoring a performance parameter of said hard disk drive during said standby mode of operation; means responsive to a detected degradation of said performance parameter for producing an error signal indicative of a potential hard disk drive failure; and means responsive to receiving said error signal for initiating preventive recovery action only during a standby mode of operation, wherein said preventive recovery action includes restoring said performance parameter to an acceptable value without interfering with hard disk drive operation during an active mode.

9. The system of claim 8, wherein said means for monitoring a performance parameter of a hard disk drive and said means for producing an error signal in response to detection of a potential hard disk drive failure, are predictive failure analysis instruction means.

10. The system of claim 9, further comprising: a controller for providing electromechanical control of said hard disk drive, said controller receiving and executing said predictive failure analysis instructions.

11. The system of claim 9, wherein said means for initiating preventive recovery action only during a standby mode of operation are preventive recovery action instruction means included within said controller.
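The method of claims 1, 4 and 7 can be read as a small control loop. The following Python sketch is illustrative only. The `Drive` class, the `adjust` callback (standing in for a corrective action such as a fly-height adjustment) and both threshold values are hypothetical, because the claims speak only of a "pre-determined value" and an "acceptable value".

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical thresholds; the claims do not specify any numbers.
DEGRADED_BELOW = 0.80   # below this, the parameter counts as degraded
ACCEPTABLE_AT = 0.90    # recovery must restore at least this value


@dataclass
class Drive:
    mode: str = "standby"       # "active" (read/write underway) or "standby"
    parameter: float = 1.0      # monitored performance parameter, e.g. signal resolution
    warnings: List[str] = field(default_factory=list)


def preventive_recovery(drive: Drive, adjust: Callable[[float], float]) -> bool:
    """Run one monitoring pass; return True if a recovery action was attempted."""
    if drive.mode != "standby":
        return False                          # never interfere with active-mode I/O
    if drive.parameter >= DEGRADED_BELOW:
        return False                          # parameter healthy, nothing to do
    drive.parameter = adjust(drive.parameter)  # corrective action, e.g. re-tune fly height
    if drive.parameter < ACCEPTABLE_AT:
        # Cause not corrected: issue a predictive-failure-analysis style warning.
        drive.warnings.append("PFA: parameter not restored; take drive off-line")
    return True
```

For example, a standby drive whose parameter has dropped to 0.7 is restored when `adjust` returns 0.95, while the same drive in active mode is left alone until it next enters standby.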

Basic Information of Hard Disk Drive (Part III)

Firmware ‘overlay’ code consists of specific code functions. Why not just put all the firmware code into one section? Because the RAM in the drive is a limited resource, they’ve put some code into ‘overlay files’, so that a specific function can be swapped into RAM when it is needed. When the function is no longer needed, it can be swapped out of RAM and another function can be swapped in.
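To illustrate the idea, here is a minimal Python sketch of an overlay manager that keeps only a fixed number of overlay modules resident in a small RAM and swaps them in on demand. The least-recently-used eviction policy is an assumption made for the example; the text does not say how real drive firmware chooses which overlay to swap out.

```python
from collections import OrderedDict
from typing import Dict


class OverlayManager:
    """Keeps at most `slots` overlay modules resident in (simulated) drive RAM."""

    def __init__(self, slots: int, storage: Dict[int, bytes]):
        self.slots = slots
        self.storage = storage                   # overlay ID -> code stored on disk
        self.resident: "OrderedDict[int, bytes]" = OrderedDict()  # ID -> code in RAM
        self.loads = 0                           # number of swap-ins performed

    def call(self, overlay_id: int) -> bytes:
        """Return the overlay's code, swapping it into RAM first if necessary."""
        if overlay_id in self.resident:
            self.resident.move_to_end(overlay_id)       # mark as most recently used
        else:
            if len(self.resident) >= self.slots:
                self.resident.popitem(last=False)       # swap out least recently used
            self.resident[overlay_id] = self.storage[overlay_id]  # swap in from disk
            self.loads += 1
        return self.resident[overlay_id]
```

With two RAM slots, calling overlays 1, 2, 1, 3 in turn causes three swap-ins, and overlay 2 is the one swapped out to make room for overlay 3.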

The firmware update files from Maxtor (I think the same goes for the other vendors) are not scrambled, encrypted or packed in any way. In fact, you can find exactly the same code in the ‘*.RPM’ files that PC3K produces, for example.

Maxtor distributes its firmware in a so-called “.DMC” file. This DMC file is a package of 4 files: a ‘.bxx’ file, a ‘.cxx’ file, a ‘.bbr’ file and a ‘.cbr’ file. As I mentioned, this DMC container is not packed or scrambled in any way, so you can simply cut the files out of it. The first 0x150 bytes of the file are the header. This header contains the four filenames, the byte offsets at which these files can be found in the package, the length of each file and a checksum (I’m not 100% sure about the checksum, though). The ‘.bxx’ file is the biggest one and contains the overlay modules. You can find all code overlay modules by searching for the string ‘MO’ in the file; right after this 2-byte string you’ll find the hexadecimal overlay module ID. The ‘.bbr’ file contains the main firmware code. The last 2 files are very small; I’m not sure what they contain, probably some checksums for the firmware and overlay modules.

As mentioned, the firmware code and overlay modules can of course also be found in the ‘*.RPM’ files, since these represent the firmware code on disk. So you can look through these RPM files and scan for the ‘MO’ string to find any specific overlay module.
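Scanning an image for the ‘MO’ marker is easy to automate. The Python sketch below assumes that the module ID occupies exactly one byte after the marker. The text only says that the ID follows the marker, so `id_width` is a guess, as is big-endian byte order for wider IDs. Expect some false positives, because the two bytes ‘MO’ can also occur by chance inside code or data.

```python
from typing import List, Tuple


def find_overlays(image: bytes, id_width: int = 1) -> List[Tuple[int, int]]:
    """Return (offset, module_id) for every b'MO' marker found in a firmware image."""
    hits = []
    pos = image.find(b"MO")
    while pos != -1:
        raw = image[pos + 2 : pos + 2 + id_width]    # bytes right after the marker
        if len(raw) == id_width:                      # skip a marker at the very end
            hits.append((pos, int.from_bytes(raw, "big")))
        pos = image.find(b"MO", pos + 1)
    return hits
```

To use it, read a .bxx or *.RPM file in binary mode and pass its contents to `find_overlays`; each hit is a candidate overlay module to check by hand.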

So, in short, if a vendor has released a firmware upload tool (most vendors have) but has not released a firmware file for your specific drive type, you could create your own firmware file, provided you have the dumped modules (for example, obtained from this site). You could rip the main code and overlay modules and paste them into an existing DMC package. However, since I don’t know the checksum calculation or the meaning of the .cxx and .cbr files (probably checksums), you’d have to do more research. In theory, though, it would be possible to create your own firmware files and flash them to disk with such a standard vendor program, so you wouldn’t need to buy an expensive tool like PC3000 (at least not if your sole goal is to upload new firmware).

Of course, you could also write your own flasher program instead of using the one supplied by the vendor. However, since vendors use specific variants of the ‘download microcode’ ATA command, you’d have to research those first.
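As a starting point for such research, the Python sketch below computes the register values for the standard, non-vendor-specific form of the command. It uses ATA DOWNLOAD MICROCODE (opcode 0x92) with subcommand 0x07 ("download and save") in the Features register, and splits the transfer length in 512-byte blocks across the Sector Count and LBA Low registers. It only builds the values. Actually sending them requires an ATA pass-through interface, and vendor flashers may well use other subcommands or their own variants, as noted above.

```python
SECTOR = 512
DOWNLOAD_MICROCODE = 0x92   # standard ATA command opcode
SUBCMD_SAVE = 0x07          # "download microcode and save" subcommand


def download_microcode_registers(firmware: bytes) -> dict:
    """Register values for sending `firmware` in a single DOWNLOAD MICROCODE command."""
    if len(firmware) == 0 or len(firmware) % SECTOR:
        raise ValueError("image must be a non-empty multiple of 512-byte blocks")
    blocks = len(firmware) // SECTOR
    if blocks > 0xFFFF:
        raise ValueError("block count must fit in 16 bits")
    return {
        "features": SUBCMD_SAVE,
        "sector_count": blocks & 0xFF,       # block count, bits 7:0
        "lba_low": (blocks >> 8) & 0xFF,     # block count, bits 15:8
        "command": DOWNLOAD_MICROCODE,       # written last; starts the command
    }
```

For a 300-block image, for example, the block count is split into 44 in Sector Count and 1 in LBA Low (300 = 1 × 256 + 44).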

Furthermore, you could create a program that does EVERYTHING a tool like PC3000 does. However, as pointed out, you’ll need very detailed information on the vendor-specific ATA commands and the structure of the SA for that specific drive type, and since nobody makes this information public, that means a LOT of work. “But hey, the PC3000 tool comes with a special hardware PCI card!” Yes, but as you’ll understand by now, you can think of that card as nothing more than copy protection. They could perfectly well have built the tool without it, but I guess they would have sold quite a few fewer copies. So you really can’t blame them for it; in fact, I think it’s quite a smart move to stop piracy.

Basic Information of Hard Disk Drive (Part II)

If a drive has damaged data in the SA, for example in the firmware code module, it may become unusable. To repair such a disk, the HDD can be switched to a so-called ‘safe mode’ by setting specific jumpers on the drive. In safe mode, the drive bypasses its own firmware and instead waits for the user to upload firmware to its RAM. If the user uploads a correct ‘temporary’ firmware to RAM, the drive starts executing it. Once this uploaded RAM code (the ‘loader’) is running, the user can issue ATA commands to the drive to modify the damaged modules.

Firmware packages that you can find on a site like this contain a lot of files. First, there is the ‘loader’ file (*.LDR). This file is the ‘temporary’ firmware code that is uploaded to RAM (so it is not written to disk). Then there are a lot of ‘*.RPM’ files. These files represent the different modules, which can be written to the SA. The filenames consist of 8 hexadecimal digits: the first 4 specify the UBA and the last 4 give the module size in sectors (each sector normally contains 512 bytes, so, for example, if a filename ends in 0002, that module is 1024 bytes long). So, in short, after uploading the loader to RAM, the user can replace damaged modules by overwriting them with correct ones.
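The naming scheme above is easy to decode mechanically. Here is a minimal Python sketch, assuming the 512-byte sector size that the text calls normal. The example file names are made up.

```python
import os
from typing import Tuple

SECTOR_SIZE = 512  # bytes per sector ("normally", per the text)


def parse_rpm_name(filename: str) -> Tuple[int, int, int]:
    """Decode an 8-digit *.RPM module name into (UBA, size in sectors, size in bytes)."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext.upper() != ".RPM" or len(stem) != 8:
        raise ValueError(f"not an 8-digit .RPM module name: {filename!r}")
    uba = int(stem[:4], 16)        # first 4 hex digits: UBA where the module is stored
    sectors = int(stem[4:], 16)    # last 4 hex digits: module size in sectors
    return uba, sectors, sectors * SECTOR_SIZE
```

For instance, `parse_rpm_name("00040002.RPM")` returns `(4, 2, 1024)`, matching the 1024-byte example in the text.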

BTW, please note that the term ‘firmware’ for the packages on this site is not very well chosen, since these packages contain all needed modules to repair a HDD and not just the firmware (=code) module.

Anyway, if you’re looking for a specific firmware module, you can do 3 things:

1) rip the firmware modules from the SA of an identical HDD
2) get these modules from a friend (or for example, from the files section on this site: www.firmwarebase.com)
3) use a firmware update program from the vendor.

About this last option: firmware updates from vendors are pretty rare, since firmware code almost never needs to be replaced. However, Maxtor, for example, had some problems with the firmware code on some DiamondMax HDD models, so they issued a firmware update. This update consists of 2 files:

1) the executable file that issues the ATA ‘download microcode’ command to upload the firmware files to the HDD
2) The firmware code, consisting of the ‘main’ firmware code and ‘overlay’ code modules.

The basic knowledge about Hard Disk Drive

Modern hard disks feature an area containing information that the CPU on the HDD logic board uses to operate the drive. This area is called the “system area” (SA). It contains, for example, the drive ‘microcode’ (a.k.a. firmware), HDD configuration tables, defect sector tables, SMART information, security info (drive passwords etc.), disk ID info (serial number etc.) and more. These categories of information are called ‘modules’, so the SA contains a module for the firmware code, a module for the SMART info, and so on.

The SA is stored on ‘negative cylinders’ of the HDD and is therefore not accessible with normal read commands. However, the area can be accessed with other ATA commands. An example of a (more or less) ‘standard’ ATA command that can access the SA is the ‘download microcode’ command, which can be used to update the firmware code module. Most of the commands that can access the SA, however, are vendor-specific. Since vendors (obviously) don’t want users to mess around with the SA, these commands are generally not made public, but they can be deduced by, for example, reverse engineering the firmware code itself.
This reverse engineering has been done and has led to the development of tools that can issue these vendor-specific ATA commands and read/write almost all sectors in the SA. One example of such a tool is PC3000. A tool like this contains, per HDD model, tables of these vendor-specific ATA commands and tables of the sector numbers on which the different modules are stored. SA sector numbers are counted in “UBAs”. For example, one HDD model might use UBA 4 to store the ‘DISK ID’ module, while another model might use a different sector for this module.
So in short, to create a tool that can read/write data in the SA, you need to:

A) know (and understand) the (vendor-) specific ATA commands that can be used to access this area and
B) know on which UBA sector the specific modules are stored.
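Points A) and B) boil down to per-model lookup tables. The Python sketch below shows the idea. The model names, command opcodes and UBA numbers are hypothetical placeholders (only "UBA 4 for DISK ID" echoes the example in the text), not values from any real drive or tool.

```python
from typing import Dict, Tuple

# Hypothetical per-model tables: (A) the vendor-specific command to use and
# (B) the UBA on which each SA module is stored. All values are placeholders.
SA_TABLES: Dict[str, dict] = {
    "MODEL-A": {"sa_read_cmd": 0xF0, "modules": {"DISK_ID": 4, "SMART": 0x20}},
    "MODEL-B": {"sa_read_cmd": 0xF1, "modules": {"DISK_ID": 9, "SMART": 0x31}},
}


def locate_module(model: str, module: str) -> Tuple[int, int]:
    """Return (command opcode, UBA) needed to read `module` on drive `model`."""
    table = SA_TABLES.get(model)
    if table is None:
        raise KeyError(f"no SA table for model {model!r}")
    uba = table["modules"].get(module)
    if uba is None:
        raise KeyError(f"module {module!r} is not known for {model}")
    return table["sa_read_cmd"], uba
```

The point of the design is that the tool's logic stays the same for every drive; only the table entries, which are exactly the hard-won reverse-engineering results, differ per model.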

So, in short, if you want to mess around with the SA, you have 2 options: invest a lot of time and energy into learning or simply empty your pockets and buy a tool like PC3000.

Modern Hard disk drive

Introduction
Brief architecture description, the main problems of modern hard disk drives, methods of HDD servicing and repair of simple malfunctions, SMART, passwords. The article is intended for data recovery specialists, technicians servicing computer equipment, network administrators and experienced users.

Drive construction
A drive consists of a mechanical part, the head-and-disk assembly (HDA), and a printed circuit board (PCB). The HDA houses all the mechanical parts of the hard drive and also contains one chip, which acts as a preamplifier/commutator. The PCB carries several chips that control the mechanical parts, encode and decode data on the magnetic surfaces, and transfer the data through an external interface. As a rule, the PCB is mounted outside the HDA, on its underside. In some hard drives, such as the well-known Seagate Barracuda series, the controller board has an additional metal cover that protects the electronic components from damage.

Mechanics
The whole construction is built around the drive case, which protects the sensitive mechanical parts from the environment. The case is filled with dust-free air. This air is not specially purified. Instead, the mechanical part is assembled in a special workshop where the air contains fewer than one hundred dust particles (0.5 µm or larger) per cubic foot, the so-called “class 100 clean room”.

The HDA case has an opening covered by a dense air filter, which equalizes the air pressure inside and outside the drive. Unfortunately, if a drive falls into water, the water gets inside through that opening. The rotating disks create an airflow that circulates inside the case and constantly passes through a second filter, which catches any dust that somehow gets inside.

The drive case houses a stack of magnetic disks driven by a spindle motor, the magnetic heads with their positioning system, and a preamplifier/commutator that amplifies the signal from the heads and switches between them.

A magnetic disk is a circular aluminum plate (more rarely ceramic or special glass) whose surface is polished to the highest precision class, with the sole exception of the parking zone, if one is present. Because both the disk surfaces and the heads are so smooth, they tend to “stick” to each other due to molecular attraction. To prevent this, manufacturers apply special laser texturing in the zone where the heads contact the disks.

The disks get their magnetic properties from a chromium oxide based coating (the magnetically active substance) or from a cobalt layer applied by vacuum deposition. Such coatings are very hard and far more wear-resistant than those of earlier models, which used a soft lacquer layer based on ferric oxides that was easily damaged.

The disks are rotated by a special three-phase electric motor. Its stationary part carries three windings connected in a “star” configuration with a center tap, and its rotating part is a sectioned permanent magnet made of rare-earth metals. The need for low runout at high rotational speeds forces manufacturers to use special bearings in the spindle motor: either ball bearings or the more advanced fluid bearings, which use a special oil to dampen shock loads and thus increase motor durability. Fluid bearings are also quieter and produce practically no heat during operation. Modern IDE drives spin at 5400 or 7200 RPM; modern SCSI drives spin at 10,000 or 15,000 RPM.
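The spindle speed directly sets how long the drive waits for a sector to come around under the head. Here is a short Python calculation, assuming, as is usual, that on average the wanted sector is half a revolution away:

```python
from typing import Tuple


def rotation_timing(rpm: float) -> Tuple[float, float]:
    """Return (time per revolution, average rotational latency), both in ms."""
    ms_per_rev = 60_000.0 / rpm          # one minute = 60,000 ms
    return ms_per_rev, ms_per_rev / 2    # average wait: half a revolution


for rpm in (5400, 7200, 10000, 15000):
    rev, latency = rotation_timing(rpm)
    print(f"{rpm:>6} RPM: {rev:5.2f} ms per revolution, {latency:4.2f} ms average latency")
```

This gives about 5.56 ms of average latency at 5400 RPM, 4.17 ms at 7200 RPM, 3.00 ms at 10,000 RPM and 2.00 ms at 15,000 RPM.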

A magnetic head is also a sophisticated construction composed of numerous parts. These parts are so small that they are manufactured by photolithography, just like chips. The working surface of the head’s ceramic body is polished with the same precision as the disk itself. The head actuator is a flat coil of copper wire suspended between the poles of a permanent magnet and fixed at one end of a lever that rotates on a bearing. The other end of the lever carries the bracket with the magnetic heads. The bracket is spring-loaded with a defined force, which, balanced against the air cushion created by the spinning disk, lets the heads “fly” at a fixed height above the disk surface; this height is usually a fraction of a micron.

The whole system that moves the head stack is called a voice coil, by analogy with a loudspeaker, and it works like a common dynamic loudspeaker (a copper coil in a static magnetic field). The positioner’s coil is surrounded by a stator acting as a permanent magnet. When a current of a certain magnitude and polarity flows through the coil, the positioner turns in the corresponding direction with a corresponding acceleration. Thus, by dynamically changing the current in the coil, the magnetic heads can be positioned anywhere above the disk surface.

When the drive is powered off, the heads are held in the parking zone by special latches. Magnetic and pneumatic latches are the two most widely used types. A magnetic latch is a small permanent magnet fixed inside the drive case that attracts a ferrous tab on the voice coil when the heads are in the parking position. A pneumatic latch (air lock) also holds the positioner in the parking zone, preventing it from moving. When the magnetic disks start rotating, the airflow they generate deflects the “sail” of the air latch and releases the positioning system.

Inside the HDA, the electronics are limited to the preamplifier/commutator for the signal received from the heads. To minimize interference from external noise, it is placed as close to the heads as possible, directly on the flexible cable that runs from the heads to the drive electronics. The same cable also connects to the voice coil and sometimes to the spindle motor; in most cases, however, the spindle motor is powered through a separate cable.

The HDA is usually linked to the PCB by two connectors: a three-phase, center-tapped connector for the spindle motor, and a second one that carries the signals of the preamplifier/commutator and the voice coil.

Printed circuit board
The circuit design of modern drives relies on a few highly integrated chips; their block diagram is shown in Figure 1.

Figure 1. Circuit design of modern drives

As the figure shows, the whole layout is based on four chips:
1) a system controller chip that includes the read/write channel, the disk controller and the RISC control processor (microcontroller);
2) a Flash ROM chip containing the drive firmware;
3) a chip controlling the spindle motor and voice coil;
4) a RAM chip used as the cache buffer.

Further integration is not possible because these functional blocks operate in fundamentally different modes.

The first system controller chip used in hard drives was made by Cirrus Logic. Its obvious breakthrough was integrating the read/write channel, processor and disk controller in a single chip; however, the methods for using such a chip were not yet mature, which caused frequent malfunctions in Fujitsu drives of the MPF3xxxAT and MPG series.

The microcontroller has a RISC architecture. When power is switched on (or the /RESET interface signal is asserted), the drive’s reset circuit sends a RESET signal to the microcontroller. The microcontroller then executes its program from ROM: it runs self-diagnostics, clears the working data area in memory, and programs the disk controller and all the programmable chips connected to the HDD’s internal data bus. Next, it polls the internal signals used during drive operation and, if it detects no emergency alerts, starts the spindle motor. The next stage of the firmware is an internal test of the HDD that checks the data buffer RAM, the disk controller and the state of the signals at the microcontroller’s input ports. The microcontroller then analyzes the frequency of the spindle pulses, waiting until the motor reaches its defined rotational speed. Once that speed is reached, the controller drives the positioning circuit and the disk controller to move the magnetic heads to the area containing the recorded firmware data, and transfers that data to the buffer RAM for further operation. Finally, the microcontroller switches to the ready state and awaits commands from the HOST. In this mode, a command received from the central processor triggers a whole chain of actions performed by all the electronic components of the HDD.
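The start-up sequence above is essentially a fixed chain of checks, any of which can stop the drive before it becomes ready. Here is a Python sketch of that control flow. The step names mirror the description, while the check functions passed in are hypothetical stand-ins for the real hardware tests.

```python
from typing import Callable, Dict, List

# Power-up steps in the order described in the text.
POWER_UP_SEQUENCE = [
    "self_diagnostics",      # run ROM code, clear work RAM, program controller chips
    "no_emergency_signals",  # poll internal signals before spinning up
    "start_spindle",
    "internal_test",         # buffer RAM, disk controller, input-port signals
    "rated_speed_reached",   # wait until the spindle pulse frequency is right
    "load_sa_firmware",      # move heads to the firmware area, copy it to buffer RAM
]


def power_up(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run the steps in order; stop at the first failure, otherwise end in READY."""
    log = []
    for step in POWER_UP_SEQUENCE:
        if not checks[step]():
            log.append(f"FAILED at {step}")
            return log
        log.append(step)
    log.append("READY")      # now waits for commands from the host
    return log
```

Passing in a check that returns False for one step shows where a faulty drive would stop during its start-up sequence.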

 HDD read/write channel consists of a preamplifier/commutator (located inside HDA), read circuit, write circuit and a synchronizing clock.

The preamplifier has several channels, each connected to its own head. The channels are switched by signals from the drive’s microprocessor. The preamplifier also contains a write current switch and a write fault sensor, which raises an error signal if a magnetic head is shorted or open.

In write mode, the integrated read/write channel receives data from the disk controller synchronously with the write clock, encodes it, applies write precompensation and passes the data to the preamplifier for writing to the disk. In read mode, the signal from the preamplifier/commutator goes to the automatic gain control circuit and then through a programmable filter, an adaptive compensation circuit and a pulse detector. These stages convert it into data pulses, which are sent to the disk controller for decoding and transfer through the external interface.
The disk controller is the most complex component of the drive and determines the data exchange rate between the HDD and the HOST.

The disk controller has four ports, which connect it to the HOST, the microcontroller, the buffer RAM and the channel that exchanges data with the disks. The disk controller is an automatic device driven by the microcontroller; from the HOST side, only the standard task file registers are accessible. The microcontroller programs the disk controller during initialization, setting up the data encoding method, selecting the error-correction polynomial, defining flexible or hard partitioning into sectors, and so on.
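To make the standard task file registers concrete, here is a Python sketch that computes which values a host would write to which registers for a standard ATA READ SECTORS command (opcode 0x20, 28-bit LBA). It uses the legacy primary-channel port addresses 0x1F0 to 0x1F7. It only builds the list of writes and performs no real port I/O.

```python
from typing import Dict, List, Tuple

# Legacy I/O port addresses of the primary-channel ATA task file registers.
TASK_FILE_PORTS: Dict[str, int] = {
    "data": 0x1F0, "features": 0x1F1, "sector_count": 0x1F2, "lba_low": 0x1F3,
    "lba_mid": 0x1F4, "lba_high": 0x1F5, "device": 0x1F6, "command": 0x1F7,
}
READ_SECTORS = 0x20   # standard ATA READ SECTORS command (28-bit LBA, PIO)


def read_sectors_task_file(lba: int, count: int) -> List[Tuple[int, int]]:
    """Return the (port, value) writes a host makes to issue READ SECTORS."""
    if not 0 <= lba < (1 << 28):
        raise ValueError("LBA must fit in 28 bits")
    if not 1 <= count <= 256:
        raise ValueError("count must be 1..256 (256 is encoded as 0)")
    values = {
        "sector_count": count & 0xFF,
        "lba_low": lba & 0xFF,                    # LBA bits 7:0
        "lba_mid": (lba >> 8) & 0xFF,             # LBA bits 15:8
        "lba_high": (lba >> 16) & 0xFF,           # LBA bits 23:16
        "device": 0xE0 | ((lba >> 24) & 0x0F),    # LBA mode, device 0, LBA bits 27:24
        "command": READ_SECTORS,                  # written last: starts execution
    }
    return [(TASK_FILE_PORTS[name], value) for name, value in values.items()]
```

Everything behind these registers (encoding, ECC, sector layout) is set up by the microcontroller and stays invisible to the host.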

The buffer manager is the functional part of the disk controller that governs the buffer RAM, whose capacity in modern HDDs ranges from 512 KB to 8 MB. The buffer manager splits the buffer RAM into separate segments, and special registers accessible to the microcontroller hold the start addresses of these segments. While the HOST exchanges data with one segment, the read/write channel can exchange data with another. In this way, reading and writing data on the disk and exchanging data with the HOST proceed in parallel.
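The segmented-buffer idea is classic double buffering. The Python sketch below models a buffer RAM with two 512-byte segments and their start-address "registers". A thread standing in for the read/write channel fills one segment while the host side drains the other. The segment count and size are arbitrary choices for the example.

```python
import queue
import threading
from typing import List

SEG_SIZE = 512
SEG_START = [0, SEG_SIZE]            # start-address "registers" of the two segments
ram = bytearray(len(SEG_START) * SEG_SIZE)


def disk_side(sectors: List[bytes], free: queue.Queue, full: queue.Queue) -> None:
    """Read/write channel: copy each sector from the disk into a free segment."""
    for data in sectors:
        seg = free.get()                              # wait for an empty segment
        base = SEG_START[seg]
        ram[base:base + SEG_SIZE] = data.ljust(SEG_SIZE, b"\0")
        full.put(seg)                                 # hand the segment to the host
    full.put(None)                                    # end-of-transfer marker


def transfer(sectors: List[bytes]) -> List[bytes]:
    """Host side: drain full segments while the disk side refills the others."""
    if any(len(s) > SEG_SIZE for s in sectors):
        raise ValueError("each sector must fit in one segment")
    free: queue.Queue = queue.Queue()
    full: queue.Queue = queue.Queue()
    for seg in range(len(SEG_START)):
        free.put(seg)
    worker = threading.Thread(target=disk_side, args=(sectors, free, full))
    worker.start()
    received = []
    while (seg := full.get()) is not None:
        base = SEG_START[seg]
        # Strip the zero padding (fine for this text-only demo data).
        received.append(bytes(ram[base:base + SEG_SIZE]).rstrip(b"\0"))
        free.put(seg)                                 # segment may be refilled now
    worker.join()
    return received
```

The two queues play the role of the buffer manager's bookkeeping: a segment is only ever owned by one side at a time, so the host and the read/write channel never touch the same segment simultaneously.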

The spindle motor controller regulates the 3-phase motor and is programmed by the drive’s microcontroller. The spindle motor has three control modes: start, acceleration and stable rotation. Consider the start mode first. At power-up, a reset signal is sent to the control microprocessor, which initializes the spindle motor controller by programming its internal registers for a start. The controller then generates phase switching signals, and the spindle motor begins rotating at low speed, generating a back electromotive force (EMF). The controller detects this EMF and notifies the microprocessor, which uses the signal to control rotation. In acceleration mode, the microprocessor speeds up phase switching and measures the rotational speed of the spindle motor until it reaches its rated value. Once the rated speed is reached, the controller switches to stable rotation mode. In this mode, the microprocessor calculates the time of one revolution from the phase signal and adjusts the rotational speed accordingly. After the magnetic heads leave the parking zone, the drive electronics also monitor rotation stability using the servo marks.
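The three modes map naturally onto a small state machine. In the Python sketch below, the phase-switching rate is a normalized command between 0 and 1. The proportional gain, speed tolerance and step sizes are hypothetical tuning values. As described above, the stable mode corrects the speed using the measured time per revolution.

```python
from typing import Optional


class SpindleController:
    """Start / acceleration / stable-rotation control of a 3-phase spindle motor."""

    def __init__(self, rated_rpm: float = 7200.0, gain: float = 0.5,
                 tolerance: float = 0.01):
        self.target_period = 60.0 / rated_rpm   # seconds per revolution at rated speed
        self.gain = gain                        # hypothetical proportional gain
        self.tolerance = tolerance              # fractional speed error counted as "on speed"
        self.mode = "start"
        self.command = 0.0                      # normalized phase-switching rate, 0..1

    def update(self, rev_period: Optional[float], back_emf: bool) -> float:
        """One control step; rev_period is the measured time per revolution (s)."""
        if self.mode == "start":
            self.command = 0.05                 # slow forced phase switching
            if back_emf:                        # motor is turning: EMF detected
                self.mode = "accelerate"
        elif self.mode == "accelerate":
            self.command = min(1.0, self.command + 0.1)   # speed up phase switching
            if rev_period is not None and \
                    rev_period <= self.target_period * (1 + self.tolerance):
                self.mode = "stable"            # rated speed reached
        elif rev_period is not None:            # stable: regulate from revolution time
            error = (rev_period - self.target_period) / self.target_period  # > 0: too slow
            self.command = min(1.0, max(0.0, self.command + self.gain * error))
        return self.command
```

In stable mode, a revolution that takes 2% longer than the target raises the command by `gain × 0.02`, nudging the motor back toward its rated speed.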

The voice coil controller generates the control current that moves the positioner and keeps it over a given track. The microcontroller calculates the current value from the digital error signal that gives the head’s position relative to the track (the Position Error Signal, or PES). The digital current value is sent to a digital-to-analog converter (DAC), and the resulting analog signal is amplified and fed to the voice coil.
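The loop just described can be simulated in a few lines: PES in, digital current out, then the DAC, the amplifier and the voice coil. The Python sketch below assumes a simple plant in which head acceleration is proportional to coil current. It uses a PD (proportional-derivative) control law, with the derivative estimated from successive PES samples. The control law, gains, plant constant, current limit and 12-bit DAC are all hypothetical choices for the example, not values from any real drive.

```python
from typing import List


def track_follow(initial_offset: float = 0.3, steps: int = 2000, dt: float = 1e-5,
                 kp: float = 10.0, kd: float = 0.014, accel_per_amp: float = 1e5,
                 i_max: float = 5.0, dac_bits: int = 12) -> List[float]:
    """Simulate settling onto a track centre; returns the PES history in tracks."""
    lsb = 2 * i_max / (1 << dac_bits)           # DAC step size in amperes
    position, velocity = initial_offset, 0.0    # track centre is at position 0
    prev_pes = position
    history = []
    for _ in range(steps):
        pes = position                          # position error signal from servo marks
        history.append(pes)
        # Microcontroller: PD control law on the PES and its rate of change.
        current = -(kp * pes + kd * (pes - prev_pes) / dt)
        prev_pes = pes
        current = max(-i_max, min(i_max, current))  # amplifier output limit
        current = round(current / lsb) * lsb        # DAC: digital code -> analog level
        # Voice coil: force, and hence acceleration, proportional to current.
        velocity += accel_per_amp * current * dt
        position += velocity * dt
    return history
```

With these values the head overshoots the track centre by a few percent of the initial offset before settling, which is the expected behaviour for a damping ratio of about 0.7 (set by `kd` relative to `kp`).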
