Some typical malfunctions of hard drives and methods of their repair

Hard drive repair usually calls for specialized, complex equipment, but sometimes a desoldering station and a chip programmer are all you need. In the last part of our survey we would like to address some typical malfunctions of hard drives and methods of their repair.

 As mentioned in our previous articles on hard disk drive problems, a drive consists of two main parts: a mechanical part (the head-disk assembly, HDA) and the electronics (the control printed circuit board, PCB). These two components are supplemented by internal firmware, which is stored partly in ROM on the PCB and partly in the firmware zone of the drive (the latter portion is loaded into the RAM of the HDD microcontroller during initialization). The three components interact very closely, and a drive operates normally only when all of them function properly. Consequently, a drive malfunction may stem from a failure of any of these components, and that is what we observe in practice. Moreover, the frequency and severity of damage to each component differ between HDD models and manufacturers. When an HDD has to be repaired in a regular (non-specialized) laboratory, some repair orders have to be declined. This applies first of all to repair of the HDD mechanics (the HDA), and secondly to the service data in the firmware area of the drive.

 The difficulty of HDA repair stems, first of all, from the exceptional purity of the air inside the case, which is kept at normal pressure (no more than 100 dust particles per cubic meter of air). Opening the case in an ordinary room or in common laboratory conditions inevitably lets dust inside (in ordinary rooms a cubic meter of air contains approximately 600 dust particles), and that is sure to damage the precision mechanics. The few companies that repair drive mechanics work in special clean rooms or at clean worktables (tables equipped with a special “aquarium” with sleeves inside for the operator’s hands). In addition, a whole set of specialized tools is required: T-type (Torx) screwdrivers from T9 down to T3, hex screwdrivers, mounting fixtures that hold the HDA rigidly during work, and various head-block lifters for HDDs of different types. To this list we should add the requirements for the engineers who perform such jobs: they must be careful, move precisely and, of course, have experience. One wrong motion with a tool, or a finger touching a magnetic disk, will either make the repair impossible at once or complicate it by at least an order of magnitude. Because of these pitfalls, most companies that own specialized HDD repair equipment do not take on work involving the mechanical parts.

 The simplest drive repair consists of restoring software modules in the firmware zone. Module corruption is one of three possible HDD malfunctions that render a drive inoperable even though all mechanical and electronic parts remain intact. As a rule, a drive with this defect is not detected by the computer BIOS, and any attempt to access it ends with an ABRT error (the command cannot be executed). Repair requires just overwriting the corrupted module, after which the drive becomes operational again. The procedure takes 5-10 minutes on average. However, the apparent simplicity of the solution hides a complicated implementation. Modules can be written only in a special factory mode of drive operation. A drive is switched into that mode by special commands (the so-called key), which differ not only between manufacturers but also between drive families of one manufacturer, and those commands are kept secret. Firmware structure may also differ greatly. Modules can be overwritten with copies taken from identical models, provided the firmware version and module type match. Incorrect module overwriting, or writing an incompatible module version, may damage a drive for good. For example, writing a wrong configuration module that describes the number of magnetic heads may cause the firmware to address a non-existent head during initialization at power-up. The drive will then knock endlessly, hitting its positioner against the limit stop, and will eventually damage its magnetic surfaces if it is not switched off in time; the problem recurs at every subsequent power-up. Operations on the firmware zone therefore demand the same care and accuracy as work on the drive mechanics, i.e. the HDA. That is why drive manufacturers password-protect access to it and keep it secret.
Thus, simple as the repair of drives with damaged firmware data is, it is not possible without special software, and frequently not without a whole hardware and software complex. Besides the technological utilities themselves, of which such a complex may include a great many (a separate utility exists for each drive family), users need documentation: a clear methodology for testing and restoring a failing firmware zone, which is also specific to each drive. The high cost of such equipment puts it out of reach for many, so we shall describe HDD repair methods that do not require specialized tools, devices or software.

 One of the basic principles of any repair is “do not make it any worse”. That is why it is important to diagnose the malfunction accurately and, where appropriate, decline the repair and refer the customer to a specialized service centre if the malfunction is caused by the HDD mechanics or corrupted firmware data. As an example, we shall analyze a very widespread malfunction: “HDD knocking”.

 If a drive produces periodic knocking sounds at power-up (its positioner hitting the limit stop), the drive is unable to read servo information from the disk surfaces. There may be many reasons for that:

  • malfunctioning magnetic heads;
  • malfunctioning preamplifier/commutator located inside HDA in the immediate vicinity of the heads;
  • malfunctioning PCB, namely:
     – reading/data conversion channel;
     – positioner controller microchip;
     – supply circuits (stabilizers, filters, generators of negative voltages).

     In addition to the causes listed above, this malfunction may result from incorrectly written firmware modules: a non-existent head is selected and, as a result, the servo data stream is missing. Precise diagnosis is difficult even for an experienced HDD repair specialist, but a few tricks simplify the task somewhat. First of all, identify where the cause is located: in the HDA or on the control board. To do so, remove the drive’s PCB and replace it with a known good board from the same model with an identical firmware version. Note that this is not possible for all models: recent Seagate models and Fujitsu MPG3xxxAT drives keep unique adaptive parameters in ROM, so the original ROM must be moved over together with the PCB swap. If the knocking stops and the drive reports ready, look for the cause of the malfunction on the original board. If the drive keeps knocking with a known good board, the cause is inside the HDA, and it is time to give up the repair. Under no circumstances should you open the HDA just to see what has happened inside. Most likely you will see no visible faults, but the damage from opening it will be considerable. Thus, of all the possible types of HDD malfunctions, only repair of the electronics board can be recommended for a regular laboratory without special equipment.
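The board swap test described above reduces to a single decision, which can be written down as a minimal sketch. The enum labels and the function name are illustrative and not part of any repair tool. The sketch assumes the donor board matches the model and firmware version, and that the original ROM was carried over where the model requires it.

```python
from enum import Enum


class KnockCause(Enum):
    """Where the PCB swap test points (labels are illustrative)."""
    CONTROL_BOARD = "fault on the original PCB: inspect and repair the board"
    INSIDE_HDA = "fault inside the HDA: stop and refer to a specialized service"


def locate_knock_cause(knocks_with_known_good_pcb: bool) -> KnockCause:
    """Interpret the result of swapping in a known good, matching PCB.

    Assumes the donor board comes from the same model with an identical
    firmware version, and that the original ROM was moved over on models
    that keep adaptive parameters in ROM.
    """
    if knocks_with_known_good_pcb:
        return KnockCause.INSIDE_HDA
    return KnockCause.CONTROL_BOARD


if __name__ == "__main__":
    print(locate_knock_cause(False).value)
```

If the knocking persists with the donor board, the sketch returns the "stop" outcome, in line with the rule that the HDA must not be opened in a regular laboratory.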


    Acquiring Electronic Evidence from Hard Drive

    A forensic image of a hard drive is an exact copy of the drive, including deleted files and areas that a normal backup would not copy;
    Never boot from the hard drive;
    Use write protection software to protect the original evidence;
    Make a copy of the original evidence and do all work off of the copy;
    Document all aspects of the hard drive;
    Tag and store original evidence;
    Best evidence is original evidence;


    How to Secure the Computer as Evidence?

    Photograph and log room, position of computer and status of computer;
    If the computer is “OFF,” Do Not Turn “ON”;
    If the computer is “ON,” Do Not Turn “OFF”;
    Place Evidence tape over each drive slot;
    Photograph and label back of computer components while they are plugged in;
    Label all connection ends to allow reassembly if needed;
    If transporting, treat all components as fragile;
    Collect all devices such as cables, keyboards and monitors;
    Collect instruction manuals, documentation, and notes;
    User notes may contain passwords;


    Computer Forensics Defined

    “Computer Forensics deals with the preservation, identification, extraction and documentation of computer evidence.”*

    “Computer forensics has also been described as the autopsy of a computer hard disk drive because specialized software tools and techniques are required to analyze the various levels at which computer data is stored after the fact.”*

    Recovering Information the naked eye can no longer see.


    Fundamentals of searching for malfunctions

    The description above should demonstrate that an HDD is a sophisticated software and hardware device that combines electronic and mechanical parts and draws on the most recent achievements of microelectronics, micromechanics, automatic control theory, magnetic recording theory and coding theory. HDD repair is impossible without specialized knowledge, special equipment, instruments and tools, and a specifically equipped location (a clean room). However, an expert in computer hardware can perform primary diagnostics of an HDD, repair simple failures, and deal with BAD sectors using software offered by HDD manufacturers.

    In the absence of special diagnostic equipment and software, HDD diagnostics should begin with connecting the drive to a separate PC power supply unit. The operator’s hearing is the diagnostic tool in that case. At power-up an HDD spins up the spindle motor and the sound level rises for 4-7 s; then a click follows (the heads leave the parking zone), followed by a very distinctive recalibration crackle lasting 1-2 s. It is easy to learn what normal behaviour sounds like by connecting a known good HDD to a power supply unit.

    A completed recalibration procedure demonstrates at least that the following are operational: the reset circuit, the clock, the microcontroller, the spindle motor control circuit and positioning system, the data conversion channel, the magnetic heads (at least the one used for initialization) and the drive firmware data.

    For further diagnostics, connect the HDD to the Secondary IDE port and run auto-detection in BIOS Setup. If the model of the HDD under test is recognized, boot the operating system and start the diagnostic software. The OS can be started from a working HDD connected to the Primary IDE port or from a floppy disk. The simplest check is to create a partition on the drive under test with FDISK and then format it with the format d: /u command. Formatting in DOS or Windows does not perform actual “formatting”; instead the OS verifies the surface and then creates the file system structure selected for the partition. If formatting (verification) reveals any defects, they are displayed on-screen as BAD sectors. Of course, such diagnostics is primitive and aimed at checking that the HDD works rather than at discovering the causes of malfunctions, let alone eliminating them. More detailed diagnostics can be performed with the utilities that manufacturers recommend and publish on their web pages.

    Thus, for Fujitsu drives we can recommend a whole section devoted to diagnostic software:

    http://www.fel.fujitsu.com/home/drivers.asp?L=en&CID=1

    For Western Digital drives:

    http://support.wdc.com/ru/download/

    For Samsung drives:

    http://www.samsung.com/Products/HardDiskDrive/utilities/index.htm

    For Seagate drives:

    http://www.seagate.com/support/software/

    For Maxtor drives:

    http://www.maxtor.com/en/support/downloads/powermax.htm

    For IBM drives offered under a new HGST brand:

    http://www.hgst.com/hdd/support/download.htm

    All the above utilities perform testing in regular user mode and do not switch drives to factory mode; therefore their features are rather limited. Specialized diagnostic utilities are not offered for free; instead they are distributed to special service centers and dealers of drive manufacturers.

    Let us show an example of searching for malfunction in the spindle motor control circuit of a Caviar HDD manufactured by Western Digital.

    The layout scheme below (Figure 5) is used in the WDAC32500 and WDAC33100 drive families and shows all ratings and serial numbers of components. It is also applicable to the repair of the WDAC2340, WDAC2420, WDAC2540, WDAC2700, WDAC2850, WDAC31200, WDAC21200 and WDAC31600 drive families if you ignore the serial numbers of components and allow for some ratings differing from the values shown.

    If the spindle motor does not start at HDD power-up, first make sure that the HDA is operational by connecting it to a known good PCB. If that is not possible, check the resistance of the spindle motor coils (phases): each should measure about 2 Ohm relative to the center tap. Then continue to look for the malfunction on the PCB. (Failure of the spindle motor to start is frequently caused by the magnetic heads sticking to the disks.)

    To check a PCB for failed components, remove it from the HDA, connect it to an external power supply and place it on the worktable with the electronic components facing up. Further operations require an oscilloscope with a bandwidth of up to 50 MHz.

    First of all, switch on the power and check the +5 V and +12 V supply voltages at the outputs of the U3 and U6 chips (see layout scheme), then check that the quartz resonator is oscillating at pins 24 and 33 of the U6 chip. Next, check that clock pulses reach the U9 control microprocessor and the U11 reading channel at pins 57 and 13 respectively. After that, make sure that the RESET signal is not asserted (active level 0). If all these conditions are met, the control microprocessor starts and performs the initialization procedure, programming all chips connected to the internal data bus. You can check microprocessor operation indirectly by the presence of control pulses: ALE, RD#, WR#, data bus pulses, etc.

    To check the spindle motor control circuit, set the oscilloscope sweep to 10 ms/div and the vertical sensitivity to 2 V/div (a 1:10 probe is advisable). After power-up, check for motor start pulses with an amplitude of 11-12 V on all three phases (connections J14, J13, J12). The control circuit will try to start the motor for 1-2 min and then give up. After that, cycle the power or issue a RESET by shorting lines 1 and 2 of the IDE interface connector with tweezers. If the voltage on any phase is lower than 10 V, the U3 chip is malfunctioning. With such a failure the spindle motor most likely spins up but cannot reach its rated rotational speed, and consequently the magnetic heads cannot leave the parking zone. The rotational speed of the spindle motor can be monitored using the INDEX pulses at the E35 control point (if the PCB is connected to the HDA). The period of the INDEX pulses is ~12 ms and their width is 140 nanoseconds. The U3 chip is controlled by the U6 synchronization controller chip through the SPINDLE START signal: SPINDLE START = 1 starts the motor, and 0 stops it.
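The INDEX pulse interval measured on the oscilloscope can be converted to rotational speed with simple arithmetic. The sketch below assumes one INDEX pulse per disk revolution. The 5% tolerance used to decide whether the motor has reached rated speed is a hypothetical value chosen for illustration and is not taken from any Western Digital document.

```python
def rpm_from_index_period(period_ms: float) -> float:
    """Convert the INDEX pulse period to RPM (assumes one pulse per revolution)."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 60_000.0 / period_ms  # 60,000 ms in one minute


def at_rated_speed(measured_ms: float, nominal_ms: float = 12.0,
                   tolerance: float = 0.05) -> bool:
    """True if the measured period is within the (hypothetical) tolerance."""
    return abs(measured_ms - nominal_ms) / nominal_ms <= tolerance


if __name__ == "__main__":
    print(round(rpm_from_index_period(12.0)))  # 5000
    print(at_rated_speed(15.0))                # False
```

A period noticeably longer than the nominal ~12 ms means the motor is spinning below rated speed, which matches the symptom described for a faulty U3 chip.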

    Phase switching is controlled by the U6 chip through its Fc1-Fc6 outputs, which use TTL-level control signals. Rotational speed feedback passes through the 32P4910A reading channel chip (U11) on the SERVO READ DATA line. In turn, the U6 synchronization controller chip generates the servo field search signal (SERVO GATE) for the U11 chip.

    Servo signals and control point numbers are shown in Figures 6 and 7. The signals are easier to view with an oscilloscope of 100 MHz or greater bandwidth, since the INDEX pulses and the servo marker last only about 140 nanoseconds (a 1:10 probe is again advisable). Monitoring should be performed using two sources, synchronizing the oscilloscope by INDEX or by the servo marker. It may be interesting to watch not only the servo signals at the E37 control point but also the data read signals in general at the E13 and E7 control points, where all synchronization fields, sectors, etc. can be seen (see Figure 8).

     

    Details on the operation of the control microprocessor, the data reading channel and the spindle motor control chip are available on the web sites of Intel, Silicon Systems Incorporated and SGS-Thomson respectively: www.intel.com and www.st.com.


    Technologies used for maintaining HDD reliability

    Despite all these complications, HDD manufacturers constantly try to make user data storage more reliable, and they use various methods and technologies in their drives to that end.

    Figure 5. Spindle motor control circuit of the HDD (WDAC32500 and WDAC33100 families)
    S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is intended to inform hard drive users about the status of the drive’s main parameters. Many motherboard BIOSes analyze those parameters at power-up, and if a critical parameter crosses its emergency limit, a warning message is displayed during start-up. That does not mean the drive is about to stop functioning, but the user should act on it, for example by backing up valuable data. If the computer BIOS has no S.M.A.R.T. attribute analyzer, you can use an external diagnostic utility launched from within the operating system, for instance SMART Vision, available from http://www.acelab.ru/products/pc/traning.html.
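The check that a BIOS or utility performs on S.M.A.R.T. attributes can be illustrated with a small sketch. It assumes the common convention that a drive reports a normalized value for each attribute and that the attribute counts as failed once this value falls to or below its threshold. The attribute names and numbers are made-up sample data, and the code does not talk to a real drive.

```python
from typing import List, NamedTuple


class SmartAttribute(NamedTuple):
    name: str
    value: int      # normalized current value reported by the drive
    threshold: int  # failure threshold set by the manufacturer


def failing_attributes(attrs: List[SmartAttribute]) -> List[str]:
    """Return names of attributes whose value has reached the threshold."""
    return [a.name for a in attrs if a.value <= a.threshold]


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    sample = [
        SmartAttribute("Reallocated Sectors Count", 100, 36),
        SmartAttribute("Spin-Up Time", 20, 21),
    ]
    print(failing_attributes(sample))  # ['Spin-Up Time']
```

A non-empty result corresponds to the warning shown at start-up and is the cue to back up valuable data.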

    For greater reliability, practically all drives use a technology that hides and relocates defects as they occur during operation. Implementation details vary between drive models, but the principle is the same. If the operating system attempts to access a sector that cannot be read or written, the drive replaces it, if sufficient reserved space remains, with a sector from the reserved zone (assign). The table of sectors substituted in this way is stored in the drive firmware zone, and the drive loads it into controller RAM at power-up.
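The reassignment principle can be modelled with a lookup table that maps a defective sector to a spare one. The class below is a toy model written for illustration; the names and the data layout are assumptions and do not reflect any actual firmware format.

```python
from typing import Dict, List


class SectorRemapper:
    """Toy model of on-the-fly defect reassignment (not a vendor format)."""

    def __init__(self, reserved_sectors: List[int]) -> None:
        self._spare = list(reserved_sectors)  # unused sectors of the reserved zone
        self.table: Dict[int, int] = {}       # defective sector -> spare sector

    def resolve(self, sector: int) -> int:
        """Return the sector that actually backs the requested one."""
        return self.table.get(sector, sector)

    def report_failure(self, sector: int) -> bool:
        """Reassign a failed sector; return False if no spares are left."""
        if sector in self.table:
            return True  # already reassigned earlier
        if not self._spare:
            return False
        self.table[sector] = self._spare.pop(0)
        return True


if __name__ == "__main__":
    remap = SectorRemapper(reserved_sectors=[1_000_000, 1_000_001])
    remap.report_failure(42)
    print(remap.resolve(42))  # 1000000
    print(remap.resolve(43))  # 43
```

In this model the `table` dictionary plays the role of the list of substituted sectors that the drive keeps in its firmware zone and reads back at power-up.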

    Impact sensors, found in all drives, are another technology used for protection against malfunctions. An impact sensor is a piezoelectric element that produces an electric pulse on mechanical shock. Filtering the sensor pulses makes it possible to identify genuine impacts. When a drive detects a shock, it parks the magnetic heads. One peculiarity of impact sensor installation is its mounting angle relative to the front edge of the case: 45°.

    In recent models manufacturers have begun to make wide use of temperature sensors on the PCB and in the head block. The drive processor monitors the temperature, and the drive stops operating if the allowed value is exceeded. In some drive models the temperature is reported as a S.M.A.R.T. attribute value, and there are programs (usually available from the web pages of HDD manufacturers) that display it.


    Computer Forensic Tool: Encase Forensic

    EnCase Forensic is the industry standard in computer forensic investigation technology. With an intuitive GUI, superior analytics, enhanced email/Internet support and a powerful scripting engine, EnCase provides investigators with a single tool, capable of conducting large-scale and complex investigations from beginning to end. Law enforcement officers, government/corporate investigators and consultants around the world benefit from the power of EnCase Forensic in a way that far exceeds any other forensic solution.

    -Acquire data in a forensically sound manner using software with an unparalleled record in courts worldwide.

    -Investigate and analyze multiple platforms — Windows, Linux, AIX, OS X, Solaris and more — using a single tool.

    -Save days, if not weeks, of analysis time by automating complex and routine tasks with prebuilt EnScript® modules, such as Initialized Case and Event Log analysis.

    -Find information despite efforts to hide, cloak or delete.

    -Easily manage large volumes of computer evidence, viewing all relevant files, including “deleted” files, file slack and unallocated space.

    -Transfer evidence files directly to law enforcement or legal representatives as necessary.

    -Review options allow non-investigators, such as attorneys, to review evidence with ease.

    -Reporting options enable quick report preparation.


    HDD malfunctions

     “Nothing is eternal”: that expression applies to hard disk drives as well. No matter how reliable an HDD is, destructive processes degrade it over time.

     First, a drive is a mechanical and electronic device, and all mechanical parts gradually wear out. With time the connections between mechanical parts become slack. The repeated take-offs and landings of the magnetic heads at each start and stop of disk rotation wear away the protective layer coating the heads. Nevertheless, modern manufacturing technology guarantees a rather long life for hard drives. According to the technical manual for Western Digital drives (Caviar BB/JB family), the minimum number of contacts between the magnetic heads and the disk surface during start/stop (Contact Start/Stop Cycles, CSS) is 50,000 cycles, while unrecoverable read errors (Error Rate - Unrecoverable) occur less often than once per 10^14 bytes read. Translated into everyday terms: if the drive is switched on and off ten times a day, about 14 years will pass before head or surface quality deteriorates because of those contacts; and one error will occur per more than 32 TB of data read (roughly equivalent to watching movies in MP4 format non-stop for 7-10 years).
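The 14-year figure follows directly from the rated CSS count. A minimal sketch of the arithmetic is given below; the function name is illustrative and a 365-day year is assumed.

```python
def css_lifetime_years(rated_cycles: int, cycles_per_day: float) -> float:
    """Years until the rated start/stop cycle count is used up (365-day year)."""
    if cycles_per_day <= 0:
        raise ValueError("cycles_per_day must be positive")
    return rated_cycles / cycles_per_day / 365.0


if __name__ == "__main__":
    # 50,000 rated cycles at ten power cycles a day.
    print(round(css_lifetime_years(50_000, 10), 1))  # 13.7
```

At two power cycles a day the same rated count would last about five times longer, which shows how strongly the start/stop frequency affects this kind of wear.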

    Still, in real life we frequently face a totally different situation, when a recently purchased, brand new drive fails after a few months of operation. Many drives do not even last through the warranty period set by their manufacturer. Note that all manufacturers except Samsung have reduced that period from three years to one. What are the reasons for this?

    Normal HDD ageing malfunctions
     When a properly assembled drive is operated correctly, in conformity with all requirements of its Technical Reference Manual, a normal ageing process can be observed over time. It affects the magnetic disks most of all. First, the magnetization of the smallest magnetic “prints” (dibits) decreases with time, so the drive has to re-read some portions of the disks that used to read flawlessly, or they even begin to produce read errors. Second, the magnetic layer on the disks deteriorates, gathering scratches, chips, cracks, etc. All of the above leads to the appearance of BAD sectors.

    Normal drive ageing is a slow process that usually takes 3-5 years. Note that continuous operation is even more favourable for an HDD than a mode in which the drive starts and stops frequently. That is why drives last quite long in dedicated servers that run around the clock in a separate room or enclosure with proper climate control.

    Malfunctions resulting from incorrect mode of operation
     The most frequent cause of HDD malfunctions is precisely incorrect operation. Its main destructive factors are overheating, mechanical impacts and voltage surges in the HDD power supply.

    Overheating is caused by insufficient cooling of the drive case and PCB. According to the technical reference manual for Western Digital drives (Caviar BB/JB family), the allowed operating temperature ranges from 5 °C to 55 °C, provided that air circulates around the drive at all times. The latter condition exists because some chips on the control board (motor controllers, etc.) run much hotter than that and need heat dissipation. Now imagine that it is summer and the room temperature reaches 30 °C. Inside the computer case it rises by another 20-25 °C, to the extreme of the allowed range, and there is no normal air circulation because the only exhaust fan, in the power supply, is clogged with dust, the flat cables form a tight knot, and the drive is sandwiched between a CD drive and an FDD. Opening the computer case does not remedy the situation, because it does not improve air flow around the HDD.
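The summer scenario can be checked against the quoted 5-55 °C range with a few lines of arithmetic. The sketch below simply adds the temperature rise inside the case to the room temperature, which is a simplification; the function names are illustrative.

```python
MIN_OPERATING_C = 5.0   # lower operating limit quoted in the text
MAX_OPERATING_C = 55.0  # upper operating limit quoted in the text


def drive_ambient_c(room_c: float, case_rise_c: float) -> float:
    """Estimate the temperature around the drive (simple additive model)."""
    return room_c + case_rise_c


def headroom_c(ambient_c: float) -> float:
    """Degrees left before the upper operating limit is reached."""
    return MAX_OPERATING_C - ambient_c


def within_operating_range(ambient_c: float) -> bool:
    return MIN_OPERATING_C <= ambient_c <= MAX_OPERATING_C


if __name__ == "__main__":
    for rise in (20.0, 25.0):
        t = drive_ambient_c(30.0, rise)
        print(t, headroom_c(t), within_operating_range(t))
```

With a 30 °C room and a 25 °C rise the estimate sits exactly at the upper limit with no headroom left, and that is before the missing air circulation is taken into account.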

    Another important temperature parameter is the gradient, which should not exceed 20 °C per hour during operation and 30 °C per hour during downtime. Exceeding the latter is very dangerous for the drive mechanics; the phenomenon is called thermal shock. If you carry an HDD home in winter from a store or from a friend (where you had to copy some data), with frost outside and 20 °C inside, and power it up immediately, separate mechanical parts of the HDA heat up suddenly and locally, which may cause micro-deformations of the precision mechanics. Such an abrupt temperature change is very harmful to electronic components, too.
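A temperature gradient can be estimated from a series of timed readings and compared with the limits quoted above. The sketch below assumes readings are given as (hours, °C) pairs in time order; the sample data is hypothetical.

```python
from typing import List, Tuple

MAX_GRADIENT_OPERATING = 20.0  # degrees C per hour, from the text
MAX_GRADIENT_DOWNTIME = 30.0   # degrees C per hour, from the text


def max_gradient_c_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Largest absolute rate of change between consecutive readings."""
    rates = []
    for (h0, t0), (h1, t1) in zip(samples, samples[1:]):
        if h1 <= h0:
            raise ValueError("samples must be in strictly increasing time order")
        rates.append(abs(t1 - t0) / (h1 - h0))
    return max(rates, default=0.0)


if __name__ == "__main__":
    # Hypothetical case: a drive brought from -10 C outdoors reaches 20 C in half an hour.
    winter = [(0.0, -10.0), (0.5, 20.0)]
    print(max_gradient_c_per_hour(winter))  # 60.0
```

In the hypothetical winter case the rate is twice the downtime limit, which is why a cold drive should be left to warm up slowly before it is powered on.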

    The same holds true for mechanical influence on the HDA, i.e. impacts, which are also very dangerous for the precision mechanical parts of a drive. During operation, as described in the previous article, spring-loaded magnetic heads fly at a low height above disks rotating at a rather high speed. An impact on the HDA in that situation inevitably makes the heads vibrate and strike the disks repeatedly, which is sure to chip both the disk surface and the surface of the magnetic heads.

    Power supply units, which power the whole PC and hence the drive, pose a very serious danger to HDD electronics. To cut costs, manufacturers frequently omit filtering circuitry both in the primary 220 V circuit and in the secondary circuits. Very often the rated power does not match the actual value, and the stabilized voltage turns out to be not so stable, although those parameters are strictly specified for disk drives. According to the technical reference manual for Western Digital drives (Caviar BB/JB family), the allowed supply voltage is +5 V ±5% and +12 V ±10%, and the allowed fluctuation (ripple) is 100 mV in the +5 V circuits and 200 mV in the +12 V circuits. Most specialists servicing computer equipment use only a voltmeter when testing power supply units, but voltage fluctuations, which are an important parameter, can be checked only with an oscilloscope.
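The supply tolerances quoted above can be turned into a simple pass/fail check for measurements taken with a voltmeter (level) and an oscilloscope (fluctuation). The table layout and function name below are illustrative; the limits are the ones given in the text.

```python
# rail name -> (nominal volts, relative tolerance, max fluctuation in volts)
RAIL_LIMITS = {
    "+5V": (5.0, 0.05, 0.100),
    "+12V": (12.0, 0.10, 0.200),
}


def rail_ok(rail: str, measured_v: float, fluctuation_v: float) -> bool:
    """True if both the voltage level and its fluctuation are within limits."""
    nominal, tolerance, max_fluctuation = RAIL_LIMITS[rail]
    level_ok = abs(measured_v - nominal) <= nominal * tolerance
    return level_ok and fluctuation_v <= max_fluctuation


if __name__ == "__main__":
    print(rail_ok("+5V", 5.1, 0.05))   # True
    print(rail_ok("+12V", 12.0, 0.3))  # False: level is fine, fluctuation is not
```

The second call shows the case the text warns about: a voltmeter alone would report a healthy +12 V rail, while the fluctuation seen on the oscilloscope is out of tolerance.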

    Construction-related malfunctions
     The quality of HDDs has decreased lately, a fact confirmed by the reduction of the warranty period by many manufacturers. To some extent this is caused by stiff competition and the resulting race to produce cheap drives. It is also connected with rising technological demands, a race for higher recording density and greater capacity per disk. As a consequence, vendors frequently use solutions, materials and technologies that have not been thoroughly tested and verified; imperfect products thus reach the market and then end users. After some time manufacturers analyze the malfunctions of drives returned under warranty and attempt to eliminate the design drawbacks, but those attempts are not always successful.

    Theoretically, such an approach to drive design and production may cause problems with any drive part. The most frequent troubles are the following:

    Bad contact in the pin connector between the PCB and the preamplifier chip connected to the magnetic head assembly. A poor contact can have many consequences. First of all, it causes bad sectors, but these differ from ordinary defects caused by poor surface quality: the surface remains intact, while the bad contact causes invalid data to be written to the service bytes of some sectors, e.g. the field containing the sector’s CRC code. The problem may also corrupt firmware data, which the drive cannot restore by itself at the next power-up; no user mode exists for such restoration, and firmware data can be restored in factory mode only.

    Poor quality of chip soldering at the factory. This workmanship flaw usually shows up after about a year of operation. It manifests itself as a loss of contact: after a period of normal operation the drive either switches off and does not start again (“hangs”) or begins to knock with its heads, and the latter may damage its mechanical parts. Like the previous flaw, it may also cause firmware corruption.

    Insufficient chip quality: chips become defective even at temperatures that do not exceed the allowed limits. The fault can be repaired by replacing the defective chip with an identical working one.
    Imperfect construction of fluid dynamic bearings, which lets wear particles accumulate in the grease and results in spindle motor seizure.

    There are also cases when disks are not fixed on the spindle properly; as a result disk runout (wobble) grows steadily and destroys the spindle motor bearing. Drive operation becomes noticeably noisy, and after some time defective sectors appear because the runout leads to incorrect reading of some tracks.

    Poor quality of Flash ROM chips, which may lose the firmware code stored in them because of charge leakage when heated. The ROM can be rewritten either in a special ROM chip programmer or by the drive itself in factory mode.

    Errors in drive firmware microcode. Manufacturers keep the nature of such errors secret; however, firmware updates are issued quite regularly. It would be a mistake to believe that these errors do not affect a drive’s operability, because in some cases they may result in damage to the drive mechanics.
