Have you ever lost personal information due to a hard-disk drive failure? Either way, I advise you to get SMART.
Self-Monitoring, Analysis, and Reporting Technology – Part 2
In part one of this series I introduced the SMART monitoring and reporting system. I outlined the goals of SMART and explained how to determine whether your hard disk supports this feature. In this part, I intend to show the basics of using the SMART system to scan your hard drive for errors and how to interpret the disk’s internal SMART thresholds.
The self-test options
SMART offers different levels of self-test: short, long, conveyance and selective. The most commonly used of these four are the short and long tests but, for completeness, let’s look at the official definition of each as taken from the Linux man page for smartctl:
Short – runs SMART short self-test (usually under ten minutes). [Note: in the case of SCSI devices, this command option runs the “Background short” self-test.] This command can be given during normal system operation. This is a test in a different category than the immediate or automatic offline tests. The “Self” tests check the electrical and mechanical performance as well as the read performance of the disk. Their results are reported in the Self Test Error Log, readable with the ‘-l selftest’ option. Note that on some disks the progress of the self-test can be monitored by watching this log during the self-test.
Long / Extended – runs SMART Extended self-test (tens of minutes). [Note: in the case of SCSI devices, this command option runs the “Background long” self-test.] This is a longer and more thorough version of the short self-test described above. Note that this command can be given during normal system operation.
Conveyance – [ATA only] runs a SMART Conveyance self-test (minutes). This self-test routine is intended to identify damage incurred during transporting of the device. This self-test routine should take on the order of minutes to complete. Note that this test can be run during normal system operation.
Selective – [ATA only] runs a SMART Selective self-test, to test a range of disk Logical Block Addresses (LBAs), rather than the entire disk. Each range of LBAs that is checked is called a “span” and is specified by a starting LBA (N) and an ending LBA (M) with N less than or equal to M. The range can also be specified as N+SIZE. A span at the end of a disk can be specified by N-max.
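The span notation described above can be sketched in a few lines of Python. This is only an illustration of how the three forms (N-M, N+SIZE and N-max) might be assembled into the argument that follows “smartctl -t”; the function names selective_span and selective_test_command are my own and are not part of smartctl.

```python
def selective_span(start, end=None, size=None):
    """Build the text that follows `smartctl -t` for one selective span.

    Hypothetical helper: give exactly one of `end` or `size`.
    Pass end="max" to run from `start` to the last LBA on the disk.
    """
    if start < 0:
        raise ValueError("starting LBA must be non-negative")
    if (end is None) == (size is None):
        raise ValueError("give exactly one of end or size")
    if size is not None:
        if size <= 0:
            raise ValueError("size must be positive")
        return f"select,{start}+{size}"   # the N+SIZE form
    if end == "max":
        return f"select,{start}-max"      # the N-max form
    if end < start:
        raise ValueError("ending LBA must be >= starting LBA")
    return f"select,{start}-{end}"        # the N-M form


def selective_test_command(device, span):
    """Return the argument list for a selective self-test (not executed here)."""
    return ["smartctl", "-t", span, device]


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(selective_test_command("/dev/sda", selective_span(0, end=1000))))
```

Keeping the validation in one place means a span with N greater than M is rejected before anything is sent to the disk.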
Running a SMART scan
Using smartctl you can ask the target disk to perform a self-test, e.g.
smartctl -t short /dev/sda
The command simply passes the request for the self-test to the requested disk and prints an estimate of the execution time, followed by the estimated completion date and time. The duration differs from disk to disk and correlates with disk size and speed.
Checking the self-test scan results
After waiting the advised time (or, if you are impatient, a few seconds less), you can use smartctl to query the results of the test, e.g.
smartctl --log=selftest /dev/sda
The results are quite simple to interpret. In this case they show that the disk has completed two short self-tests: one in response to my request (detailed above) and the other executed sometime before it. At this point, SMART reports that the disk has passed (“Completed without error”); however, to be sure, I would advise following any short self-test with a long, more intensive self-test.
The long self-test uses the same command format as the short self-test, with the obvious substitution, e.g.
smartctl -t long /dev/sda
Again, the output mirrors the short self-test, the obvious difference being the lengthier duration. To reiterate, the results of a SMART scan can be obtained as follows:
smartctl --log=selftest /dev/sda
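If you would rather script these steps than type them, the two commands used so far could be wrapped roughly as follows. This is a minimal sketch, not a finished tool: the helper names are hypothetical, and it assumes smartctl is installed and that the script is run with sufficient privileges (usually root).

```python
import subprocess

# The self-test kinds covered in this article that take no extra arguments.
SELF_TEST_KINDS = ("short", "long", "conveyance")


def start_test_command(kind, device):
    """Return the argument list that asks `device` to start a self-test."""
    if kind not in SELF_TEST_KINDS:
        raise ValueError(f"unsupported self-test kind: {kind!r}")
    return ["smartctl", "-t", kind, device]


def read_log_command(device):
    """Return the argument list that prints the self-test log of `device`."""
    return ["smartctl", "--log=selftest", device]


def run(command):
    """Run a smartctl command and return whatever it printed.

    check=False because smartctl's exit status is a bit mask and a
    non-zero value does not necessarily mean the command itself failed.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(run(start_test_command("short", "/dev/sda")))
    # ...wait for the time smartctl advised, then:
    print(run(read_log_command("/dev/sda")))
```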
A useful switch you may wish to add to the command above is the “--attributes” switch, e.g.
smartctl --log=selftest --attributes /dev/sda
The “--attributes” switch instructs smartctl to print vendor-specific SMART attributes. The attributes are numbered and have specific names. Each attribute has a “Raw” value, printed under the heading “RAW_VALUE”, and a “Normalized” value, printed under the heading “VALUE”. Each vendor uses its own algorithm to convert the “Raw” value to a “Normalized” value. Please keep in mind that smartctl only reports the different attribute types, values, and thresholds as read from the device. It does not carry out the conversion between “Raw” and “Normalized” values: this is done by the disk’s firmware.
The “WHEN_FAILED” column will display “FAILING_NOW” when the disk’s firmware detects that a threshold has been breached. A threshold is breached at the point at which the current normalized value is less than or equal to the threshold value. If this occurs and you value your data, back it up and think about replacing the disk.
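To make the threshold rule concrete, here is a small Python sketch that splits one row of the attribute table into its columns and applies the “less than or equal to” test. The two sample rows are made up for illustration and do not come from a real disk; the code also assumes the usual ATA column order (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), which your version of smartctl may present differently.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    attr_id: int
    name: str
    value: int        # normalized value (VALUE column)
    worst: int
    threshold: int    # THRESH column
    when_failed: str
    raw_value: str


def parse_attribute_row(row):
    """Split one attribute row into its columns (assumed ATA layout)."""
    fields = row.split(None, 9)  # RAW_VALUE may itself contain spaces
    if len(fields) < 10:
        raise ValueError("row does not have the ten expected columns")
    return Attribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        threshold=int(fields[5]),
        when_failed=fields[8],
        raw_value=fields[9].strip(),
    )


def threshold_breached(attr):
    """True when the normalized value is at or below the threshold."""
    return attr.value <= attr.threshold


# Hypothetical sample rows, invented for this example.
HEALTHY_ROW = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0"
FAILING_ROW = "  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 2345"

if __name__ == "__main__":
    for row in (HEALTHY_ROW, FAILING_ROW):
        attr = parse_attribute_row(row)
        print(attr.name, attr.value, attr.threshold, threshold_breached(attr))
```

Note that the check uses only the normalized value and the threshold; the raw value is carried along for reference but plays no part in the decision, in line with the description above.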
In part 3 we will go over how to automate a SMART scan.