Have you ever lost personal information due to a hard-disk drive failure? Whether you have or not, I advise you to get SMART.
Self-Monitoring, Analysis, and Reporting Technology – Part 3
This is part three of my series about SMART. In part one I introduced the SMART monitoring and reporting system, outlined its goals and explained how to determine whether your hard disk supports it. Part two covered the basics of using SMART: how to scan your hard drive for errors and how to interpret the disk’s internal thresholds. In this post I explain how to automate SMART’s monitoring and scanning options.
The default SMART set-up
Let’s refresh our minds about which files were installed with SMART in part one, as this will help us understand its default behaviour. Using the RPM tools we can query the RPM database to list the files that make up the smartmontools package (note that this post is CentOS based; other distributions may not provide the RPM tools, and you may have to substitute a command that suits your distribution), e.g.
rpm -ql smartmontools
The majority of the files installed are documentation for the manual pages and a few example scripts. The rest of the files are the compiled binaries, init scripts, or text files for SMART. The ones of interest that we’ll be using are:
- /usr/sbin/smartd (binary)
- /usr/sbin/smartctl (binary)
- /etc/rc.d/init.d/smartd (shell script)
- /etc/smartd.conf (text file)
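To pull just those entries out of the full listing, the package file list can be filtered by directory. This is only a sketch: the helper name filter_pkg_files is made up for illustration, and it assumes the binaries live under /usr/sbin and the init script and configuration file under /etc, as on a CentOS install.

```shell
# Hypothetical helper: keep only paths under /usr/sbin/ or /etc/
# (binaries, init scripts and config files), dropping docs and man pages.
filter_pkg_files() {
    grep -E '^(/usr/sbin/|/etc/)'
}

# Only query the RPM database if the rpm tool is actually present.
if command -v rpm >/dev/null 2>&1; then
    rpm -ql smartmontools | filter_pkg_files || true
fi
```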
In order to understand the default behaviour we first need to understand what each of the above does.
The smartd binary is a daemon as indicated by the last letter (d). In Unix and Linux, a daemon is a program that runs as a background process, rather than being under the direct control of an interactive user.
smartd shell script
The smartd shell script complements the daemon process and can be used to start and stop the smartd daemon. The init script is located at /etc/rc.d/init.d/smartd (on CentOS, /etc/init.d is a symbolic link to /etc/rc.d/init.d). To view it, run:
vi /etc/init.d/smartd
smartd.conf text file
The smartd.conf text file is used by the smartd binary. The content of the file is read by smartd when it starts up and dictates what actions the daemon should take. In other words, the default behaviour of smartd is controlled by the settings within its default configuration file. Let’s have a look inside!
A quick scan of the content shows that almost all of the file is comments (lines starting with #). Only one line is uncommented by default: “DEVICESCAN -H -m root”. This line can be split into three sections, and the definition of each reveals the default behaviour of smartd.
- The DEVICESCAN directive tells the smartd daemon to scan for all devices,
- the -H switch tells smartd to monitor the overall health status of the devices it found during the scan,
- and, finally, the “-m root” parameter tells smartd to email the root user when it detects a problem.
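Because the file is mostly comments, it can help to print only the active directives. The following is a minimal sketch: the function name active_directives is invented here, and /etc/smartd.conf is assumed to be where your package placed the file.

```shell
# Hypothetical helper: print only the non-comment, non-blank lines of a file.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

conf=/etc/smartd.conf   # assumed location; adjust for your distribution
if [ -r "$conf" ]; then
    active_directives "$conf" || true
fi
```

On an untouched default install this should leave just the single DEVICESCAN line discussed above.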
Automating SMART scans
Now that we understand the default behaviour, we can reflect on what it means. In my opinion this behaviour is passive. We really want the monitoring process to be active, meaning we want to make sure that each sector of the disk is accessed on a regular basis. After all, there is no point monitoring the overall health status if a bad sector on the disk is never accessed. It’s too late if your mission-critical server is the first to access the bad part of the disk; I’d prefer SMART to spot it first! To benefit fully from smartd’s monitoring capabilities, it would be great if we could ask smartd to run regular short and long tests. Well, you can!
The more observant may have noticed the advice in the comments above the DEVICESCAN line in smartd.conf:
“Most users should comment out DEVICESCAN and explicitly list the devices that they wish to monitor.”
Therefore, the first step towards a more active scan is to comment out, or remove, this line. You should then add one line per hard disk, built from the following directives:
- “/dev/sda” tells smartd which drive to monitor,
- “-a” tells smartd to monitor several things, including: health status, failures, changes to attributes and increases in error counts such as the offline pending sector count,
- “-o on” tells smartd to enable SMART automatic offline testing, which scans the drive every four hours for disk defects,
- “-S on” tells smartd to enable the autosave of device vendor-specific attributes,
- “-s (S/../.././05|L/../../4/06)” instructs smartd to run self-tests at the scheduled times. In this sample a short test is started every day between 5am and 6am, and a long test is started between 6am and 7am on the fourth day of the week (Thursday, counting Monday as day one),
- “-m root” tells smartd to email the root user with reports of any failures,
- “-M test” modifies the behaviour of the smartd email warnings to send a single test email immediately upon smartd start-up,
- “-M daily” modifies the behaviour of the smartd email warnings to send an additional reminder once per day for each type of disk problem detected.
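Put together, the directives above form a single entry for the drive. The sketch below writes such an entry to a scratch file rather than the live configuration; the file name smartd.conf.example is made up for illustration, and choosing “-M daily” for the final entry (rather than “-M test”) is my assumption.

```shell
# Write a candidate entry to a scratch file (hypothetical name) so it can be
# reviewed before being copied into the real smartd.conf.
conf=./smartd.conf.example
cat > "$conf" <<'EOF'
# Monitor /dev/sda: short self-test daily at 05h, long self-test Thursdays at 06h
/dev/sda -a -o on -S on -s (S/../.././05|L/../../4/06) -m root -M daily
EOF

cat "$conf"
```

Adding “-M test” to the line temporarily is a simple way to confirm that mail to root is actually delivered when smartd starts.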
Once you’ve finished modifying the smartd.conf file you will need to restart the smartd daemon so that the new configuration takes effect. This can be done using the init script.
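A restart through the init script could look like the following. It is a sketch that assumes the init script path given earlier and needs to be run as root; the existence check is only there so the snippet fails gracefully on systems without the script.

```shell
# Restart smartd through its init script so that it re-reads smartd.conf.
init_script=/etc/rc.d/init.d/smartd
if [ -x "$init_script" ]; then
    "$init_script" restart
else
    echo "init script not found at $init_script" >&2
fi
```

On CentOS the service wrapper does the same job: service smartd restart.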
One final check is to ensure the SMART daemon is started at boot. First, use chkconfig to list the runlevels in which smartd is enabled.
If chkconfig reports smartd as off in every runlevel, the daemon will not be started at boot. To correct this, switch the service on with chkconfig and then list it again to confirm the change.
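The check and the fix might be run as follows. This is a sketch for a SysV-init CentOS system, to be run as root; the guard simply skips the commands where chkconfig is not installed.

```shell
svc=smartd
if command -v chkconfig >/dev/null 2>&1; then
    chkconfig --list "$svc"   # show the on/off state for each runlevel
    chkconfig "$svc" on       # enable the service in its default runlevels
    chkconfig --list "$svc"   # confirm the change
else
    echo "chkconfig is not available on this system" >&2
fi
```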
Job done! SMART is now enabled on your system.