Self-Monitoring, Analysis, and Reporting Technology

Self-Monitoring, Analysis, and Reporting Technology, or S.M.A.R.T. (sometimes written SMART), is a monitoring system for computer hard disks to detect and report on various indicators of reliability, in the hope of anticipating failures.

Background

Fundamentally, hard-drive failures fall into one of two basic classes:
*Predictable ones, which occur gradually over time – e.g., mechanical wear and gradual degradation of storage surfaces. A monitoring device can detect these, much as a temperature dial on a vehicle's dashboard can warn a driver, before serious damage has occurred, that the engine has started to overheat.
*Unpredictable ones, which may occur suddenly and without warning, such as failure of an electronic component, or sudden mechanical failure (perhaps owing to mishandling).

Mechanical failures account for about 60 percent of all drive failures. [ [http://www.seagate.com/docs/pdf/whitepaper/enhanced_smart.pdf Seagate statement on enhanced smart attributes] ] Most mechanical failures result from gradual wear, although an eventual failure may be catastrophic. However, before complete failure occurs, there are usually certain indications that failure is imminent. These may include increased heat output, increased noise level, problems with reading and writing of data, a marked increase in the number of damaged disk sectors, and so on.

The purpose of S.M.A.R.T. is to warn a user or system administrator of impending drive failure while there is still time to take preventative action, such as copying the data to a replacement device. Approximately 30% of failures can be predicted by S.M.A.R.T. [ [http://smartlinux.sourceforge.net/smart/faq.php?#2 How does S.M.A.R.T. work?] ] Work at Google on over 100,000 drives has shown little overall predictive value of S.M.A.R.T. status as a whole, but suggests that certain sub-categories of information which some S.M.A.R.T. implementations track "do" correlate with actual failure rates – specifically, in the 60 days following the first scan error on a drive, the drive is, on average, 39 times more likely to fail than it would have been had no such error occurred. Also, first errors in reallocations, offline reallocations and probational counts are strongly correlated to higher probabilities of failure. [ [http://labs.google.com/papers/disk_failures.pdf Failure Trends in a Large Disk Drive Population] (Conclusion section), by Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso, Google Inc. 1600 Amphitheatre Pkwy Mountain View, CA 94043]

PCTechGuide's page on S.M.A.R.T. (2003) [ [http://www.pctechguide.com/31HardDisk_SMART.htm PCTechGuide's page on S.M.A.R.T. (2003)] ] comments that the technology has gone through three phases: "In its original incarnation SMART provided failure prediction by monitoring certain online hard drive activities. A subsequent version improved failure prediction by adding an automatic off-line read scan to monitor additional operations. The latest SMART technology not only monitors hard drive activities but adds failure prevention by attempting to detect and repair sector errors. Also, whilst earlier versions of the technology only monitored hard drive activity for data that was retrieved by the operating system, this latest SMART tests all data and all sectors of a drive by using 'off-line data collection' to confirm the drive's health during periods of inactivity."

History and predecessors

The industry's first hard disk monitoring technology was introduced by IBM in 1992 in their IBM 9337 Disk Arrays for AS/400 servers using IBM 0662 SCSI-2 disk drives. [ [http://www-306.ibm.com/common/ssi/OIX.wss?DocURL=http://d03xhttpcl001g.boulder.ibm.com/common/ssi/rep_ca/9/877/ENUSZG92-0289/index.html&InfoType=AN IBM Announcement Letter No. ZG92-0289 dated September 1, 1992] ] It was later named Predictive Failure Analysis (PFA) technology. It measured several key device health parameters and evaluated them within the drive firmware. Communications between the physical unit and the monitoring software were limited to a binary result – namely, either "device is OK" or "drive is likely to fail soon".

A later variant, named IntelliSafe, was created by computer manufacturer Compaq and disk drive manufacturers Seagate, Quantum, and Conner. [Seagate – [http://www.seagate.com/support/kb/disc/smart.html The evolution of S.M.A.R.T.] ] The disk drives would measure the disk's "health parameters", and the values would be transferred to the operating system and user-space monitoring software. Each disk drive vendor was free to decide which parameters were to be included for monitoring, and what their thresholds should be. Only the protocol used to communicate with the host was unified.

Compaq submitted their implementation to the Small Form Factor Committee for standardization in early 1995. [Compaq. IntelliSafe. Technical Report SSF-8035, Small Form Committee, January 1995.] It was supported by IBM, by Compaq's development partners Seagate, Quantum, and Conner, and by Western Digital, who did not have a failure prediction system at the time. The Committee chose IntelliSafe's approach, as it provided more flexibility. The resulting jointly developed standard was named S.M.A.R.T.

SMART Information

The technical documentation for SMART is in the AT Attachment (ATA) standard. [Curtis E. Stephens (ed.), [http://www.t13.org/Documents/UploadedDocuments/docs2006/D1699r3f-ATA8-ACS.pdf Information technology – AT Attachment 8 – ATA/ATAPI Command Set (ATA8-ACS), working draft revision 3f], ANSI INCITS, December 11, 2006, pp. 198–213, 327–344]

The most basic information that SMART provides is the SMART status. It provides only two values: "threshold not exceeded" and "threshold exceeded". Often these are represented as "drive OK" or "drive fail" respectively. A "threshold exceeded" value is intended to indicate that there is a relatively high probability that the drive will not be able to honour its specification in the future – that is, the drive is "about to fail". The predicted failure may be catastrophic or may be something as subtle as the inability to write to certain sectors, or perhaps slower performance than the manufacturer's declared minimum.

The SMART status does not necessarily indicate the drive's past or present reliability. If a drive has already failed catastrophically, the SMART status may be inaccessible. Alternatively, if a drive has experienced problems in the past, but the sensors no longer detect such problems, the SMART status may, depending on the manufacturer's programming, suggest that the drive is now sound.

The inability to read some sectors is not always an indication that a drive is about to fail. One way that unreadable sectors may be created, even when the drive is functioning within specification, is through a sudden power failure while the drive is writing. In order to prevent this problem, modern hard drives will always finish writing at least the current sector immediately after the power fails (typically using rotational energy from the disk). Also, even if the physical disk is damaged at one location, such that a certain sector is unreadable, the disk may be able to use spare space to replace the bad area, so that the sector can be overwritten. [Hitachi Global Storage Technologies, [http://www.hitachigst.com/tech/techlib.nsf/techdocs/85CC1FF9F3F11FE187256C4F0052E6B6/$file/80GNSpec2.0.pdf Hard Disk Drive Specification: Hitachi Travelstar 80GN, revision 2.0], Hitachi Document Part Number S13K-1055-20, 19 September 2003]

More detail on the health of the drive may be obtained by examining the SMART Attributes. SMART Attributes were included in some drafts of the ATA standard, but were removed before the standard became final. The meaning and interpretation of the attributes vary between manufacturers, and are sometimes treated as a trade secret by a manufacturer. Attributes are further discussed below. [Jim Hatfield, [http://www.t13.org/Documents/UploadedDocuments/docs2005/e05148r0-ACS-SMARTAttributesAnnex.pdf SMART Attribute Annex], e05148r0, September 30, 2005]

Drives with SMART may optionally support a number of 'logs'. The "error log" records information about the most recent errors that the drive has reported back to the host computer. Examining this log may help one to determine whether computer problems are disk-related or caused by something else.

A drive supporting SMART may optionally support a number of self-test or maintenance routines, and the results of the tests are kept in the "self-test log". The self-test routines may be used to detect any unreadable sectors on the disk, so that they may be restored from back-up sources (for example, from other disks in a RAID). This helps to reduce the risk of incurring permanent loss of data.
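As a rough illustration of how the status, attributes, logs and self-tests described above might be queried in practice, the sketch below builds command lines for `smartctl`, the command-line tool from the smartmontools project. It assumes smartmontools is installed and that the caller has sufficient privileges. The helper names and the device path `/dev/sda` are illustrative only and not part of any standard.

```python
import subprocess

# smartctl options for the items discussed above (smartmontools must be installed).
SMARTCTL_ACTIONS = {
    "status": ["-H"],                    # overall SMART health status
    "attributes": ["-A"],                # vendor-specific attribute table
    "error_log": ["-l", "error"],        # most recent errors reported to the host
    "selftest_log": ["-l", "selftest"],  # results of earlier self-tests
    "short_test": ["-t", "short"],       # start a short self-test
}


def smartctl_command(device, action):
    """Build the argument list for one smartctl query (hypothetical helper)."""
    if action not in SMARTCTL_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ["smartctl", *SMARTCTL_ACTIONS[action], device]


def run_smartctl(device, action):
    """Run smartctl and return its text output.

    smartctl uses its exit status as a bit mask, so a non-zero exit code
    is not treated as an exception here.
    """
    result = subprocess.run(
        smartctl_command(device, action),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, `run_smartctl("/dev/sda", "selftest_log")` would return the self-test log as text, provided the drive and its interface pass SMART commands through.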

Standards and implementation

Many motherboards will display a warning message when a disk drive is approaching failure. Although S.M.A.R.T. is an industry standard among most major hard drive manufacturers, [pctechguide: "Industry acceptance of PFA technology eventually led to SMART (Self-Monitoring, Analysis and Reporting Technology) becoming the industry-standard reliability prediction indicator..." [http://www.pctechguide.com/31HardDisk_SMART.htm] ] there are some remaining issues and much proprietary "secret knowledge" held by individual manufacturers as to their specific approach. As a result, S.M.A.R.T. is not always implemented correctly on many computer platforms, owing to the absence of industry-wide software and hardware standards for S.M.A.R.T. data interchange. [citation needed]

From a legal perspective, the term "S.M.A.R.T." refers only to a signalling method between internal disk drive electromechanical sensors and the host computer. Hence, a drive may be claimed by its manufacturers to include S.M.A.R.T. support even if it does not include, say, a temperature sensor, which the customer might reasonably expect to be present. Moreover, in the most extreme case, a disk manufacturer could, in theory, produce a drive which includes a sensor for just "one" physical attribute, and then legally advertise the product as "S.M.A.R.T. compatible".

Depending on the type of interface being used, some S.M.A.R.T.-enabled motherboards and related software may not communicate with certain S.M.A.R.T.-capable drives. For example, few external drives connected via USB and Firewire correctly send S.M.A.R.T. data over those interfaces. With so many ways to connect a hard drive (SCSI, Fibre Channel, ATA, SATA, SAS, SSA, and so on), it is difficult to predict whether S.M.A.R.T. reports will function correctly in a given system.

Even on hard drives and interfaces that support it, S.M.A.R.T. information may not be reported correctly to the computer's operating system. Some disk controllers can duplicate all write operations on a secondary "back-up" drive in real time. This feature is known as "RAID mirroring". However, many programs which are designed to analyze changes in drive behaviour and relay S.M.A.R.T. alerts to the operator do not function properly when a computer system is configured for RAID support. Generally this is because, under normal RAID operational conditions, the computer is not permitted by the RAID subsystem to 'see' (or directly access) individual physical drives, but may access only logical volumes instead.

On the Windows platform, many programs designed to monitor and report S.M.A.R.T. information will function only under an administrator account. At present, S.M.A.R.T. is implemented individually by manufacturers, and while some aspects are standardized for compatibility, others are not.

ATA S.M.A.R.T. Attributes

Each drive manufacturer defines a set of attributes, and sets threshold values beyond which attributes should not pass under normal operation. Each attribute has a "raw value", whose meaning is entirely up to the drive manufacturer (but often corresponds to counts or a physical unit, such as degrees Celsius or seconds), and a normalized value, which ranges from 1 to 253 (with 1 representing the worst case and 253 representing the best). Depending on the manufacturer, a value of 100 or 200 will often be chosen as the "normal" value.
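The attribute model described above can be sketched as a small data structure. This is a minimal illustration, not a vendor implementation: the field names are invented, and it assumes (as monitoring tools commonly do) that an attribute counts as failed once its normalized value has fallen to or below its threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartAttribute:
    """One vendor-defined attribute (field names are illustrative)."""
    attr_id: int
    name: str
    normalized: int  # 1 (worst) .. 253 (best)
    threshold: int   # vendor-set limit for the normalized value
    raw: int         # vendor-specific meaning, e.g. a count or degrees Celsius

    def __post_init__(self):
        if not 1 <= self.normalized <= 253:
            raise ValueError("normalized value must be in 1..253")

    def threshold_exceeded(self) -> bool:
        # Assumption: the attribute trips when the normalized value has
        # dropped to or below the vendor threshold.
        return self.normalized <= self.threshold


def smart_status(attributes):
    """Reduce a list of attributes to the two-valued SMART status."""
    if any(a.threshold_exceeded() for a in attributes):
        return "threshold exceeded"
    return "threshold not exceeded"
```

With hypothetical values, an attribute whose normalized value is 100 against a threshold of 36 leaves the status at "threshold not exceeded", while one that has dropped to 30 against the same threshold flips it to "threshold exceeded".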

Manufacturers that have supported at least one S.M.A.R.T. attribute in various products include: Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Toshiba, Western Digital and ExcelStor Technology.

Known ATA S.M.A.R.T. attributes

The following chart lists some S.M.A.R.T. attributes and the typical meaning of their raw values. Normalized values are always mapped so that higher values are better (with only very rare exceptions such as the "Temperature" attribute on certain Seagate drives [ [http://smartmontools.sourceforge.net/faq.html#temp-seagate smartmontools FAQ ("Attribute 194 (Temperature Celsius) behaves strangely on my Seagate disk")] ] ), but higher "raw" attribute values may be better or worse depending on the attribute and manufacturer. For example, the "Reallocated Sectors Count" attribute's normalized value "decreases" as the number of reallocated sectors "increases". In this case, the attribute's "raw" value will often indicate the actual number of sectors that were reallocated, although vendors are in no way required to adhere to this convention. As manufacturers do not necessarily agree on precise attribute definitions and measurement units, the following list of attributes should be regarded as a general guide only.
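The inverse relationship between raw and normalized values for an attribute such as "Reallocated Sectors Count" can be illustrated with a toy mapping. The formula below is purely hypothetical: manufacturers do not generally publish their mappings, and the starting value of 100 and the step of 10 sectors per point are invented for the example.

```python
def normalize_reallocated(raw_count, start=100, sectors_per_point=10):
    """Hypothetical vendor mapping from a raw reallocated-sector count
    to a normalized value: more reallocated sectors give a lower value,
    and the result never drops below 1 (the worst case).
    """
    if raw_count < 0:
        raise ValueError("raw count cannot be negative")
    return max(1, start - raw_count // sectors_per_point)


# The normalized value falls as the raw count rises.
for raw in (0, 50, 5000):
    print(raw, normalize_reallocated(raw))
```

Under this invented mapping a drive with no reallocated sectors reports 100, one with 50 reports 95, and one with 5000 is pinned at 1.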

External links

* [http://prefetch.net/articles/diskdrives.smart.html Out SMART Your Hard Drive]
* [http://www.pc-king.co.uk/tips3.htm How S.M.A.R.T. is your hard drive?]


Wikimedia Foundation. 2010.
