IEEE 754-2008

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation, and is followed by many hardware (CPU and FPU) and software implementations. Many computer languages allow or require that some or all arithmetic be carried out using IEEE 754 formats and operations.

The standard defines
* "arithmetic formats:" sets of binary and decimal floating-point data, which consist of finite numbers (including negative zero and subnormal numbers), infinities, and special 'not a number' values (NaNs)
* "interchange formats:" encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form
* "rounding algorithms:" methods to be used for rounding numbers during arithmetic and conversions
* "operations:" arithmetic and other operations on arithmetic formats
* "exception handling:" indications of exceptional conditions (such as division by zero, overflow, "etc.")

The standard also includes extensive recommendations for advanced exception handling, additional operations (such as trigonometric functions), expression evaluation, and achieving reproducible results.

The standard is derived from and replaces the earlier IEEE Standard for Binary Floating-Point Arithmetic following a 7-year revision process. The binary formats in that standard are included in the new standard along with three new basic formats (one binary and two decimal). To conform to the standard, an implementation must implement at least one of the basic formats as both an arithmetic format and an interchange format.

Formats

"Formats" in IEEE 754 describe sets of floating-point data and encodings for interchanging them.

A given format comprises:

* Finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is most simply described by three integers: a "sign" (zero or one), "s", a "significand" (or 'coefficient'), "c", and an "exponent", "q". The numerical value of a finite number is
$(-1)^{s} \times c \times b^{q}$
where "b" is the base (2 or 10). For example, if the sign is 1 (indicating negative), the significand is 12345, the exponent is −3, and the base is 10, then the value of the number is −12.345.

* Two infinities: +∞ and −∞.

* Two kinds of NaN (quiet and signaling). A NaN may also carry a "payload", intended for diagnostic information indicating the source of the NaN. The sign of a NaN has no meaning, but it may be predictable in some circumstances.
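The value formula for finite numbers can be checked with a short sketch. The function name `finite_value` is hypothetical (it is not part of the standard), and exact rational arithmetic is used so that the check itself introduces no rounding.

```python
from fractions import Fraction


def finite_value(s: int, c: int, q: int, b: int) -> Fraction:
    """Return the exact value (-1)^s * c * b^q of a finite number."""
    if s not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if b not in (2, 10):
        raise ValueError("base must be 2 or 10")
    return (-1) ** s * c * Fraction(b) ** q


# The example from the text: sign 1, significand 12345, exponent -3, base 10.
value = finite_value(s=1, c=12345, q=-3, b=10)
print(float(value))
```

With the example parameters from the text, the result is the exact rational −12345/1000, that is, −12.345.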

The possible finite values that can be represented in a given format are determined by the base ("b"), the number of digits in the significand (precision, "p"), and the exponent parameter "emax":
* "c" must be an integer in the range zero through $b^{p}-1$ ("e.g.", if "b"=10 and "p"=7 then "c" is 0 through 9999999)
* "q" must be an integer such that 1−"emax" ≤ "q"+"p"−1 ≤ "emax" ("e.g.", if "p"=7 and "emax"=96 then "q" is −101 through 90).

Hence (for the example parameters) the smallest non-zero positive number that can be represented is $1\times10^{-101}$ and the largest is $9999999\times10^{90}$ ($9.999999\times10^{96}$), and the full range of numbers is $-9.999999\times10^{96}$ through $9.999999\times10^{96}$. The numbers $-b^{1-\mathit{emax}}$ and $b^{1-\mathit{emax}}$ (here, $-1\times10^{-95}$ and $1\times10^{-95}$) are the smallest (in magnitude) "normal numbers"; non-zero numbers between these smallest numbers are called subnormal numbers.
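The bounds for the example parameters ("b"=10, "p"=7, "emax"=96) can be recomputed directly from the two constraints on "c" and "q". This is a minimal sketch; the function name `format_range` and the dictionary keys are illustrative choices, not terms from the standard.

```python
from fractions import Fraction


def format_range(b: int, p: int, emax: int) -> dict:
    """Derive the extreme finite values of a format from b, p, and emax."""
    c_max = b ** p - 1            # largest significand
    q_min = (1 - emax) - (p - 1)  # from 1 - emax <= q + p - 1
    q_max = emax - (p - 1)        # from q + p - 1 <= emax
    return {
        "c_max": c_max,
        "q_min": q_min,
        "q_max": q_max,
        "smallest_positive": Fraction(b) ** q_min,      # c = 1, q = q_min
        "largest": c_max * Fraction(b) ** q_max,        # c = c_max, q = q_max
        "smallest_normal": Fraction(b) ** (1 - emax),   # b^(1 - emax)
    }


example = format_range(b=10, p=7, emax=96)
print(example["q_min"], example["q_max"])
```

Every positive value below `smallest_normal` and at or above `smallest_positive` is subnormal in this format.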

Basic formats

The standard defines five basic formats, named using their base and the number of bits used to encode them. There are three binary floating-point formats (which can be encoded using 32, 64, or 128 bits) and two decimal floating-point formats (which can be encoded using 64 or 128 bits). The first two binary formats are the 'single' and 'double' formats of IEEE 754-1985, and the third is often called 'quad'; the decimal formats are similarly often called 'double' and 'quad'.

All the basic formats are used in hardware and software implementations.
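As a rough illustration, the five basic formats can be written down as ("b", "p", "emax") triples. The parameter values and the names `binary32`, `decimal64`, and so on are taken from the standard's format tables rather than from this article, so they should be checked against the standard itself. The sketch also assumes that Python's `float` is the 64-bit binary format, which holds on common platforms but is not guaranteed by the language.

```python
import sys

# (base b, precision p, emax) for each basic format; values assumed from
# the standard's tables, not stated in the surrounding text.
BASIC_FORMATS = {
    "binary32": (2, 24, 127),
    "binary64": (2, 53, 1023),
    "binary128": (2, 113, 16383),
    "decimal64": (10, 16, 384),
    "decimal128": (10, 34, 6144),
}

# sys.float_info.max_exp is defined as emax + 1, hence the subtraction.
host_float = (
    sys.float_info.radix,
    sys.float_info.mant_dig,
    sys.float_info.max_exp - 1,
)
print(host_float == BASIC_FORMATS["binary64"])
```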

Arithmetic formats

A format that is just to be used for arithmetic and other operations need not have an encoding associated with it (that is, an implementation can use whatever internal representation it chooses); all that needs to be defined are its parameters ("b", "p", and "emax"). These parameters uniquely describe the set of finite numbers (combinations of sign, significand, and exponent) that it can represent.

Interchange formats

Interchange formats are intended for the exchange of floating-point data using a fixed-length bit-string for a given format.

For the exchange of binary floating-point numbers, interchange formats of length 16 bits, 32 bits, 64 bits, and any multiple of 32 bits ≥ 128 are defined. The 16-bit format is intended for the exchange or storage of small numbers ("e.g.", for graphics).

The encoding scheme for these binary interchange formats is the same as that of IEEE 754-1985 (a sign bit, followed by exponent bits that describe the exponent offset by a "bias", and "p"−1 bits that describe the significand). There is some flexibility in the encoding of signaling NaNs.
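The field layout described above can be inspected for the 32-bit binary format with a small sketch. The field widths (1 sign bit, 8 exponent bits, 23 = "p"−1 significand bits) and the bias of 127 are properties of that format that this article does not state explicitly; the function name `decode_binary32` is hypothetical.

```python
import struct


def decode_binary32(x: float) -> tuple:
    """Split the 32-bit encoding of x into (sign, biased exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                     # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, exponent plus bias (127)
    fraction = bits & 0x7FFFFF            # 23 bits, the p - 1 stored bits
    return sign, biased_exponent, fraction


# -0.15625 is -1.25 * 2^-3, so the biased exponent is -3 + 127 = 124.
print(decode_binary32(-0.15625))
```

The leading bit of the significand is not stored for normal numbers, which is why only "p"−1 bits appear in the encoding.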

For the exchange of decimal floating-point numbers, interchange formats of any multiple of 32 bits ≥ 32 are defined.

The encoding scheme for the decimal interchange formats similarly encodes the sign, exponent, and significand, but uses a more complex approach to allow the significand to be encoded as a compressed sequence of decimal digits (using Densely Packed Decimal) or as a binary integer. In either case the set of numbers (combinations of sign, significand, and exponent) that may be encoded is identical, and signaling NaNs have a unique encoding (and the same set of possible payloads).

Rounding algorithms

The standard defines five rounding algorithms. The first two round to a nearest value; the others are called "directed roundings":

* Round to nearest, ties to even – rounds to the nearest value; if the number falls midway it is rounded to the nearest value with an even (zero) least significant bit, which occurs 50% of the time; this is the default algorithm for binary floating-point and the recommended default for decimal
* Round to nearest, ties away from zero – rounds to the nearest value; if the number falls midway it is rounded to the nearest value above (for positive numbers) or below (for negative numbers)
* Round toward 0 – directed rounding towards zero (also called truncation)
* Round toward +∞ – directed rounding towards positive infinity
* Round toward −∞ – directed rounding towards negative infinity.
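The five rounding algorithms can be compared using Python's `decimal` module, whose rounding constants correspond to them. This is only an illustration on decimal values rounded to integers; the mapping names in `MODES` and the helper `round_to_integer` are choices made for this sketch.

```python
from decimal import (
    Decimal,
    ROUND_CEILING,
    ROUND_DOWN,
    ROUND_FLOOR,
    ROUND_HALF_EVEN,
    ROUND_HALF_UP,
)

MODES = {
    "nearest, ties to even": ROUND_HALF_EVEN,
    "nearest, ties away from zero": ROUND_HALF_UP,
    "toward 0": ROUND_DOWN,
    "toward +inf": ROUND_CEILING,
    "toward -inf": ROUND_FLOOR,
}


def round_to_integer(text: str, mode: str) -> int:
    """Round a decimal literal to an integer under the named algorithm."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))


for name in MODES:
    print(name, [round_to_integer(t, name) for t in ("2.5", "-2.5", "3.5")])
```

The inputs 2.5, −2.5, and 3.5 are all ties, so they separate the two round-to-nearest algorithms as well as the three directed ones.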

Operations

Required operations for a supported arithmetic format (including the basic formats) include:
* Arithmetic operations (add, subtract, multiply, divide, square root, fused-multiply-add, remainder, "etc.")
* Conversions (between formats, to and from strings, "etc.")
* Scaling and (for decimal) quantizing
* Copying and manipulating the sign (abs, negate, "etc.")
* Comparisons and total ordering
* Classification and testing for NaNs, "etc."
* Testing and setting flags
* Miscellaneous operations.
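A few of these operation groups have close counterparts in Python's `math` module, which can serve as an informal illustration. The sketch assumes Python 3.7 or later (for `math.remainder`) and a platform whose `float` follows IEEE 754 binary arithmetic; it does not claim that Python exposes every required operation.

```python
import math

nan = float("nan")
neg_zero = -0.0

# Classification and testing for NaNs.
is_nan = math.isnan(nan)
nan_equals_itself = nan == nan  # comparisons involving NaN are unordered

# Copying and manipulating the sign: negative zero keeps its sign bit.
sign_of_neg_zero = math.copysign(1.0, neg_zero)
sign_after_abs = math.copysign(1.0, math.fabs(neg_zero))

# Comparisons: negative zero compares equal to positive zero.
zeros_equal = neg_zero == 0.0

# Arithmetic: the remainder operation rounds the quotient to the nearest
# integer (5/3 rounds to 2), so the result can be negative.
ieee_remainder = math.remainder(5.0, 3.0)

print(is_nan, nan_equals_itself, sign_of_neg_zero, zeros_equal, ieee_remainder)
```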

Exception handling

The standard defines five exceptions, each of which has a corresponding status flag that (except in certain cases of underflow) is raised when the exception occurs. No other action is required, but alternatives are recommended (see below).

The five possible exceptions are:
* Invalid operation ("e.g.", square root of a negative number)
* Division by zero
* Overflow (a result is too large to be represented correctly)
* Underflow (a result is very small (outside the normal range), non-zero, and inexact)
* Inexact.

These are the same five exceptions as were defined in 754-1985.
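The status flags can be observed with Python's `decimal` module, which models the five exceptions as signals with matching flags. In this sketch all traps are disabled so that only flags are raised; the context parameters reuse the earlier example ("p"=7, "emax"=96), and the helper `raised_flags` is hypothetical. Python's module also defines extra signals (such as `Rounded` and `Subnormal`) that are not among the five and are ignored here.

```python
from decimal import (
    Context,
    Decimal,
    DivisionByZero,
    Inexact,
    InvalidOperation,
    Overflow,
    Underflow,
)

FIVE = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)


def raised_flags(operation) -> set:
    """Run operation in a fresh trap-free context and report raised flags."""
    ctx = Context(prec=7, Emax=96, Emin=-95, traps=[])
    operation(ctx)
    return {sig.__name__ for sig in FIVE if ctx.flags[sig]}


invalid = raised_flags(lambda ctx: ctx.sqrt(Decimal(-1)))
div_by_zero = raised_flags(lambda ctx: ctx.divide(Decimal(1), Decimal(0)))
overflow = raised_flags(lambda ctx: ctx.multiply(Decimal("9E96"), Decimal(10)))
underflow = raised_flags(lambda ctx: ctx.divide(Decimal("1E-95"), Decimal(3)))
inexact = raised_flags(lambda ctx: ctx.divide(Decimal(1), Decimal(3)))
print(invalid, div_by_zero, overflow, underflow, inexact)
```

Overflow and underflow are each accompanied by the inexact flag in this example, since the delivered result differs from the exact one.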

Recommendations

Alternate exception handling

The standard recommends optional exception handling in various forms, including traps (exceptions that change the flow of control in some way) and other exception handling models which interrupt the flow, such as try/catch. The traps and other exception mechanisms remain optional, as they were in IEEE 754-1985.
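One way to picture the try/catch model is Python's `decimal` module, where enabling a trap turns a flagged exception into a language-level exception. This is an analogy under that module's semantics, not the mechanism the standard prescribes; `divide_with_trap` and `divide_without_trap` are hypothetical helper names.

```python
from decimal import Context, Decimal, DivisionByZero


def divide_with_trap(a: int, b: int):
    """Divide with the division-by-zero trap enabled; None signals a trap."""
    ctx = Context(traps=[DivisionByZero])
    try:
        return ctx.divide(Decimal(a), Decimal(b))
    except DivisionByZero:
        # Control flow was interrupted instead of delivering an infinity.
        return None


def divide_without_trap(a: int, b: int) -> Decimal:
    """Divide with default handling: the flag is raised, a result is delivered."""
    return Context(traps=[]).divide(Decimal(a), Decimal(b))


print(divide_with_trap(1, 0), divide_without_trap(1, 0))
```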

Recommended operations

A new clause in the standard recommends fifty operations, including log, power, and trigonometric functions, that language standards should define. These are all optional (none are required in order to conform to the standard). The operations include some on dynamic modes for attributes, and also a set of reduction operations (sum, scaled product, "etc."). All are required to supply a correctly rounded result, but they do not have to detect or report inexactness.

Expression evaluation

The standard recommends how language standards should specify the semantics of sequences of operations, and points out the subtleties of literal meanings and optimizations that change the value of a result.
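A small example of why such semantics matter: floating-point addition is not associative, so an optimization that regroups a sum can change its value. The sketch assumes Python's `float` is the 64-bit binary format.

```python
# The same three literals, summed with two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Each intermediate sum is rounded, so the two groupings round differently.
print(left == right)
print(repr(left), repr(right))
```

A compiler that treats the two groupings as interchangeable would therefore not preserve the value of the expression.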

Reproducibility

IEEE 754-1985 allowed many variations in implementations (such as the encoding of some values and the detection of certain exceptions). IEEE 754-2008 has tightened up many of these, but a few variations still remain (especially for binary formats). The reproducibility clause recommends that language standards should provide a means to write reproducible programs ("i.e.", programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.

See also

* IEEE 754-1985
* −0 (negative zero)
* Intel 8087 (an early implementation of binary floating-point)
* minifloat for simple examples of properties of IEEE 754 binary floating-point numbers

Further reading

* Charles Severance (March 1998). "IEEE 754: An Interview with William Kahan". IEEE Computer 31 (3): 114–115. doi:10.1109/MC.1998.660194. http://www.freecollab.com/dr-chuck/papers/columns/r3114.pdf. Retrieved 2008-04-28.

* David Goldberg (March 1991). "What Every Computer Scientist Should Know About Floating-Point Arithmetic". ACM Computing Surveys (CSUR) 23 (1): 5–48. doi:10.1145/103162.103163. http://www.validlab.com/goldberg/paper.pdf. Retrieved 2008-04-28.

* Chris Hecker (February 1996). "Let's Get To The (Floating) Point". Game Developer Magazine: 19–24. ISSN 1073-922X. http://www.d6.com/users/checker/pdfs/gdmfp.pdf.

External links

* [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935 IEEE 754-2008 Standard for Floating-Point Arithmetic]
* [http://hal.archives-ouvertes.fr/hal-00128124 A compendium of non-intuitive behaviours of binary floating-point on popular architectures], with implications for program verification and testing
* [http://speleotrove/decimal Decimal floating-point] arithmetic, FAQs, bibliography, and links
* [http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm Comparing binary floats]
* [http://www.coprocessor.info/ Coprocessor.info: x87 FPU pictures, development and manufacturer information]
* [http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html IEEE 754 references]
* [http://speleotrove/decimal/854mins.html IEEE 854-1987] — History and minutes
* [http://babbage.cs.qc.edu/IEEE-754/ Online IEEE 754 binary calculators]


Wikimedia Foundation. 2010.
