Double-precision floating-point number

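A double-precision floating-point number (Double-precision floating-point) is a data type used by computers. Whereas a single-precision floating-point number occupies only 32 bits (4 bytes), a double-precision number uses 64 bits (8 bytes) to store one floating-point value^[1]. It provides 53 binary digits of precision (52 stored explicitly plus one implicit leading bit). The magnitudes of normal numbers range from $2^{-1022}$ to $(2-2^{-52})\times 2^{1023}$, and subnormal numbers extend the lower end down to $2^{-1074}$.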

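Format

sign bit: 1 bit, indicates whether the value is positive or negative
exponent: 11 bits, encodes the power of two
mantissa: 52 bits, encodes the precision (the significant digits)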
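
The three fields can be pulled out of a value with a few bit operations. The following is a minimal sketch, assuming Python's `float` is stored as an IEEE 754 double (true on common platforms); the helper name `split_double` is hypothetical and used only for illustration.

```python
import struct

def split_double(x: float) -> tuple[int, int, int]:
    """Return (sign, exponent_field, mantissa_field) of a Python float.

    Assumes the float is a 64-bit IEEE 754 double.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # top 1 bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits (still biased)
    mantissa = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, mantissa

print(split_double(1.0))    # (0, 1023, 0)
print(split_double(-2.0))   # (1, 1024, 0)
```

The exponent returned here is the raw stored field, so 1.0 shows 1023 rather than 0.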

Sign

0 means the value is positive; 1 means the value is negative.

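Exponent

The exponent field has 11 bits and uses a biased representation (Exponent bias) with a bias of 1023: a stored value of 1023 represents an actual exponent of 0, so the exponents of normal numbers range from -1022 to +1023. Two bit patterns are exceptions and are reserved:

"all 11 bits are 0"
"all 11 bits are 1"

The exponents 000₁₆ and 7ff₁₆ therefore have special meanings:

00000000000₂ = 000₁₆: ±0 when the mantissa is 0; a subnormal (denormalized) floating-point number when the mantissa is nonzero.

11111111111₂ = 7ff₁₆: ±∞ when the mantissa is 0; NaN when the mantissa is nonzero.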
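
The special exponent values can be checked directly on the bit pattern. The following is a rough sketch, assuming Python's `float` is an IEEE 754 double; the function name `classify` and its return strings are hypothetical choices made for this illustration.

```python
import struct

def classify(x: float) -> str:
    """Classify a double by its exponent and mantissa fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x000:
        # All exponent bits 0: signed zero or subnormal.
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0x7FF:
        # All exponent bits 1: infinity or NaN.
        return "infinity" if mantissa == 0 else "nan"
    # Anything else is a normal number with actual exponent (exponent - 1023).
    return "normal"

for value in (0.0, 5e-324, 1.0, float("inf"), float("nan")):
    print(value, classify(value))
```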

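Mantissa

In binary "scientific notation", a number is written as:

${\text{1.mantissa}}\times {\text{2}}^{\text{exponent}}$

In binary scientific notation (a×2ⁿ), a is greater than or equal to 1 and less than 2. Because the leading digit is therefore always 1, it does not need to be stored. For example (exponents are also written in binary):

The binary number ${\text{11.101}}\times {\text{2}}^{\text{1001}}$ normalizes to ${\text{1.1101}}\times {\text{2}}^{\text{1010}}$, so only 1101 needs to be stored as the mantissa.
The binary number ${\text{0.00110011}}\times {\text{2}}^{-1001}$ normalizes to ${\text{1.10011}}\times {\text{2}}^{-1100}$, so only 10011 needs to be stored as the mantissa.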
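
The normalization step can be reproduced numerically. The following is a small sketch using `math.frexp` from the Python standard library, which returns a fraction in [0.5, 1); the helper `normalize` is a hypothetical name, and it rescales that result to the [1, 2) convention used above. It assumes a finite, nonzero input.

```python
import math

def normalize(x: float) -> tuple[float, int]:
    """Return (a, n) with x == a * 2**n and 1 <= |a| < 2.

    Assumes x is finite and nonzero.
    """
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, e - 1       # shift one bit to get 1 <= |a| < 2

# 11.101 (binary) x 2^1001 (binary): 11.101 is 29/8, and 1001 is 9.
a, n = normalize((0b11101 / 2**3) * 2**0b1001)
print(a, n)   # 1.8125 10
```

Here 1.8125 is 1.1101 in binary and 10 is 1010 in binary, matching the first example.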

Summary

From the description above, a normal double-precision number whose stored exponent field is $e$ represents the value:

$(-1)^{\text{sign}}\times 2^{e-1023}\times 1.{\text{mantissa}}$
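
The value of a normal number can be rebuilt from its three fields. The following is a minimal sketch, assuming the stored exponent field carries a bias of 1023 and the mantissa field has 52 bits; `decode_normal` is a hypothetical helper, and it deliberately rejects the two reserved exponent values (zeros, subnormals, infinities and NaN).

```python
import math

BIAS = 1023  # exponent bias for the 11-bit field

def decode_normal(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a normal double from its sign, stored exponent and mantissa fields."""
    if not 1 <= exponent <= 2046:
        raise ValueError("exponent field 0 or 2047 is reserved for special values")
    significand = 1 + mantissa / 2**52           # the implicit leading 1 plus the fraction
    return (-1) ** sign * math.ldexp(significand, exponent - BIAS)

print(decode_normal(0, 0x403, 0x7 << 48))   # 23.0
print(decode_normal(1, 1024, 0))            # -2.0
```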

Examples

0 01111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 3FF0 0000 0000 0000₁₆ ≙ +2⁰ × 1 = 1

0 01111111111 0000000000000000000000000000000000000000000000000001₂ ≙ 3FF0 0000 0000 0001₁₆ ≙ +2⁰ × (1 + 2⁻⁵²) ≈ 1.0000000000000002, the smallest number > 1

0 01111111111 0000000000000000000000000000000000000000000000000010₂ ≙ 3FF0 0000 0000 0002₁₆ ≙ +2⁰ × (1 + 2⁻⁵¹) ≈ 1.0000000000000004

0 10000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 4000 0000 0000 0000₁₆ ≙ +2¹ × 1 = 2

1 10000000000 0000000000000000000000000000000000000000000000000000₂ ≙ C000 0000 0000 0000₁₆ ≙ −2¹ × 1 = −2

0 10000000000 1000000000000000000000000000000000000000000000000000₂ ≙ 4008 0000 0000 0000₁₆ ≙ +2¹ × 1.1₂ = 11₂ = 3

0 10000000001 0000000000000000000000000000000000000000000000000000₂ ≙ 4010 0000 0000 0000₁₆ ≙ +2² × 1 = 100₂ = 4

0 10000000001 0100000000000000000000000000000000000000000000000000₂ ≙ 4014 0000 0000 0000₁₆ ≙ +2² × 1.01₂ = 101₂ = 5

0 10000000001 1000000000000000000000000000000000000000000000000000₂ ≙ 4018 0000 0000 0000₁₆ ≙ +2² × 1.1₂ = 110₂ = 6

0 10000000011 0111000000000000000000000000000000000000000000000000₂ ≙ 4037 0000 0000 0000₁₆ ≙ +2⁴ × 1.0111₂ = 10111₂ = 23

0 01111111000 1000000000000000000000000000000000000000000000000000₂ ≙ 3F88 0000 0000 0000₁₆ ≙ +2⁻⁷ × 1.1₂ = 0.00000011₂ = 0.01171875 (3/256)

0 00000000000 0000000000000000000000000000000000000000000000000001₂ ≙ 0000 0000 0000 0001₁₆ ≙ +2⁻¹⁰²² × 2⁻⁵² = 2⁻¹⁰⁷⁴
≈ 4.9406564584124654 × 10⁻³²⁴ (Min. subnormal positive double)

0 00000000000 1111111111111111111111111111111111111111111111111111₂ ≙ 000F FFFF FFFF FFFF₁₆ ≙ +2⁻¹⁰²² × (1 − 2⁻⁵²)
≈ 2.2250738585072009 × 10⁻³⁰⁸ (Max. subnormal double)

0 00000000001 0000000000000000000000000000000000000000000000000000₂ ≙ 0010 0000 0000 0000₁₆ ≙ +2⁻¹⁰²² × 1
≈ 2.2250738585072014 × 10⁻³⁰⁸ (Min. normal positive double)

0 11111111110 1111111111111111111111111111111111111111111111111111₂ ≙ 7FEF FFFF FFFF FFFF₁₆ ≙ +2¹⁰²³ × (1 + (1 − 2⁻⁵²))
≈ 1.7976931348623157 × 10³⁰⁸ (Max. Double)

0 00000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 0000 0000 0000 0000₁₆ ≙ +0

1 00000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 8000 0000 0000 0000₁₆ ≙ −0

0 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 7FF0 0000 0000 0000₁₆ ≙ +∞ (positive infinity)

1 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ FFF0 0000 0000 0000₁₆ ≙ −∞ (negative infinity)

0 11111111111 0000000000000000000000000000000000000000000000000001₂ ≙ 7FF0 0000 0000 0001₁₆ ≙ NaN (sNaN on most processors, such as x86 and ARM)

0 11111111111 1000000000000000000000000000000000000000000000000001₂ ≙ 7FF8 0000 0000 0001₁₆ ≙ NaN (qNaN on most processors, such as x86 and ARM)

0 11111111111 1111111111111111111111111111111111111111111111111111₂ ≙ 7FFF FFFF FFFF FFFF₁₆ ≙ NaN (an alternative encoding of NaN)

0 01111111101 0101010101010101010101010101010101010101010101010101₂
= 3fd5 5555 5555 5555₁₆ ≙ +2⁻² × (1 + 2⁻² + 2⁻⁴ + ... + 2⁻⁵²)
≈ ¹/₃

0 10000000000 1001001000011111101101010100010001000010110100011000₂
= 4009 21fb 5444 2d18₁₆ ≈ pi
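
The hexadecimal patterns listed above can be checked by reinterpreting the 8 bytes as a double. The following is a short sketch, assuming Python's `float` is an IEEE 754 double; `from_hex` is a hypothetical helper written for this illustration.

```python
import math
import struct

def from_hex(pattern: str) -> float:
    """Interpret a 16-digit hex string (spaces allowed) as a big-endian double."""
    raw = bytes.fromhex(pattern.replace(" ", ""))
    return struct.unpack(">d", raw)[0]

print(from_hex("4037 0000 0000 0000"))              # 23.0
print(from_hex("3F88 0000 0000 0000"))              # 0.01171875
print(from_hex("4009 21fb 5444 2d18") == math.pi)   # True
print(math.isnan(from_hex("7FF8 0000 0000 0001")))  # True
```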

References

[1] Stanley B. Lippman, Josée Lajoie, Barbara E. Moo. C++ Primer, Fifth Edition (Chinese edition). 碁峰信息. 2020: p. 33. ISBN 978-986-502-172-6.

