Polynomial regression

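In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x), and has been used to describe nonlinear phenomena such as the growth rate of tissues^[1], the distribution of carbon isotopes in lake sediments^[2], and the progression of disease epidemics^[3]. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y|x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered a special case of multiple linear regression.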

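The explanatory (independent) variables obtained from the polynomial expansion of the "baseline" variables are known as higher-degree terms. Such variables are also used in classification settings.^[4]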

History

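Polynomial regression models are usually fit using the method of least squares. Under the conditions of the Gauss-Markov theorem, the least-squares estimator has the smallest variance among linear unbiased estimators of the coefficients. The method of least squares was published by Legendre in 1805 and by Gauss in 1809. The first design of an experiment for polynomial regression appeared in an 1815 paper by Gergonne.^[5]^[6] In the twentieth century, polynomial regression played an important role in the development of regression analysis, with a greater emphasis on issues of design and inference.^[7]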

Definition and example

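The goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable (or vector of independent variables) x. In simple linear regression, the following model is used: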

y=\beta _{0}+\beta _{1}x+\varepsilon ,\,

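Here ε is an unobserved random error with mean zero conditional on the scalar variable x. In this model, for each unit increase in the value of x, the conditional expectation of y increases by $\beta _{1}$ units.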

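In many settings, such a linear relationship may not hold. For example, if we model the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, we may find that the yield improves by increasing amounts for each unit increase in temperature. In this case, we might propose a quadratic model of the form: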

y=\beta _{0}+\beta _{1}x+\beta _{2}x^{2}+\varepsilon

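In this model, when the temperature increases from x to x + 1 units, the expected yield changes by $\beta _{1}+\beta _{2}(2x+1)$. For infinitesimal changes in x, the effect on y is given by the derivative with respect to x: $\beta _{1}+2\beta _{2}x$. The fact that the change in yield depends on x is what makes the relationship between x and y nonlinear, even though the model is linear in the parameters to be estimated.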
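The discrete change and the derivative can both be derived directly from the quadratic model for the conditional mean, $E(y|x)=\beta _{0}+\beta _{1}x+\beta _{2}x^{2}$. The short derivation below is added for illustration and uses only the symbols already defined in this section:

E(y|x+1)-E(y|x)=\beta _{1}(x+1)+\beta _{2}(x+1)^{2}-\beta _{1}x-\beta _{2}x^{2}=\beta _{1}+\beta _{2}(2x+1)

{\frac {d}{dx}}E(y|x)=\beta _{1}+2\beta _{2}x

Both quantities depend on x whenever $\beta _{2}\neq 0$, whereas in the simple linear model ($\beta _{2}=0$) both reduce to the constant $\beta _{1}$.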

In general, we can model the expected value of y as an nth-degree polynomial, which yields the general polynomial regression model:

y=\beta _{0}+\beta _{1}x+\beta _{2}x^{2}+\beta _{3}x^{3}+\cdots +\beta _{n}x^{n}+\varepsilon

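Conveniently, these models are all linear from the point of view of estimation, since the regression function is linear in the unknown parameters β₀, β₁, and so on. Therefore, for least-squares analysis, the computational and inferential problems of polynomial regression can be fully addressed using the techniques of multiple regression. This is done by treating x, x², and so on as distinct independent variables in a multiple regression model.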

Matrix form and calculation of estimates

The polynomial regression model

y_{i}\,=\,\beta _{0}+\beta _{1}x_{i}+\beta _{2}x_{i}^{2}+\cdots +\beta _{m}x_{i}^{m}+\varepsilon _{i}\ (i=1,2,\dots ,n)

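can be expressed in terms of a design matrix $\mathbf {X}$, a response vector ${\vec {y}}$, a parameter vector ${\vec {\beta }}$, and a vector ${\vec {\varepsilon }}$ of random errors. The i-th row of $\mathbf {X}$ and ${\vec {y}}$ contains the x and y values for the i-th data sample. The model can then be written as a system of linear equations: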

{\begin{bmatrix}y_{1}\\y_{2}\\y_{3}\\\vdots \\y_{n}\end{bmatrix}}={\begin{bmatrix}1&x_{1}&x_{1}^{2}&\dots &x_{1}^{m}\\1&x_{2}&x_{2}^{2}&\dots &x_{2}^{m}\\1&x_{3}&x_{3}^{2}&\dots &x_{3}^{m}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n}&x_{n}^{2}&\dots &x_{n}^{m}\end{bmatrix}}{\begin{bmatrix}\beta _{0}\\\beta _{1}\\\beta _{2}\\\vdots \\\beta _{m}\end{bmatrix}}+{\begin{bmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\varepsilon _{3}\\\vdots \\\varepsilon _{n}\end{bmatrix}},

which, in pure matrix notation, is written as

{\vec {y}}=\mathbf {X} {\vec {\beta }}+{\vec {\varepsilon }}

The vector of estimated polynomial regression coefficients (using ordinary least squares estimation) is

{\widehat {\vec {\beta }}}=(\mathbf {X} ^{\mathsf {T}}\mathbf {X} )^{-1}\;\mathbf {X} ^{\mathsf {T}}{\vec {y}},\,

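assuming m < n, which is required for the matrix to be invertible. Since $\mathbf {X}$ is a Vandermonde matrix, the invertibility condition is guaranteed to hold if all the $x_{i}$ values are distinct. This is the unique least-squares solution.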
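The estimator above can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `fit_polynomial` and the sample data are hypothetical and chosen only so that the true coefficients are known in advance.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Return the design matrix X and the least-squares coefficients
    (beta_0, ..., beta_m) for a polynomial of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^m (a Vandermonde matrix).
    X = np.vander(x, N=degree + 1, increasing=True)
    # Normal equations: solve (X^T X) beta = X^T y rather than forming the inverse.
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return X, beta

# Hypothetical noise-free data generated from y = 1 + 2x - 0.5x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x - 0.5 * x**2

X, beta = fit_polynomial(x, y, degree=2)

# Cross-check against NumPy's general least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```

Because the data here are noise-free and the x values are distinct, the fit recovers the generating coefficients up to rounding. For high degrees or widely spread x values, the matrix $\mathbf {X} ^{\mathsf {T}}\mathbf {X}$ can become ill-conditioned, so a solver such as `np.linalg.lstsq` that works on $\mathbf {X}$ directly is usually the safer choice in practice.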

Interpretation

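Although polynomial regression is technically a special case of multiple linear regression, the interpretation of a fitted polynomial regression model requires a somewhat different perspective. It is often difficult to interpret the individual coefficients in a polynomial regression fit, since the underlying monomials can be highly correlated. For example, x and x² have a correlation of around 0.97 when x is uniformly distributed on the interval (0, 1). Although the correlation can be reduced by using orthogonal polynomials, it is generally more informative to consider the fitted regression function as a whole. Pointwise or simultaneous confidence bands can then be used to convey the uncertainty in the estimate of the regression function.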
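The correlation of about 0.97 between x and x² can be verified by hand. The derivation below is added for illustration; it assumes x is uniform on (0, 1), so that $E(x^{k})=1/(k+1)$:

\operatorname {Cov} (x,x^{2})=E(x^{3})-E(x)E(x^{2})={\tfrac {1}{4}}-{\tfrac {1}{2}}\cdot {\tfrac {1}{3}}={\tfrac {1}{12}}

\operatorname {Var} (x)={\tfrac {1}{3}}-{\tfrac {1}{4}}={\tfrac {1}{12}},\qquad \operatorname {Var} (x^{2})={\tfrac {1}{5}}-{\tfrac {1}{9}}={\tfrac {4}{45}}

\operatorname {Corr} (x,x^{2})={\frac {1/12}{\sqrt {(1/12)(4/45)}}}={\frac {\sqrt {15}}{4}}\approx 0.968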

Alternative approaches

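Polynomial regression is one example of regression analysis that uses basis functions to model a functional relationship between two quantities. More specifically, it replaces $x\in \mathbb {R} ^{d_{x}}$ in linear regression with a polynomial basis $\varphi (x)\in \mathbb {R} ^{d_{\varphi }}$, for example $[1,x]{\mathbin {\stackrel {\varphi }{\rightarrow }}}[1,x,x^{2},\ldots ,x^{d}]$. A drawback of polynomial bases is that the basis functions are "non-local", meaning that the fitted value of y at a given value x = x₀ depends strongly on data values with x far from x₀^[8]. In modern statistics, polynomial basis functions are used alongside newer basis functions, such as splines, radial basis functions, and wavelets. These families of basis functions offer a more parsimonious fit for many types of data.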
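The "non-local" behavior can be made concrete. In a least-squares fit, the fitted value at x₀ is a weighted sum of the observed y values, so the weight attached to a distant observation shows how much that observation influences the fit at x₀. The sketch below assumes NumPy; the helper `fit_weights`, the grid of 11 equally spaced points, and the cubic degree are hypothetical choices made only for illustration.

```python
import numpy as np

def fit_weights(x, x0, degree):
    """Weights w such that the least-squares fitted value at x0 equals w @ y."""
    X = np.vander(x, N=degree + 1, increasing=True)
    # Basis vector phi(x0) = [1, x0, x0^2, ..., x0^degree].
    phi0 = np.vander(np.array([x0]), N=degree + 1, increasing=True)[0]
    # w = phi(x0)^T (X^T X)^{-1} X^T, computed with a linear solve.
    return phi0 @ np.linalg.solve(X.T @ X, X.T)

x = np.linspace(0.0, 1.0, 11)          # hypothetical sample locations
w = fit_weights(x, x0=0.0, degree=3)   # weights for the fit at x0 = 0

# Shift only the observation farthest from x0 (at x = 1) by one unit.
y = np.sin(2 * np.pi * x)
y_shifted = y.copy()
y_shifted[-1] += 1.0

# The fitted value at x0 = 0 moves by exactly the weight of that far observation.
change_at_x0 = w @ y_shifted - w @ y
print(w[-1], change_at_x0)
```

The weight on the observation at x = 1 is nonzero, so changing that single distant value changes the fitted curve at x = 0. With a local basis such as splines, a distant observation would typically carry much less weight.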

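The goal of polynomial regression is to model a nonlinear relationship between the independent and dependent variables (technically, between the independent variable and the conditional mean of the dependent variable). This is similar to the goal of nonparametric regression, which aims to capture nonlinear regression relationships. Therefore, nonparametric regression approaches such as smoothing can be useful alternatives to polynomial regression. Some of these methods make use of a localized form of classical polynomial regression.^[9] An advantage of traditional polynomial regression is that the inferential framework of multiple regression can be used (this also holds when using other families of basis functions, such as splines).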

A further alternative is to use kernelized models, such as support vector regression with a polynomial kernel.

See also

Curve fitting
Linear regression
Local polynomial regression
Polynomial and rational function modeling
Polynomial interpolation
Response surface methodology
Smoothing spline

References

[1] Shaw, P; et al. Intellectual ability and cortical development in children and adolescents. Nature. 2006, 440 (7084): 676–679. PMID 16572172. doi:10.1038/nature04513.
[2] Barker, PA; Street-Perrott, FA; Leng, MJ; Greenwood, PB; Swain, DL; Perrott, RA; Telford, RJ; Ficken, KJ. A 14,000-Year Oxygen Isotope Record from Diatom Silica in Two Alpine Lakes on Mt. Kenya. Science. 2001, 292 (5525): 2307–2310. PMID 11423656. doi:10.1126/science.1059612.
[3] Greenland, Sander. Dose-Response and Trend Analysis in Epidemiology: Alternatives to Categorical Analysis. Epidemiology. 1995, 6 (4): 356–365. JSTOR 3702080. PMID 7548341. doi:10.1097/00001648-199507000-00005.
[4] Yin-Wen Chang; Cho-Jui Hsieh; Kai-Wei Chang; Michael Ringgaard; Chih-Jen Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research. 2010, 11: 1471–1490 [2019-03-18]. (Archived from the original on 2020-11-21.)
[5] Gergonne, J. D. The application of the method of least squares to the interpolation of sequences. Historia Mathematica. Translated by Ralph St. John and S. M. Stigler from the 1815 French. November 1974, 1 (4): 439–447 [1815]. doi:10.1016/0315-0860(74)90034-2.
[6] Stigler, Stephen M. Gergonne's 1815 paper on the design and analysis of polynomial regression experiments. Historia Mathematica. November 1974, 1 (4): 431–439. doi:10.1016/0315-0860(74)90033-0.
[7] Smith, Kirstine. On the Standard Deviations of Adjusted and Interpolated Values of an Observed Polynomial Function and its Constants and the Guidance They Give Towards a Proper Choice of the Distribution of the Observations. Biometrika. 1918, 12 (1/2): 1–85. JSTOR 2331929. doi:10.2307/2331929.
[8] Such "non-local" behavior is a property of analytic functions that are not constant (everywhere). Such "non-local" behavior has been widely discussed in statistics: Magee, Lonnie. Nonlocal Behavior in Polynomial Regressions. The American Statistician. 1998, 52 (1): 20–22. JSTOR 2685560. doi:10.2307/2685560.
[9] Fan, Jianqing. Local Polynomial Modelling and Its Applications: From linear regression to nonlinear regression. Monographs on Statistics and Applied Probability. Chapman & Hall/CRC. 1996. ISBN 978-0-412-98321-4.
