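Polynomial regression

In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an n-th degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x). It has been used to describe nonlinear phenomena such as the growth rate of tissues^[1], the distribution of carbon isotopes in lake sediments^[2], and the progression of disease epidemics^[3]. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y|x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered a special case of multiple linear regression.

The explanatory (independent) variables that result from the polynomial expansion of the "baseline" variables are known as higher-degree terms. Such variables are also used in classification settings.^[4]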

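History

Polynomial regression models are usually fitted using the method of least squares. Under the conditions of the Gauss–Markov theorem, the least-squares method minimizes the variance of the unbiased estimators of the coefficients. The least-squares method was published by Legendre in 1805 and by Gauss in 1809. The first design of an experiment for polynomial regression appeared in an 1815 paper by Gergonne.^[5]^[6] In the twentieth century, polynomial regression played an important role in the development of regression analysis, with a greater emphasis on issues of design and inference.^[7]

Definition and example

The goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable (or vector of independent variables) x. In simple linear regression, the model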

y=\beta _{0}+\beta _{1}x+\varepsilon ,\,

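is used, where ε is an unobserved random error with mean zero conditioned on a scalar variable x. In this model, for each unit increase in the value of x, the conditional expectation of y increases by $\beta _{1}$ units.

In many settings, such a linear relationship may not hold. For example, if we model the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, we may find that the yield improves by increasing amounts for each unit increase in temperature. In this case, we might propose a quadratic model of the form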

y=\beta _{0}+\beta _{1}x+\beta _{2}x^{2}+\varepsilon .

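In this model, when the temperature is increased from x to x + 1 units, the expected yield changes by $\beta _{1}+\beta _{2}(2x+1)$. For infinitesimal changes in x, the effect on y is given by the derivative with respect to x: $\beta _{1}+2\beta _{2}x$. The fact that the change in yield depends on x is what makes the relationship between x and y nonlinear, even though the model is linear in the parameters to be estimated.

In general, we can model the expected value of y as an n-th degree polynomial, yielding the general polynomial regression model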
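The two expressions for the quadratic model can be checked numerically. The following is a minimal sketch; the coefficient values and the function names are made up for illustration and do not come from any data set.

```python
# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 1.0, 0.5, 0.25

def expected_yield(x):
    """Conditional mean of y under the quadratic model."""
    return b0 + b1 * x + b2 * x ** 2

def unit_change(x):
    """Change in the expected yield when x increases to x + 1."""
    return b1 + b2 * (2 * x + 1)

def slope(x):
    """Derivative of the expected yield with respect to x."""
    return b1 + 2 * b2 * x

for x in (0.0, 2.0, 10.0):
    direct = expected_yield(x + 1) - expected_yield(x)
    # The directly computed difference matches unit_change(x),
    # and both it and the slope grow with x.
    print(x, direct, unit_change(x), slope(x))
```

Because both quantities depend on x, the effect of a one-unit temperature increase is not constant, which is the sense in which the relationship is nonlinear.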

y=\beta _{0}+\beta _{1}x+\beta _{2}x^{2}+\beta _{3}x^{3}+\cdots +\beta _{n}x^{n}+\varepsilon

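Conveniently, these models are all linear from the point of view of estimation, since the regression function is linear in the unknown parameters β₀, β₁, and so on. Therefore, for least-squares analysis, the computational and inferential problems of polynomial regression can be completely addressed using the techniques of multiple regression. This is done by treating x, x², and so on as distinct independent variables in a multiple regression model.

Matrix form and calculation of estimates

The polynomial regression model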

y_{i}\,=\,\beta _{0}+\beta _{1}x_{i}+\beta _{2}x_{i}^{2}+\cdots +\beta _{m}x_{i}^{m}+\varepsilon _{i}\ (i=1,2,\dots ,n)

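can be expressed in matrix form in terms of a design matrix $\mathbf {X}$, a response vector ${\vec {y}}$, a parameter vector ${\vec {\beta }}$, and a vector ${\vec {\varepsilon }}$ of random errors, where m now denotes the degree of the polynomial and n the number of observations. The i-th row of $\mathbf {X}$ and ${\vec {y}}$ contains the x and y values for the i-th data sample. The model can then be written as a system of linear equations: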

{\begin{bmatrix}y_{1}\\y_{2}\\y_{3}\\\vdots \\y_{n}\end{bmatrix}}={\begin{bmatrix}1&x_{1}&x_{1}^{2}&\dots &x_{1}^{m}\\1&x_{2}&x_{2}^{2}&\dots &x_{2}^{m}\\1&x_{3}&x_{3}^{2}&\dots &x_{3}^{m}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n}&x_{n}^{2}&\dots &x_{n}^{m}\end{bmatrix}}{\begin{bmatrix}\beta _{0}\\\beta _{1}\\\beta _{2}\\\vdots \\\beta _{m}\end{bmatrix}}+{\begin{bmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\varepsilon _{3}\\\vdots \\\varepsilon _{n}\end{bmatrix}},

which, in pure matrix notation, is written as

{\vec {y}}=\mathbf {X} {\vec {\beta }}+{\vec {\varepsilon }}

The vector of estimated polynomial regression coefficients (using ordinary least-squares estimation) is

{\widehat {\vec {\beta }}}=(\mathbf {X} ^{\mathsf {T}}\mathbf {X} )^{-1}\;\mathbf {X} ^{\mathsf {T}}{\vec {y}},\,

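assuming m < n, which is required for the matrix $\mathbf {X} ^{\mathsf {T}}\mathbf {X}$ to be invertible. Since $\mathbf {X}$ is a Vandermonde matrix, the invertibility condition is guaranteed to hold if all the $x_{i}$ values are distinct. This is the unique least-squares solution.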
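The estimator can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names (`design_matrix`, `solve`, `polyfit_normal_equations`) and the noise-free toy data are assumptions, not part of the source, and the linear system is solved by a basic Gaussian elimination rather than by forming the inverse explicitly.

```python
def design_matrix(xs, degree):
    """Rows [1, x, x**2, ..., x**degree], one per observation."""
    return [[x ** j for j in range(degree + 1)] for x in xs]

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def polyfit_normal_equations(xs, ys, degree):
    """Least-squares coefficients from (X^T X) beta = X^T y."""
    X = design_matrix(xs, degree)
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)

# Noise-free toy data generated from y = 1 + 2x + 3x^2 (made up).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
beta_hat = polyfit_normal_equations(xs, ys, degree=2)
print(beta_hat)  # close to the generating coefficients 1, 2, 3
```

For high polynomial degrees the matrix $\mathbf {X} ^{\mathsf {T}}\mathbf {X}$ tends to be badly conditioned, so numerical libraries generally prefer an orthogonal factorization of $\mathbf {X}$ over solving the normal equations directly.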

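Interpretation

Although polynomial regression is technically a special case of multiple linear regression, the interpretation of a fitted polynomial regression model requires a somewhat different perspective. It is often difficult to interpret the individual coefficients in a polynomial regression fit, since the underlying monomials can be highly correlated. For example, x and x² have a correlation of about 0.97 when x is uniformly distributed on the interval (0, 1). Although the correlation can be reduced by using orthogonal polynomials, it is generally more informative to consider the fitted regression function as a whole. Pointwise or simultaneous confidence bands can then be used to convey the uncertainty in the estimate of the regression function.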
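The figure of about 0.97 can be verified by a short calculation. For x uniformly distributed on (0, 1), the moments are $E[x^{k}]=1/(k+1)$, so that

\operatorname {Cov} (x,x^{2})=E[x^{3}]-E[x]\,E[x^{2}]={\frac {1}{4}}-{\frac {1}{6}}={\frac {1}{12}},\quad \operatorname {Var} (x)={\frac {1}{3}}-{\frac {1}{4}}={\frac {1}{12}},\quad \operatorname {Var} (x^{2})={\frac {1}{5}}-{\frac {1}{9}}={\frac {4}{45}},

and therefore

\operatorname {Corr} (x,x^{2})={\frac {1/12}{\sqrt {(1/12)(4/45)}}}={\frac {\sqrt {15}}{4}}\approx 0.968.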

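Alternative approaches

Polynomial regression is one example of regression analysis that uses basis functions to model a functional relationship between two quantities. More specifically, it replaces $x\in \mathbb {R} ^{d_{x}}$ in linear regression with the polynomial basis $\varphi (x)\in \mathbb {R} ^{d_{\varphi }}$, for example $[1,x]{\mathbin {\stackrel {\varphi }{\rightarrow }}}[1,x,x^{2},\ldots ,x^{d}]$. A drawback of polynomial bases is that the basis functions are "non-local", meaning that the fitted value of y at a given value x = x₀ depends strongly on data values with x far from x₀^[8]. In modern statistics, polynomial basis functions are used alongside newer basis functions such as splines, radial basis functions, and wavelets. These families of basis functions offer a more parsimonious fit for many types of data.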
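The non-local behavior can be illustrated with a small experiment: fit a cubic, disturb a single observation at the far end of the x range, refit, and compare the fitted value at the other end. The sketch below uses made-up data and hypothetical helper names (`solve`, `fit_poly`, `predict`); the fit is done through the normal equations purely to keep the example self-contained.

```python
def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest degree first)."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, x):
    """Evaluate the fitted polynomial at x."""
    return sum(b * x ** j for j, b in enumerate(beta))

xs = [float(i) for i in range(11)]   # x = 0, 1, ..., 10
ys = [0.5 * x for x in xs]           # made-up straight-line data
x0 = 0.0

baseline = predict(fit_poly(xs, ys, 3), x0)

ys_far = ys[:]
ys_far[-1] += 10.0                   # disturb only the observation at x = 10
shifted = predict(fit_poly(xs, ys_far, 3), x0)

shift = shifted - baseline
print(shift)  # nonzero: the point at x = 10 moves the fit at x = 0
```

A local basis, such as a spline with knots between 0 and 10, would typically confine most of the effect of such a disturbance to the neighborhood of x = 10.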

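The goal of polynomial regression is to model a nonlinear relationship between the independent and dependent variables (technically, between the independent variable and the conditional mean of the dependent variable). This is similar to the goal of nonparametric regression, which aims to capture nonlinear regression relationships. Therefore, nonparametric regression approaches such as smoothing can be useful alternatives to polynomial regression. Some of these methods make use of a localized form of classical polynomial regression.^[9] An advantage of traditional polynomial regression is that the inferential framework of multiple regression can be used (this also holds when other families of basis functions, such as splines, are used).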
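As an illustration of a localized form of polynomial regression, the sketch below computes a local linear (degree 1) estimate at a single point by weighted least squares. The Gaussian kernel, the bandwidth value, the function name `local_linear`, and the sine-curve data are all assumptions made for the example; they are not prescribed by the source.

```python
import math

def local_linear(x0, xs, ys, bandwidth):
    """Local linear estimate of E(y|x) at x0 with Gaussian kernel weights."""
    # Observations near x0 receive large weights, distant ones almost none.
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    # Weighted straight-line fit, evaluated at x0.
    return ybar + slope * (x0 - xbar)

# Made-up data on a sine curve, without noise.
xs = [i / 10 for i in range(31)]
ys = [math.sin(x) for x in xs]

estimate = local_linear(1.5, xs, ys, bandwidth=0.3)
print(estimate, math.sin(1.5))
```

Repeating the calculation over a grid of x0 values traces out a smooth curve. The bandwidth controls the trade-off: a small value follows the data closely, while a large value approaches an ordinary global straight-line fit.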

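Another alternative is to use kernelized models, such as support vector regression with a polynomial kernel.

See also

Curve fitting
Linear regression
Local polynomial regression
Polynomial and rational function modeling
Polynomial interpolation
Response surface methodology
Smoothing spline

References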

[1] Shaw, P; et al. Intellectual ability and cortical development in children and adolescents. Nature. 2006, 440 (7084): 676–679. PMID 16572172. doi:10.1038/nature04513.
[2] Barker, PA; Street-Perrott, FA; Leng, MJ; Greenwood, PB; Swain, DL; Perrott, RA; Telford, RJ; Ficken, KJ. A 14,000-Year Oxygen Isotope Record from Diatom Silica in Two Alpine Lakes on Mt. Kenya. Science. 2001, 292 (5525): 2307–2310. PMID 11423656. doi:10.1126/science.1059612.
[3] Greenland, Sander. Dose-Response and Trend Analysis in Epidemiology: Alternatives to Categorical Analysis. Epidemiology. 1995, 6 (4): 356–365. JSTOR 3702080. PMID 7548341. doi:10.1097/00001648-199507000-00005.
[4] Yin-Wen Chang; Cho-Jui Hsieh; Kai-Wei Chang; Michael Ringgaard; Chih-Jen Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research. 2010, 11: 1471–1490 [retrieved 2019-03-18]. (Archived from the original on 2020-11-21.)
[5] Gergonne, J. D. The application of the method of least squares to the interpolation of sequences. Historia Mathematica. Translated by Ralph St. John and S. M. Stigler from the 1815 French. November 1974, 1 (4): 439–447 [1815]. doi:10.1016/0315-0860(74)90034-2.
[6] Stigler, Stephen M. Gergonne's 1815 paper on the design and analysis of polynomial regression experiments. Historia Mathematica. November 1974, 1 (4): 431–439. doi:10.1016/0315-0860(74)90033-0.
[7] Smith, Kirstine. On the Standard Deviations of Adjusted and Interpolated Values of an Observed Polynomial Function and its Constants and the Guidance They Give Towards a Proper Choice of the Distribution of the Observations. Biometrika. 1918, 12 (1/2): 1–85. JSTOR 2331929. doi:10.2307/2331929.
[8] Such "non-local" behavior is a property of analytic functions that are not constant (everywhere). Such "non-local" behavior has been widely discussed in statistics: Magee, Lonnie. Nonlocal Behavior in Polynomial Regressions. The American Statistician. 1998, 52 (1): 20–22. JSTOR 2685560. doi:10.2307/2685560.
[9] Fan, Jianqing. Local Polynomial Modelling and Its Applications: From linear regression to nonlinear regression. Monographs on Statistics and Applied Probability. Chapman & Hall/CRC. 1996. ISBN 978-0-412-98321-4.
