特徵選擇

在機器學習和統計學中，特徵選擇（英語：feature selection）也被稱為變量選擇、屬性選擇 或變量子集選擇 。它是指：為了構建模型而選擇相關特徵（即屬性、指標）子集的過程。使用特徵選擇技術有三個原因：

簡化模型，使之更易於被研究人員或用戶理解，^[1]
縮短訓練時間，
改善通用性、降低過擬合^[2]（即降低方差^[1] ）

要使用特徵選擇技術的關鍵假設是：訓練數據包含許多冗餘或無關的特徵，因而移除這些特徵並不會導致丟失信息。^[2] 冗餘或無關特徵是兩個不同的概念。如果一個特徵本身有用，但如果這個特徵與另一個有用特徵強相關，且那個特徵也出現在數據中，那麼這個特徵可能是冗餘的。^[3]

特徵選擇技術與特徵提取有所不同。特徵提取是從原有特徵的功能中創造新的特徵，而特徵選擇則只返回原有特徵中的子集。特徵選擇技術的常常用於許多特徵但樣本（即數據點）相對較少的領域。特徵選擇應用的典型用例包括：解析書面文本和微陣列數據，這些場景下特徵成千上萬，但樣本只有幾十到幾百個。

介紹[編輯]

特徵選擇算法可以被視為搜索技術和評價指標的結合。前者提供候選的新特徵子集，後者為不同的特徵子集打分。最簡單的算法是測試每個特徵子集，找到究竟哪個子集的錯誤率最低。這種算法需要窮舉搜索空間，難以算完所有的特徵集，只能涵蓋很少一部分特徵子集。選擇何種評價指標很大程度上影響了算法。而且，通過選擇不同的評價指標，可以把特徵選擇算法分為三類：包裝類、過濾類和嵌入類方法^[3]

包裝類方法使用預測模型給特徵子集打分。每個新子集都被用來訓練一個模型，然後用驗證數據集來測試。通過計算驗證數據集上的錯誤次數（即模型的錯誤率）給特徵子集評分。由於包裝類方法為每個特徵子集訓練一個新模型，所以計算量很大。不過，這類方法往往能為特定類型的模型找到性能最好的特徵集。

過濾類方法採用代理指標，而不根據特徵子集的錯誤率計分。所選的指標算得快，但仍然能估算出特徵集好不好用。常用指標包括互信息^[3]、逐點互信息^[4]、皮爾遜積矩相關係數、每種分類/特徵的組合的幀間/幀內類距離或顯著性測試評分。^[4]^[5] 過濾類方法計算量一般比包裝類小，但這類方法找到的特徵子集不能為特定類型的預測模型調校。由於缺少調校，過濾類方法所選取的特徵集會比包裝類選取的特徵集更為通用，往往會導致比包裝類的預測性能更為低下。不過，由於特徵集不包含對預測模型的假設，更有利於暴露特徵之間的關係。許多過濾類方法提供特徵排名，而非顯式提供特徵子集。要從特徵列表的哪個點切掉特徵，得靠交叉驗證來決定。過濾類方法也常常用於包裝方法的預處理步驟，以便在問題太複雜時依然可以用包裝方法。

嵌入類方法包括了所有構建模型過程中用到的特徵選擇技術。這類方法的典範是構建線性模型的LASSO方法。該方法給回歸係數加入了L1懲罰，導致其中的許多參數趨於零。任何回歸係數不為零的特徵都會被LASSO算法「選中」。LASSO的改良算法有Bolasso^[6]和FeaLect^[7]。Bolasso改進了樣本的初始過程。FeaLect根據回歸係數組合分析給所有特徵打分。另外一個流行的做法是遞歸特徵消除（Recursive Feature Elimination）算法，通常用於支持向量機，通過反覆構建同一個模型移除低權重的特徵。這些方法的計算複雜度往往在過濾類和包裝類之間。

傳統的統計學中，特徵選擇的最普遍的形式是逐步回歸，這是一個包裝類技術。它屬於貪心算法，每一輪添加該輪最優的特徵或者刪除最差的特徵。主要的調控因素是決定何時停止算法。在機器學習領域，這個時間點通常通過交叉驗證找出。在統計學中，某些條件已經優化。因而會導致嵌套引發問題。此外，還有更健壯的方法，如分支和約束和分段線性網絡。

參見[編輯]

參考文獻[編輯]

^ ^1.0 ^1.1 Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani. An Introduction to Statistical Learning. Springer. 2013: 204 [2016-06-06]. （原始內容存檔於2019-06-23）.
^ ^2.0 ^2.1 Bermingham, Mairead L.; Pong-Wong, Ricardo; Spiliopoulou, Athina; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Agakov, Felix; Navarro, Pau; Haley, Chris S. Application of high-dimensional feature selection: evaluation for genomic prediction in man. Sci. Rep. 2015, 5 [2016-06-06]. （原始內容存檔於2015-05-24）.
^ ^3.0 ^3.1 ^3.2 Guyon, Isabelle; Elisseeff, André. An Introduction to Variable and Feature Selection. JMLR. 2003, 3 [2016-06-06]. （原始內容存檔於2019-11-19）.
^ ^4.0 ^4.1 Yang, Yiming; Pedersen, Jan O. A comparative study on feature selection in text categorization. ICML. 1997.
^ Forman, George. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research. 2003, 3: 1289–1305.
^ Bach, Francis R. Bolasso: model consistent lasso estimation through the bootstrap. Proceedings of the 25th international conference on Machine learning. 2008: 33–40. doi:10.1145/1390156.1390161.
^ Zare, Habil. Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis. BMC Genomics. 2013, 14: S14 [2016-06-06]. doi:10.1186/1471-2164-14-S1-S14. （原始內容存檔於2015-11-21）.

擴展閱讀[編輯]

Feature Selection for Classification: A Review （頁面存檔備份，存於網際網路檔案館） (Survey,2014)
Feature Selection for Clustering: A Review （頁面存檔備份，存於網際網路檔案館） (Survey,2013)
Tutorial Outlining Feature Selection Algorithms, Arizona State University
JMLR Special Issue on Variable and Feature Selection （頁面存檔備份，存於網際網路檔案館）
Feature Selection for Knowledge Discovery and Data Mining （頁面存檔備份，存於網際網路檔案館） (Book)
An Introduction to Variable and Feature Selection （頁面存檔備份，存於網際網路檔案館） (Survey)
Toward integrating feature selection algorithms for classification and clustering (Survey)
feature subset selection and subset size optimization.pdf Efficient Feature Subset Selection and Subset Size Optimization^{[永久失效連結]} (Survey, 2010)
Searching for Interacting Features （頁面存檔備份，存於網際網路檔案館）
Feature Subset Selection Bias for Classification Learning
Y. Sun, S. Todorovic, S. Goodison, Local Learning Based Feature Selection for High-dimensional Data Analysis （頁面存檔備份，存於網際網路檔案館）, IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 32, no. 9, pp. {0}8.{/0} {1}{/1}

外部連結[編輯]

A comprehensive package for Mutual Information based feature selection in Matlab （頁面存檔備份，存於網際網路檔案館）
Infinite Feature Selection (Source Code) in Matlab （頁面存檔備份，存於網際網路檔案館）
Feature Selection Package, Arizona State University (Matlab Code)
NIPS challenge 2003 （頁面存檔備份，存於網際網路檔案館） (see also NIPS)
Naive Bayes implementation with feature selection in Visual Basic (includes executable and source code)
Minimum-redundancy-maximum-relevance (mRMR) feature selection program
FEAST （頁面存檔備份，存於網際網路檔案館） (Open source Feature Selection algorithms in C and MATLAB)

[islr-1] 1.0 ^1.1 Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani. An Introduction to Statistical Learning. Springer. 2013: 204 [2016-06-06]. （原始內容存檔於2019-06-23）.

[Bermingham-prolog-2] 2.0 ^2.1 Bermingham, Mairead L.; Pong-Wong, Ricardo; Spiliopoulou, Athina; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Agakov, Felix; Navarro, Pau; Haley, Chris S. Application of high-dimensional feature selection: evaluation for genomic prediction in man. Sci. Rep. 2015, 5 [2016-06-06]. （原始內容存檔於2015-05-24）.

[guyon-intro-3] 3.0 ^3.1 ^3.2 Guyon, Isabelle; Elisseeff, André. An Introduction to Variable and Feature Selection. JMLR. 2003, 3 [2016-06-06]. （原始內容存檔於2019-11-19）.

[textcat-4] 4.0 ^4.1 Yang, Yiming; Pedersen, Jan O. A comparative study on feature selection in text categorization. ICML. 1997.

[5] Forman, George. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research. 2003, 3: 1289–1305.

[Bolasso-6] Bach, Francis R. Bolasso: model consistent lasso estimation through the bootstrap. Proceedings of the 25th international conference on Machine learning. 2008: 33–40. doi:10.1145/1390156.1390161.

[FeaLect-7] Zare, Habil. Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis. BMC Genomics. 2013, 14: S14 [2016-06-06]. doi:10.1186/1471-2164-14-S1-S14. （原始內容存檔於2015-11-21）.

[1]

[2]

[3]

[4]

[5]

[6]

[7]