對偶性 (最佳化)

在最优化理論中的對偶（duality）或對偶性原則（duality principle）是指最佳化問題可以用兩種觀點來看待的理論，兩種觀點分別是「原始問題」（primal problem）及「對偶問題」（dual problem）。對偶問題的解提供了原始問題（假設是最小化問題）的下限^[1]，不過一般而言，原始問題和對偶問題的最佳解不相同。兩個最佳解的差距為對偶間隙。若是凸優化問題，對偶間隙也稱為是卡鲁什-库恩-塔克条件。

對偶問題

一般而言「對偶問題」是指「拉格朗日對偶問題」（Lagrangian dual problem），不過也有其他的對偶問題，例如Wolfe對偶問題（英语：Wolfe dual problem）和Fenchel對偶問題（英语：Fenchel's duality theorem）。拉格朗日對偶問題是指在最小化問題上加上拉格朗日乘数，也就是在目標函數上加上對應限制條件的非負拉格朗日乘数，再求解可將原目標函數最小的原始變數值。此解會得到以拉格朗日乘数的函數表示的原始變數，稱為是「對偶變數」（dual variables），因此，新的問題就是要衍生對偶變數的限制下（包括非負的限制條件），找到可以使目標函數最大化的對偶變數。

一般而言，給定二個對偶對（英语：dual pair）的分隔局部凸空間（英语：locally convex space） $\left(X,X^{*}\right)$ 及 $\left(Y,Y^{*}\right)$ ，以及函數 $f:X\to \mathbb {R} \cup \{+\infty \}$ ，可以定義原始問題為找到 ${\hat {x}}$ 使得 $f({\hat {x}})=\inf _{x\in X}f(x).\,$ 換句話說，若 ${\hat {x}}$ 存在， $f({\hat {x}})$ 是函數 $f$ 的最小极值（minimum），也可以得到函數的最大下界（infimum）。

若有限制條件，可以整合到函數 $f$ 中，方式是令 $f=f+I_{\text{constraints}}$ ，其中 $I_{\mathrm {constraints} }$ 是在 $X$ 上的適當函數，在限制條件上有最小值0，可以證明 $\inf _{x\in X}{\tilde {f}}(x)=\inf _{x\ \mathrm {constrained} }f(x)$ 。後者的條件很明顯，但不一定很方便可以符合示性函数（也就是說，滿足限制條件的 $x$ ， $I_{\mathrm {constraints} }(x)=0$ ，在其他情形， $I_{\mathrm {constraints} }(x)=\infty$ ）。因此可以將 ${\tilde {f}}$ 延伸為擾動函數（英语：perturbation function） $F:X\times Y\to \mathbb {R} \cup \{+\infty \}$ 使得 $F(x,0)={\tilde {f}}(x)$ ^[2]。

對偶間隙就是不等式

\sup _{y^{*}\in Y^{*}}-F^{*}(0,y^{*})\leq \inf _{x\in X}F(x,0),\,

左側和右側的差。

其中 $F^{*}$ 是二個變數的凸共轭，而 $\sup$ 是上确界^[2]^[3]^[4]

線性規劃

线性规划問題是指损失函数和約束都是線性關係的最优化問題。原始問題中，目標函數是n個變數的線性組合，問題有m個約束，每一個都有上限，上限由n個變數的線性組合表示。其目的是要在滿足限制條件的情形下，最大化目標函數的值。其解是由n個值表示的向量，可以讓目標函數有最大值。

在對偶問題中，目標函數是m個值的線性組合，這些是原始問題中m個限制條件的上下限。有n個對偶限制條件，每一個的下限都是由m個對偶變數的線性組合所表示。

原始問題和對偶問題的關係

針對線性的最佳化問題，在原始問題中每一個符合限制條件的次佳點，都有一個方向（或方向的子空间），可以往目標函數增加的方向移動。若往這類的方向移動，稱為除去候選解（英语：candidate solution）和一個或多個限制之間的鬆弛量（slack）。候選解中不可行的值即為超過一個或多個限制的值。

在對偶問題中，會將對偶向量和限制條件相乘，這些限制條件會決定原始問題中限制條件的位置。在對偶問題中變動對偶向量類似在原始問題中調整上限。要找到最小的上限。也就是說，要將對偶向量最小化，以移除限制的候選位置和實際最佳解之間的鬆弛量（slack）。對偶向量中的不可行值是指哪些太低的值。這會讓候選解的一個或多個限制條件落在排除實際最佳解的位置。

在线性规划中探討對偶性的方程式中，會對上述直覺有型式化的敘述。

非線性規劃

非线性规划中的限制條件不一定是線性的，不過許多线性规划的原則還是適用。

為了確保可以簡單的識別非线性规划中的全域最大值，問題一般會要求函數要是凸函數，而且有緊緻的lower level sets。

由此可以看出卡鲁什-库恩-塔克条件的重要性。卡鲁什-库恩-塔克条件可以提供非線性規劃問題中識別局部最佳值的必要條件。也有一些額外的必要條件（約束規範，constraint qualifications）可以用來判斷可能有「最佳解」（是局部最佳解，但不一定是全域最佳解）的方式。

強拉格朗日原則：拉格朗日對偶

考慮以下形式的非線性規劃：

{\begin{aligned}{\text{minimize }}&f_{0}(x)\\{\text{subject to }}&f_{i}(x)\leq 0,\ i\in \left\{1,\ldots ,m\right\}\\&h_{i}(x)=0,\ i\in \left\{1,\ldots ,p\right\}\end{aligned}}

其定義域 ${\mathcal {D}}\subset \mathbb {R} ^{n}$ 有非空的內部。其拉格朗日函數 $\Lambda :\mathbb {R} ^{n}\times \mathbb {R} ^{m}\times \mathbb {R} ^{p}\to \mathbb {R}$ 定義為

\Lambda (x,\lambda ,\nu )=f_{0}(x)+\sum _{i=1}^{m}\lambda _{i}f_{i}(x)+\sum _{i=1}^{p}\nu _{i}h_{i}(x).

向量 $\lambda$ 和 $\nu$ 是有關此問題的「對偶變數」（dual variables）或拉格朗日乘數向量（Lagrange multiplier vectors）。拉格朗日對偶函數（Lagrange dual function） $g:\mathbb {R} ^{m}\times \mathbb {R} ^{p}\to \mathbb {R}$ 定義如下

g(\lambda ,\nu )=\inf _{x\in {\mathcal {D}}}\Lambda (x,\lambda ,\nu )=\inf _{x\in {\mathcal {D}}}\left(f_{0}(x)+\sum _{i=1}^{m}\lambda _{i}f_{i}(x)+\sum _{i=1}^{p}\nu _{i}h_{i}(x)\right).

對偶函數g是凹函數，就算初始問題不是凸也會如此，因為是仿射函數的點狀最大下界。對偶函數提供初始問題最佳值 $p^{*}$ 的下界：針對任意 $\lambda \geq 0$ 及任意 $\nu$ ，可以得到 $g(\lambda ,\nu )\leq p^{*}$ 。

若滿足約束規範（例如斯萊特條件（英语：Slater's condition）），且原問題是凸，則可以得到強對偶（英语：strong duality） $d^{*}=\max _{\lambda \geq 0,\nu }g(\lambda ,\nu )=\inf f_{0}=p^{*}$ 。

凸問題

針對有不等式限制的凸最小化問題

{\begin{aligned}&{\underset {x}{\operatorname {minimize} }}&&f(x)\\&\operatorname {subject\;to} &&g_{i}(x)\leq 0,\quad i=1,\ldots ,m\end{aligned}}

拉格朗日對偶問題為

{\begin{aligned}&{\underset {u}{\operatorname {maximize} }}&&\inf _{x}\left(f(x)+\sum _{j=1}^{m}u_{j}g_{j}(x)\right)\\&\operatorname {subject\;to} &&u_{i}\geq 0,\quad i=1,\ldots ,m\end{aligned}}

其中目標函數是拉格朗日對偶函數。假設函數 $f$ and $g_{1},\ldots ,g_{m}$ 連續可微，最大下界出現在梯度等於零的位置。問題

{\begin{aligned}&{\underset {x,u}{\operatorname {maximize} }}&&f(x)+\sum _{j=1}^{m}u_{j}g_{j}(x)\\&\operatorname {subject\;to} &&\nabla f(x)+\sum _{j=1}^{m}u_{j}\,\nabla g_{j}(x)=0\\&&&u_{i}\geq 0,\quad i=1,\ldots ,m\end{aligned}}

稱為Wolfe對偶問題。此問題用計算的方式處理可能會很困難，因為目標函數在聯合變數 $(u,x)$ 上不是凹。而且，一般而言，等式的限制條件 $\nabla f(x)+\sum _{j=1}^{m}u_{j}\,\nabla g_{j}(x)$ 也是非線性的，因此Wolfe對偶問題一般而言會不會是凸最佳化問題。不論如何，弱對偶（英语：weak duality）會成立^[5]。

歷史

依照喬治·伯納德·丹齊格所述，在丹齊格提出了線性規劃問題後，约翰·冯·诺伊曼就提出了線性規劃的對偶性定理的猜想。冯·诺伊曼注意到他使用了來自其博弈论中的資訊，並且猜想兩人零和的矩陣博弈和線性規劃等效。嚴謹的證明最早是由阿尔伯特·塔克和其團隊發表（丹齊格在Nering和塔克1993年著作中的前言）

腳註

^ Boyd, Stephen P.; Vandenberghe, Lieven. Convex Optimization (pdf). Cambridge University Press. 2004: 216 [October 15, 2011]. ISBN 978-0-521-83378-3. （原始内容存档 (PDF)于2021-05-09）.
^ ^2.0 ^2.1 Boţ, Radu Ioan; Wanka, Gert; Grad, Sorin-Mihai. Duality in Vector Optimization. Springer. 2009. ISBN 978-3-642-02885-4.
^ Csetnek, Ernö Robert. Overcoming the failure of the classical generalized interior-point regularity conditions in convex optimization. Applications of the duality theory to enlargements of maximal monotone operators. Logos Verlag Berlin GmbH. 2010. ISBN 978-3-8325-2503-3.
^ Zălinescu, Constantin. Convex analysis in general vector spaces. River Edge, NJ: World Scientific Publishing Co., Inc. 2002: 106–113. ISBN 981-238-067-1. MR 1921556. |issue=被忽略 (帮助)
^ Geoffrion, Arthur M. Duality in Nonlinear Programming: A Simplified Applications-Oriented Development. SIAM Review. 1971, 13 (1): 1–37. JSTOR 2028848. doi:10.1137/1013001.

參考資料

書籍

Ahuja, Ravindra K.; Magnanti, Thomas L.; Orlin, James B. Network Flows: Theory, Algorithms and Applications. Prentice Hall. 1993. ISBN 0-13-617549-X.
Bertsekas, Dimitri; Nedic, Angelia; Ozdaglar, Asuman. Convex Analysis and Optimization. Athena Scientific. 2003. ISBN 1-886529-45-0.
Bertsekas, Dimitri P. Nonlinear Programming 2nd. Athena Scientific. 1999. ISBN 1-886529-00-0.
Bertsekas, Dimitri P. Convex Optimization Theory. Athena Scientific. 2009. ISBN 978-1-886529-31-1.
Bonnans, J. Frédéric; Gilbert, J. Charles; Lemaréchal, Claude; Sagastizábal, Claudia A. Numerical optimization: Theoretical and practical aspects. Universitext Second revised ed. of translation of 1997 French. Berlin: Springer-Verlag. 2006: xiv+490 [2021-05-15]. ISBN 3-540-35445-X. MR 2265882. doi:10.1007/978-3-540-35447-5. （原始内容存档于2013-07-19）.
Cook, William J.; Cunningham, William H.; Pulleyblank, William R.; Schrijver, Alexander. Combinatorial Optimization 1st. John Wiley & Sons. November 12, 1997. ISBN 0-471-55894-X.
Dantzig, George B. Linear Programming and Extensions. Princeton, NJ: Princeton University Press. 1963.
Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude. Convex analysis and minimization algorithms, Volume I: Fundamentals. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 305. Berlin: Springer-Verlag. 1993: xviii+417. ISBN 3-540-56850-6. MR 1261420.
Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude. 14 Duality for Practitioners. Convex analysis and minimization algorithms, Volume II: Advanced theory and bundle methods. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 306. Berlin: Springer-Verlag. 1993: xviii+346. ISBN 3-540-56852-2. MR 1295240.
Lasdon, Leon S. Optimization theory for large systems. Mineola, New York: Dover Publications, Inc. 2002: xiii+523 [Reprint of the 1970 Macmillan]. ISBN 978-0-486-41999-2. MR 1888251.
Lawler, Eugene. 4.5. Combinatorial Implications of Max-Flow Min-Cut Theorem, 4.6. Linear Programming Interpretation of Max-Flow Min-Cut Theorem. Combinatorial Optimization: Networks and Matroids. Dover. 2001: 117–120. ISBN 0-486-41453-1.
Lemaréchal, Claude. Lagrangian relaxation. Jünger, Michael; Naddef, Denis (编). Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000. Lecture Notes in Computer Science (LNCS) 2241. Berlin: Springer-Verlag. 2001: 112–156. ISBN 3-540-42877-1. MR 1900016. doi:10.1007/3-540-45586-8_4.
Minoux, Michel. Mathematical programming: Theory and algorithms. Egon Balas (forward); Steven Vajda (trans) from the (1983 Paris: Dunod) French. Chichester: A Wiley-Interscience Publication. John Wiley & Sons, Ltd. 1986: xxviii+489. ISBN 0-471-90170-9. MR 0868279. (2008 Second ed., in French: Programmation mathématique : Théorie et algorithmes, Éditions Tec & Doc, Paris, 2008. xxx+711 pp. )).
Nering, Evar D.; Tucker, Albert W. Linear Programming and Related Problems. Boston, MA: Academic Press. 1993. ISBN 978-0-12-515440-6.
Papadimitriou, Christos H.; Steiglitz, Kenneth. Combinatorial Optimization : Algorithms and Complexity Unabridged. Dover. July 1998. ISBN 0-486-40258-4.
Ruszczyński, Andrzej. Nonlinear Optimization. Princeton, NJ: 普林斯頓大學出版社. 2006: xii+454. ISBN 978-0691119151. MR 2199043.

論文

Everett, Hugh, III. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research. 1963, 11 (3): 399–417 [2021-05-15]. JSTOR 168028. MR 0152360. doi:10.1287/opre.11.3.399. （原始内容存档于2011-07-24）.
Kiwiel, Krzysztof C.; Larsson, Torbjörn; Lindberg, P. O. Lagrangian relaxation via ballstep subgradient methods. Mathematics of Operations Research. August 2007, 32 (3): 669–686 [2021-05-15]. MR 2348241. doi:10.1287/moor.1070.0261. （原始内容存档于2011-07-26）.
Duality in Linear Programming （页面存档备份，存于互联网档案馆） Gary D. Knott

[Boyd-1] Boyd, Stephen P.; Vandenberghe, Lieven. Convex Optimization (pdf). Cambridge University Press. 2004: 216 [October 15, 2011]. ISBN 978-0-521-83378-3. （原始内容存档 (PDF)于2021-05-09）.

[BWG-2] 2.0 ^2.1 Boţ, Radu Ioan; Wanka, Gert; Grad, Sorin-Mihai. Duality in Vector Optimization. Springer. 2009. ISBN 978-3-642-02885-4.

[3] Csetnek, Ernö Robert. Overcoming the failure of the classical generalized interior-point regularity conditions in convex optimization. Applications of the duality theory to enlargements of maximal monotone operators. Logos Verlag Berlin GmbH. 2010. ISBN 978-3-8325-2503-3.

[Zalinescu-4] Zălinescu, Constantin. Convex analysis in general vector spaces. River Edge, NJ: World Scientific Publishing Co., Inc. 2002: 106–113. ISBN 981-238-067-1. MR 1921556. |issue=被忽略 (帮助)

[5] Geoffrion, Arthur M. Duality in Nonlinear Programming: A Simplified Applications-Oriented Development. SIAM Review. 1971, 13 (1): 1–37. JSTOR 2028848. doi:10.1137/1013001.

[1]

[2]

[3]

[4]

[5]