对偶性 (最佳化)

在最优化理论中的对偶（duality）或对偶性原则（duality principle）是指最佳化问题可以用两种观点来看待的理论，两种观点分别是“原始问题”（primal problem）及“对偶问题”（dual problem）。对偶问题的解提供了原始问题（假设是最小化问题）的下限^[1]，不过一般而言，原始问题和对偶问题的最佳解不相同。两个最佳解的差距为对偶间隙。若是凸优化问题，对偶间隙也称为是卡鲁什-库恩-塔克条件。

对偶问题

一般而言“对偶问题”是指“拉格朗日对偶问题”（Lagrangian dual problem），不过也有其他的对偶问题，例如Wolfe对偶问题（英语：Wolfe dual problem）和Fenchel对偶问题（英语：Fenchel's duality theorem）。拉格朗日对偶问题是指在最小化问题上加上拉格朗日乘数，也就是在目标函数上加上对应限制条件的非负拉格朗日乘数，再求解可将原目标函数最小的原始变数值。此解会得到以拉格朗日乘数的函数表示的原始变数，称为是“对偶变数”（dual variables），因此，新的问题就是要衍生对偶变数的限制下（包括非负的限制条件），找到可以使目标函数最大化的对偶变数。

一般而言，给定二个对偶对（英语：dual pair）的分隔局部凸空间（英语：locally convex space） $\left(X,X^{*}\right)$ 及 $\left(Y,Y^{*}\right)$ ，以及函数 $f:X\to \mathbb {R} \cup \{+\infty \}$ ，可以定义原始问题为找到 ${\hat {x}}$ 使得 $f({\hat {x}})=\inf _{x\in X}f(x).\,$ 换句话说，若 ${\hat {x}}$ 存在， $f({\hat {x}})$ 是函数 $f$ 的最小极值（minimum），也可以得到函数的最大下界（infimum）。

若有限制条件，可以整合到函数 $f$ 中，方式是令 $f=f+I_{\text{constraints}}$ ，其中 $I_{\mathrm {constraints} }$ 是在 $X$ 上的适当函数，在限制条件上有最小值0，可以证明 $\inf _{x\in X}{\tilde {f}}(x)=\inf _{x\ \mathrm {constrained} }f(x)$ 。后者的条件很明显，但不一定很方便可以符合示性函数（也就是说，满足限制条件的 $x$ ， $I_{\mathrm {constraints} }(x)=0$ ，在其他情形， $I_{\mathrm {constraints} }(x)=\infty$ ）。因此可以将 ${\tilde {f}}$ 延伸为扰动函数（英语：perturbation function） $F:X\times Y\to \mathbb {R} \cup \{+\infty \}$ 使得 $F(x,0)={\tilde {f}}(x)$ ^[2]。

对偶间隙就是不等式

\sup _{y^{*}\in Y^{*}}-F^{*}(0,y^{*})\leq \inf _{x\in X}F(x,0),\,

左侧和右侧的差。

其中 $F^{*}$ 是二个变数的凸共轭，而 $\sup$ 是上确界^[2]^[3]^[4]

线性规划

线性规划问题是指损失函数和约束都是线性关系的最优化问题。原始问题中，目标函数是n个变数的线性组合，问题有m个约束，每一个都有上限，上限由n个变数的线性组合表示。其目的是要在满足限制条件的情形下，最大化目标函数的值。其解是由n个值表示的向量，可以让目标函数有最大值。

在对偶问题中，目标函数是m个值的线性组合，这些是原始问题中m个限制条件的上下限。有n个对偶限制条件，每一个的下限都是由m个对偶变数的线性组合所表示。

原始问题和对偶问题的关系

针对线性的最佳化问题，在原始问题中每一个符合限制条件的次佳点，都有一个方向（或方向的子空间），可以往目标函数增加的方向移动。若往这类的方向移动，称为除去候选解（英语：candidate solution）和一个或多个限制之间的松弛量（slack）。候选解中不可行的值即为超过一个或多个限制的值。

在对偶问题中，会将对偶向量和限制条件相乘，这些限制条件会决定原始问题中限制条件的位置。在对偶问题中变动对偶向量类似在原始问题中调整上限。要找到最小的上限。也就是说，要将对偶向量最小化，以移除限制的候选位置和实际最佳解之间的松弛量（slack）。对偶向量中的不可行值是指哪些太低的值。这会让候选解的一个或多个限制条件落在排除实际最佳解的位置。

在线性规划中探讨对偶性的方程式中，会对上述直觉有型式化的叙述。

非线性规划

非线性规划中的限制条件不一定是线性的，不过许多线性规划的原则还是适用。

为了确保可以简单的识别非线性规划中的全域最大值，问题一般会要求函数要是凸函数，而且有紧致的lower level sets。

由此可以看出卡鲁什-库恩-塔克条件的重要性。卡鲁什-库恩-塔克条件可以提供非线性规划问题中识别局部最佳值的必要条件。也有一些额外的必要条件（约束规范，constraint qualifications）可以用来判断可能有“最佳解”（是局部最佳解，但不一定是全域最佳解）的方式。

强拉格朗日原则：拉格朗日对偶

考虑以下形式的非线性规划：

{\begin{aligned}{\text{minimize }}&f_{0}(x)\\{\text{subject to }}&f_{i}(x)\leq 0,\ i\in \left\{1,\ldots ,m\right\}\\&h_{i}(x)=0,\ i\in \left\{1,\ldots ,p\right\}\end{aligned}}

其定义域 ${\mathcal {D}}\subset \mathbb {R} ^{n}$ 有非空的内部。其拉格朗日函数 $\Lambda :\mathbb {R} ^{n}\times \mathbb {R} ^{m}\times \mathbb {R} ^{p}\to \mathbb {R}$ 定义为

\Lambda (x,\lambda ,\nu )=f_{0}(x)+\sum _{i=1}^{m}\lambda _{i}f_{i}(x)+\sum _{i=1}^{p}\nu _{i}h_{i}(x).

向量 $\lambda$ 和 $\nu$ 是有关此问题的“对偶变数”（dual variables）或拉格朗日乘数向量（Lagrange multiplier vectors）。拉格朗日对偶函数（Lagrange dual function） $g:\mathbb {R} ^{m}\times \mathbb {R} ^{p}\to \mathbb {R}$ 定义如下

g(\lambda ,\nu )=\inf _{x\in {\mathcal {D}}}\Lambda (x,\lambda ,\nu )=\inf _{x\in {\mathcal {D}}}\left(f_{0}(x)+\sum _{i=1}^{m}\lambda _{i}f_{i}(x)+\sum _{i=1}^{p}\nu _{i}h_{i}(x)\right).

对偶函数g是凹函数，就算初始问题不是凸也会如此，因为是仿射函数的点状最大下界。对偶函数提供初始问题最佳值 $p^{*}$ 的下界：针对任意 $\lambda \geq 0$ 及任意 $\nu$ ，可以得到 $g(\lambda ,\nu )\leq p^{*}$ 。

若满足约束规范（例如斯莱特条件（英语：Slater's condition）），且原问题是凸，则可以得到强对偶（英语：strong duality） $d^{*}=\max _{\lambda \geq 0,\nu }g(\lambda ,\nu )=\inf f_{0}=p^{*}$ 。

凸问题

针对有不等式限制的凸最小化问题

{\begin{aligned}&{\underset {x}{\operatorname {minimize} }}&&f(x)\\&\operatorname {subject\;to} &&g_{i}(x)\leq 0,\quad i=1,\ldots ,m\end{aligned}}

拉格朗日对偶问题为

{\begin{aligned}&{\underset {u}{\operatorname {maximize} }}&&\inf _{x}\left(f(x)+\sum _{j=1}^{m}u_{j}g_{j}(x)\right)\\&\operatorname {subject\;to} &&u_{i}\geq 0,\quad i=1,\ldots ,m\end{aligned}}

其中目标函数是拉格朗日对偶函数。假设函数 $f$ and $g_{1},\ldots ,g_{m}$ 连续可微，最大下界出现在梯度等于零的位置。问题

{\begin{aligned}&{\underset {x,u}{\operatorname {maximize} }}&&f(x)+\sum _{j=1}^{m}u_{j}g_{j}(x)\\&\operatorname {subject\;to} &&\nabla f(x)+\sum _{j=1}^{m}u_{j}\,\nabla g_{j}(x)=0\\&&&u_{i}\geq 0,\quad i=1,\ldots ,m\end{aligned}}

称为Wolfe对偶问题。此问题用计算的方式处理可能会很困难，因为目标函数在联合变数 $(u,x)$ 上不是凹。而且，一般而言，等式的限制条件 $\nabla f(x)+\sum _{j=1}^{m}u_{j}\,\nabla g_{j}(x)$ 也是非线性的，因此Wolfe对偶问题一般而言会不会是凸最佳化问题。不论如何，弱对偶（英语：weak duality）会成立^[5]。

历史

依照乔治·伯纳德·丹齐格所述，在丹齐格提出了线性规划问题后，约翰·冯·诺伊曼就提出了线性规划的对偶性定理的猜想。冯·诺伊曼注意到他使用了来自其博弈论中的资讯，并且猜想两人零和的矩阵博弈和线性规划等效。严谨的证明最早是由阿尔伯特·塔克和其团队发表（丹齐格在Nering和塔克1993年著作中的前言）

脚注

^ Boyd, Stephen P.; Vandenberghe, Lieven. Convex Optimization (pdf). Cambridge University Press. 2004: 216 [October 15, 2011]. ISBN 978-0-521-83378-3. （原始内容存档 (PDF)于2021-05-09）.
^ ^2.0 ^2.1 Boţ, Radu Ioan; Wanka, Gert; Grad, Sorin-Mihai. Duality in Vector Optimization. Springer. 2009. ISBN 978-3-642-02885-4.
^ Csetnek, Ernö Robert. Overcoming the failure of the classical generalized interior-point regularity conditions in convex optimization. Applications of the duality theory to enlargements of maximal monotone operators. Logos Verlag Berlin GmbH. 2010. ISBN 978-3-8325-2503-3.
^ Zălinescu, Constantin. Convex analysis in general vector spaces. River Edge, NJ: World Scientific Publishing Co., Inc. 2002: 106–113. ISBN 981-238-067-1. MR 1921556. |issue=被忽略 (帮助)
^ Geoffrion, Arthur M. Duality in Nonlinear Programming: A Simplified Applications-Oriented Development. SIAM Review. 1971, 13 (1): 1–37. JSTOR 2028848. doi:10.1137/1013001.

参考资料

书籍

Ahuja, Ravindra K.; Magnanti, Thomas L.; Orlin, James B. Network Flows: Theory, Algorithms and Applications. Prentice Hall. 1993. ISBN 0-13-617549-X.
Bertsekas, Dimitri; Nedic, Angelia; Ozdaglar, Asuman. Convex Analysis and Optimization. Athena Scientific. 2003. ISBN 1-886529-45-0.
Bertsekas, Dimitri P. Nonlinear Programming 2nd. Athena Scientific. 1999. ISBN 1-886529-00-0.
Bertsekas, Dimitri P. Convex Optimization Theory. Athena Scientific. 2009. ISBN 978-1-886529-31-1.
Bonnans, J. Frédéric; Gilbert, J. Charles; Lemaréchal, Claude; Sagastizábal, Claudia A. Numerical optimization: Theoretical and practical aspects. Universitext Second revised ed. of translation of 1997 French. Berlin: Springer-Verlag. 2006: xiv+490 [2021-05-15]. ISBN 3-540-35445-X. MR 2265882. doi:10.1007/978-3-540-35447-5. （原始内容存档于2013-07-19）.
Cook, William J.; Cunningham, William H.; Pulleyblank, William R.; Schrijver, Alexander. Combinatorial Optimization 1st. John Wiley & Sons. November 12, 1997. ISBN 0-471-55894-X.
Dantzig, George B. Linear Programming and Extensions. Princeton, NJ: Princeton University Press. 1963.
Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude. Convex analysis and minimization algorithms, Volume I: Fundamentals. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 305. Berlin: Springer-Verlag. 1993: xviii+417. ISBN 3-540-56850-6. MR 1261420.
Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude. 14 Duality for Practitioners. Convex analysis and minimization algorithms, Volume II: Advanced theory and bundle methods. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 306. Berlin: Springer-Verlag. 1993: xviii+346. ISBN 3-540-56852-2. MR 1295240.
Lasdon, Leon S. Optimization theory for large systems. Mineola, New York: Dover Publications, Inc. 2002: xiii+523 [Reprint of the 1970 Macmillan]. ISBN 978-0-486-41999-2. MR 1888251.
Lawler, Eugene. 4.5. Combinatorial Implications of Max-Flow Min-Cut Theorem, 4.6. Linear Programming Interpretation of Max-Flow Min-Cut Theorem. Combinatorial Optimization: Networks and Matroids. Dover. 2001: 117–120. ISBN 0-486-41453-1.
Lemaréchal, Claude. Lagrangian relaxation. Jünger, Michael; Naddef, Denis (编). Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000. Lecture Notes in Computer Science (LNCS) 2241. Berlin: Springer-Verlag. 2001: 112–156. ISBN 3-540-42877-1. MR 1900016. doi:10.1007/3-540-45586-8_4.
Minoux, Michel. Mathematical programming: Theory and algorithms. Egon Balas (forward); Steven Vajda (trans) from the (1983 Paris: Dunod) French. Chichester: A Wiley-Interscience Publication. John Wiley & Sons, Ltd. 1986: xxviii+489. ISBN 0-471-90170-9. MR 0868279. (2008 Second ed., in French: Programmation mathématique : Théorie et algorithmes, Éditions Tec & Doc, Paris, 2008. xxx+711 pp. )).
Nering, Evar D.; Tucker, Albert W. Linear Programming and Related Problems. Boston, MA: Academic Press. 1993. ISBN 978-0-12-515440-6.
Papadimitriou, Christos H.; Steiglitz, Kenneth. Combinatorial Optimization : Algorithms and Complexity Unabridged. Dover. July 1998. ISBN 0-486-40258-4.
Ruszczyński, Andrzej. Nonlinear Optimization. Princeton, NJ: 普林斯顿大学出版社. 2006: xii+454. ISBN 978-0691119151. MR 2199043.

论文

Everett, Hugh, III. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research. 1963, 11 (3): 399–417 [2021-05-15]. JSTOR 168028. MR 0152360. doi:10.1287/opre.11.3.399. （原始内容存档于2011-07-24）.
Kiwiel, Krzysztof C.; Larsson, Torbjörn; Lindberg, P. O. Lagrangian relaxation via ballstep subgradient methods. Mathematics of Operations Research. August 2007, 32 (3): 669–686 [2021-05-15]. MR 2348241. doi:10.1287/moor.1070.0261. （原始内容存档于2011-07-26）.
Duality in Linear Programming （页面存档备份，存于互联网档案馆） Gary D. Knott

[Boyd-1] Boyd, Stephen P.; Vandenberghe, Lieven. Convex Optimization (pdf). Cambridge University Press. 2004: 216 [October 15, 2011]. ISBN 978-0-521-83378-3. （原始内容存档 (PDF)于2021-05-09）.

[BWG-2] 2.0 ^2.1 Boţ, Radu Ioan; Wanka, Gert; Grad, Sorin-Mihai. Duality in Vector Optimization. Springer. 2009. ISBN 978-3-642-02885-4.

[3] Csetnek, Ernö Robert. Overcoming the failure of the classical generalized interior-point regularity conditions in convex optimization. Applications of the duality theory to enlargements of maximal monotone operators. Logos Verlag Berlin GmbH. 2010. ISBN 978-3-8325-2503-3.

[Zalinescu-4] Zălinescu, Constantin. Convex analysis in general vector spaces. River Edge, NJ: World Scientific Publishing Co., Inc. 2002: 106–113. ISBN 981-238-067-1. MR 1921556. |issue=被忽略 (帮助)

[5] Geoffrion, Arthur M. Duality in Nonlinear Programming: A Simplified Applications-Oriented Development. SIAM Review. 1971, 13 (1): 1–37. JSTOR 2028848. doi:10.1137/1013001.

[1]

[2]

[3]

[4]

[5]