# 共轭梯度法的推导

$\boldsymbol{Ax}=\boldsymbol{b}$

## 从 Arnoldi/Lanczos 迭代推导

### 一般 Arnoldi 方法

Arnoldi 迭代从一个向量 $\boldsymbol{r}_0$ 开始，通过定义 $\boldsymbol{v}_i=\boldsymbol{w}_i/\lVert\boldsymbol{w}_i\rVert_2$，其中

$\boldsymbol{w}_i=\begin{cases} \boldsymbol{r}_0 & \text{if }i=1\text{,}\\ \boldsymbol{Av}_{i-1}-\sum_{j=1}^{i-1}(\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_{i-1})\boldsymbol{v}_j & \text{if }i>1\text{,} \end{cases}$

$\mathcal{K}(\boldsymbol{A},\boldsymbol{r}_0)=\{\boldsymbol{r}_0,\boldsymbol{Ar}_0,\boldsymbol{A}^2\boldsymbol{r}_0,\ldots\}$

$\boldsymbol{AV}_i=\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i\text{,}$

\begin{align} \boldsymbol{V}_i&=\begin{bmatrix} \boldsymbol{v}_1 & \boldsymbol{v}_2 & \cdots & \boldsymbol{v}_i \end{bmatrix}\text{,}\\ \boldsymbol{\tilde{H}}_i&=\begin{bmatrix} h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\ h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\ & h_{32} & h_{33} & \cdots & h_{3,i}\\ & & \ddots & \ddots & \vdots\\ & & & h_{i,i-1} & h_{i,i}\\ & & & & h_{i+1,i} \end{bmatrix}=\begin{bmatrix} \boldsymbol{H}_i\\ h_{i+1,i}\boldsymbol{e}_i^\mathrm{T} \end{bmatrix}\text{,}\\ h_{ji}&=\begin{cases} \boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_i & \text{if }j\leq i\text{,}\\ \lVert\boldsymbol{w}_{i+1}\rVert_2 & \text{if }j=i+1\text{,}\\ 0 & \text{if }j>i+1\text{.} \end{cases} \end{align}

### 直接 Lanzcos 方法

$\boldsymbol{H}_i=\begin{bmatrix} a_1 & b_2\\ b_2 & a_2 & b_3\\ & \ddots & \ddots & \ddots\\ & & b_{i-1} & a_{i-1} & b_i\\ & & & b_i & a_i \end{bmatrix}\text{.}$

$\boldsymbol{H}_i=\boldsymbol{L}_i\boldsymbol{U}_i=\begin{bmatrix} 1\\ c_2 & 1\\ & \ddots & \ddots\\ & & c_{i-1} & 1\\ & & & c_i & 1 \end{bmatrix}\begin{bmatrix} d_1 & b_2\\ & d_2 & b_3\\ & & \ddots & \ddots\\ & & & d_{i-1} & b_i\\ & & & & d_i \end{bmatrix}\text{,}$

\begin{align} c_i&=b_i/d_{i-1}\text{,}\\ d_i&=\begin{cases} a_1 & \text{if }i=1\text{,}\\ a_i-c_ib_i & \text{if }i>1\text{.} \end{cases} \end{align}

\begin{align} \boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\ &=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\ &=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i\text{,} \end{align}

\begin{align} \boldsymbol{P}_i&=\boldsymbol{V}_{i}\boldsymbol{U}_i^{-1}\text{,}\\ \boldsymbol{z}_i&=\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\text{.} \end{align}

\begin{align} \boldsymbol{P}_i&=\begin{bmatrix} \boldsymbol{P}_{i-1} & \boldsymbol{p}_i \end{bmatrix}\text{,}\\ \boldsymbol{z}_i&=\begin{bmatrix} \boldsymbol{z}_{i-1}\\ \zeta_i \end{bmatrix}\text{.} \end{align}

\begin{align} \boldsymbol{p}_i&=\frac{1}{d_i}(\boldsymbol{v}_i-b_i\boldsymbol{p}_{i-1})\text{,}\\ \alpha_i&=-c_i\zeta_{i-1}\text{.} \end{align}

\begin{align} \boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i\\ &=\boldsymbol{x}_0+\boldsymbol{P}_{i-1}\boldsymbol{z}_{i-1}+\zeta_i\boldsymbol{p}_i\\ &=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i\text{.} \end{align}

### 从正交性和共轭性导出共轭梯度法

\begin{align} \boldsymbol{x}_i&=\boldsymbol{x}_{i-1}+\alpha_{i-1}\boldsymbol{p}_{i-1}\text{,}\\ \boldsymbol{r}_i&=\boldsymbol{r}_{i-1}-\alpha_{i-1}\boldsymbol{Ap}_{i-1}\text{,}\\ \boldsymbol{p}_i&=\boldsymbol{r}_i+\beta_{i-1}\boldsymbol{p}_{i-1}\text{.} \end{align}

\begin{align} \boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_j&=0\text{,}\\ \boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j&=0\text{.} \end{align}

\begin{align} \boldsymbol{r}_i&=\boldsymbol{b}-\boldsymbol{Ax}_i\\ &=\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i)\\ &=\boldsymbol{r}_0-\boldsymbol{AV}_i\boldsymbol{y}_i\\ &=\boldsymbol{r}_0-\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i\boldsymbol{y}_i\\ &=\boldsymbol{r}_0-\boldsymbol{V}_i\boldsymbol{H}_i\boldsymbol{y}_i-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\ &=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1-\boldsymbol{V}_i(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\ &=-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\text{.}\end{align}

\begin{align} \boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i\boldsymbol{U}_i^{-1}\\ &=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{H}_i\boldsymbol{U}_i^{-1}\\ &=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i\boldsymbol{U}_i\boldsymbol{U}_i^{-1}\\ &=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i \end{align}

\begin{align} \alpha_i&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{Ap}_i}\\ &=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{(\boldsymbol{p}_i-\beta_{i-1}\boldsymbol{p}_{i-1})^\mathrm{T}\boldsymbol{Ap}_i}\\ &=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{.} \end{align}

\begin{align} \beta_i&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\ &=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})}{\alpha_i\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\ &=\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_{i+1}}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}\text{.} \end{align}

## 参考文献

1. Hestenes, M. R.; Stiefel, E. Methods of conjugate gradients for solving linear systems (PDF). Journal of Research of the National Bureau of Standards. 1952.12, 49 (6).
2. Saad, Y. Chapter 6: Krylov Subspace Methods, Part I//Iterative methods for sparse linear systems 2nd. SIAM. 2003. ISBN 978-0898715347.