跳转至

线性规划的对偶

阅读信息

1119 词  7 分钟  本页总访问量 加载中...

LP:

\[ \begin{aligned} \min_{\boldsymbol{x}}\ \ & f(\boldsymbol{x}) = \boldsymbol{c}^T \boldsymbol{x} \\ & \boldsymbol{Ax} = \boldsymbol{b} \\ & \boldsymbol{Gx} \geq \boldsymbol{h} \end{aligned} \]

By knowing the lower bound of an LP, we can tell how good our solution \(f(\boldsymbol{x})\) is

Assume the solution of an LP \(f^*\) has a lower bound \(f_{\mathrm{LB}}\), then for all feasible \(\boldsymbol{x}\), \(f^*\) is also the lower bound of \(f(\boldsymbol{x})\)

How to find it?

Noting that the constraints are in the form of lower bound, we can find \(\boldsymbol{\lambda}, \boldsymbol{\mu}\) and \(\boldsymbol{\mu} \geq \boldsymbol{0}\),

\[ \boldsymbol{\lambda}^T \boldsymbol{Ax} + \boldsymbol{\mu}^T\boldsymbol{Gx} \geq \boldsymbol{\lambda}^T\boldsymbol{b} + \boldsymbol{\mu}^T \boldsymbol{h} \]

Dual LP

If \(\boldsymbol{A}^T \boldsymbol{\lambda} + \boldsymbol{G}^T\boldsymbol{\mu} = \boldsymbol{c}\), then we can get the lower bound of \(f^*\): \(\boldsymbol{\lambda}^T\boldsymbol{b} + \boldsymbol{\mu}^T \boldsymbol{h}\)

To maximize the lower bound, we only need to maximize

\[ \begin{aligned} \max_{\boldsymbol{\lambda}, \boldsymbol{\mu}}\ \ & \psi (\boldsymbol{\lambda}, \boldsymbol{\mu}) = \boldsymbol{b}^T\boldsymbol{\lambda} + \boldsymbol{\mu}^T\boldsymbol{h}\\ & \boldsymbol{A}^T \boldsymbol{\lambda} + \boldsymbol{G}^T\boldsymbol{\mu} = \boldsymbol{c} \\ &\boldsymbol{\mu} \geq \boldsymbol{0} \end{aligned} \]

This is called the Dual LP of the former LP

Relations

alt text

The lower Dual:

\[ \boldsymbol{x}^T(\boldsymbol{A}^T \boldsymbol{\lambda} + \boldsymbol{G}^T \boldsymbol{\mu}) + \boldsymbol{y}^T\boldsymbol{\mu} \geq \boldsymbol{c}^T \boldsymbol{x} \]

and

\[ \boldsymbol{x}^T\boldsymbol{A}^T = \boldsymbol{b}^T, \quad \boldsymbol{G}^T + \boldsymbol{y}^T = \boldsymbol{h}^T \]

Common forms of dual LP

alt text

Weak and Strong Duality

Consider the LP form

\[ \begin{alignedat}{2} \min_{\boldsymbol{x}}\quad & f(\boldsymbol{x}) = \boldsymbol{c}^T \boldsymbol{x} \qquad\qquad & \max_{\boldsymbol{y}}\quad & \psi(\boldsymbol{y}) = \boldsymbol{b}^T \boldsymbol{y} \\ \text{s.t.}\quad & \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b} & \text{s.t.}\quad & \boldsymbol{A}^T \boldsymbol{y} = \boldsymbol{c} \\ & \boldsymbol{G}\boldsymbol{x} \ge \boldsymbol{h} & & \boldsymbol{y} \ge \boldsymbol{0} \end{alignedat} \]

Weak duality

We have

\[ \boldsymbol{c}^T \boldsymbol{x} \geq f^* \geq \psi^* \geq \boldsymbol{b}^T \boldsymbol{y} \]

Proof

Noting

\[ \boldsymbol{c}^T\boldsymbol{x} = \left(\boldsymbol{A}^T \boldsymbol{y}\right)^T\boldsymbol{x} = \boldsymbol{y}^T \boldsymbol{Ax} \geq \boldsymbol{y}^T \boldsymbol{b} \]

It holds for all LP problems. When will the "=" hold?

Strong duality

We say strong duality holds if \(f^* = \psi^*\)

If \(f^*\) or \(\psi^*\) is finite, then strong duality holds, and there exist optimal solutions \(\boldsymbol{x}^*, \boldsymbol{y}^*\) such that \(\boldsymbol{c}^T\boldsymbol{x}^* = f^* = \psi^* = \boldsymbol{b}^T\boldsymbol{y}^*\)

A weaker form: If LP has a optimal solution, then the dual also has one and strong duality holds.

Proof

By the KKT conditions, there exists a Lagrange multiplier \(\boldsymbol{\mu}^*\) s.t.
stationarity \(\boldsymbol{c} - \boldsymbol{A}^T\boldsymbol{\mu} = \boldsymbol{0}\)
nonnegativity \(\boldsymbol{\mu}^* \geq \boldsymbol{0}\)
complementary slackness \(\left(\boldsymbol{\mu}^*\right)^T(\boldsymbol{Ax} - \boldsymbol{b}) = 0\)

So \(\boldsymbol{c}^T \boldsymbol{x}^* = (\boldsymbol{A\mu}^*)^T\boldsymbol{x}^* = \boldsymbol{b}^T\boldsymbol{\mu}\)

Example: Wasserstein distance

The (first) Wasserstein distance \(W_1(a, b)\) between the two distributions \(a, b\) is the optimal value of the optimal transport problem:

\[ \begin{aligned} \min_{(x_{ij})}\ \ & \sum_{i = 1}^n \sum_{j = 1}^m c_{ij}x_{ij} \\ & \sum_{i = 1}^n x_{ij} = b_j, \forall j \\ & \sum_{j = 1}^m x_{ij} = a_i, \forall i \\ & x_{ij} \geq 0 \end{aligned} \]

i.e.

\[ W_1(a, b) = \inf_{\boldsymbol{X}_0 \in \Pi(a, b)}\sum_{i, j}d(\boldsymbol{x}_i, \boldsymbol{y}_j)x_{ij} = \inf_{\boldsymbol{X} \in \Pi(a, b)}\mathbb{E}_{(\boldsymbol{X}, \boldsymbol{Y}) \sim \boldsymbol{X}_0}(d(\boldsymbol{X}, \boldsymbol{Y})) \]

To be completed...

Dual LP via Lagrangian

The lagrangian of general LP is

\[ \begin{aligned} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) &= \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{\lambda}^T(\boldsymbol{Ax} - \boldsymbol{b}) - \boldsymbol{\mu}^T(\boldsymbol{Gx} - \boldsymbol{h}) \\ &= \left(\boldsymbol{c}^T - \boldsymbol{\lambda}^T\boldsymbol{A} - \boldsymbol{\mu}^T \boldsymbol{G}\right)\boldsymbol{x} + \boldsymbol{\lambda}^T\boldsymbol{b} + \boldsymbol{\mu}^T\boldsymbol{h} \end{aligned} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{\lambda}^T(\boldsymbol{Ax} - \boldsymbol{b}) - \boldsymbol{\mu}^T(\boldsymbol{Gx} - \boldsymbol{h}) \]

If \(\boldsymbol{x} \in \Omega\) and \(\boldsymbol{\mu} \geq \boldsymbol{0}\), then

\[ f(\boldsymbol{x}) = \boldsymbol{c}^T\boldsymbol{x} \geq \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \]

So

\[ f^* = \inf_{\boldsymbol{x} \in \Omega} f(\boldsymbol{x}) \geq \inf_{\boldsymbol{x} \in \Omega} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \geq \inf_{\boldsymbol{x}} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \coloneqq \phi(\boldsymbol{\lambda}, \boldsymbol{\mu}) \]

Maximize the lower bound equals to

\[ \begin{aligned} \max_{\boldsymbol{\lambda}, \boldsymbol{\mu}}\ \ & \phi(\boldsymbol{\lambda}, \boldsymbol{\mu}) \\ & \boldsymbol{\mu} \geq \boldsymbol{0} \end{aligned} \]

for all \(\boldsymbol{x} \in \Omega\). Noting that

\[ \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = (\boldsymbol{c} - \boldsymbol{A}^T\boldsymbol{\lambda} - \boldsymbol{G}^T\boldsymbol{\mu})\boldsymbol{x} + \boldsymbol{b}^T \boldsymbol{\lambda} + \boldsymbol{h}^T\boldsymbol{\mu} \]

is bounded below iff \(\boldsymbol{c} - \boldsymbol{A}^T\boldsymbol{\lambda} - \boldsymbol{G}^T\boldsymbol{\mu} = \boldsymbol{0}\)! So the dual problem is equal to (note that \(\phi = \inf \mathcal{L}\))

\[ \begin{aligned} \max \ \ & \psi(\boldsymbol{\lambda}, \boldsymbol{\mu}) = \boldsymbol{b}^T \boldsymbol{\lambda} + \boldsymbol{h}^T\boldsymbol{\mu} \\ & \boldsymbol{c} - \boldsymbol{A}^T\boldsymbol{\lambda} - \boldsymbol{G}^T\boldsymbol{\mu} = \boldsymbol{0} \\ & \boldsymbol{\mu} \geq \boldsymbol{0} \end{aligned} \]

Which is the dual LP.

Lagrange Dual Function (in general cases)

The function \(\phi(\boldsymbol{\lambda}, \boldsymbol{\mu})\) is called the dual function.

The dual function is always concave, no matter the primal problem is convex or not.

Proof

Obviously \(\phi\) is the minus supremum of a affine function in \(\boldsymbol{\lambda}, \boldsymbol{\mu}\).

In weak duality cases, we call \(f^* - \phi^*\) the (optimal) duality gap

Duality gap

The difference \(f(\boldsymbol{x}) - \phi(\boldsymbol{\lambda}, \boldsymbol{\mu})\) associated with \(\boldsymbol{x}\) and \((\boldsymbol{\lambda}, \boldsymbol{\mu})\) is called the duality gap.
If the gap is less than \(\varepsilon\), we can tell that the duat solution \((\boldsymbol{\lambda}, \boldsymbol{\mu})\) serves as a proof or certificate that \(\boldsymbol{x}\) is \(\varepsilon\)-suboptimal

\[ f(\boldsymbol{x}) - f^* \leq f(\boldsymbol{x}) - \phi(\boldsymbol{\lambda}, \boldsymbol{\mu}) \leq \varepsilon \]

Slater’s Theorem

Consider the convex problem

\[ \begin{aligned} \max \ \ & f(\boldsymbol{x}) \quad \text{(CP)}\\ & g_j(\boldsymbol{x}) \leq 0 \\ & \boldsymbol{h}(\boldsymbol{x}) = \boldsymbol{Ax} - \boldsymbol{b} = \boldsymbol{0} \end{aligned} \]

Slater's condition The above problem is strictly feasible, i.e.

\[ \exist \boldsymbol{x}_0 \in D, g_j(\boldsymbol{x}_0) < 0, \boldsymbol{Ax}_0 = \boldsymbol{b} \]

Refined Slater’s condition

If some \(g_j\) are affine, \(g_j(\boldsymbol{x}_0) < 0\) can be relaxed to \(g_j(\boldsymbol{x}_0) \leq 0\)

Slater’s Theorem

Strong duality holds for (CP) under (refined) Slater’s condition.

Proof

Here \(D\) is the domain of all functions in (CP) and \(\Omega\) is the feasible set.

alt text

If \(f^* = -\infty\), then \(f^* = \phi^* = -\infty\) by weak duality.
Due to (CP) is strongly feasible, if \(f^* > -\infty\), then \(f^* < +\infty\). Assume \(\boldsymbol{A} \in \mathbb{R}^{k \times n}\) with \(\operatorname{rank}A = k\).

Let

\[ C = \{(\boldsymbol{u}, \boldsymbol{v}, t)\ |\ \exist \boldsymbol{x} \in D \text{ s.t. } \boldsymbol{g}(\boldsymbol{x}) \leq \boldsymbol{u}, \boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{v}, f(\boldsymbol{x}) \leq t\} \]

Similarly we know that \(C\) is convex.

It's easy to know that \((\boldsymbol{0}, \boldsymbol{0}, f(\boldsymbol{x})) \in C\) for \(\boldsymbol{x} \in \Omega\), and since \(f^* = \inf_{\boldsymbol{x} \in \Omega}f(\boldsymbol{x})\), \((\boldsymbol{0}, \boldsymbol{0}, f^*) \in \partial C\).
So there exists a supporting hyperplane at \((\boldsymbol{0}, \boldsymbol{0}, f^*) \in \partial C\), satisfies:

\[ P: (\boldsymbol{\mu}, \boldsymbol{\lambda}, \alpha)\cdot (\boldsymbol{u}, \boldsymbol{v}, t) = \boldsymbol{\mu}^T \boldsymbol{u} + \boldsymbol{\lambda}^T\boldsymbol{v} + \alpha t \geq (\boldsymbol{\mu}, \boldsymbol{\lambda}, \alpha)\cdot (\boldsymbol{0}, \boldsymbol{0}, f^*) = \alpha f^* \]

Then, letting \(\boldsymbol{u}, t \to +\infty\) yields \(\boldsymbol{\mu} \geq \boldsymbol{0}, \alpha \geq 0\).

Noting \((\boldsymbol{g}(\boldsymbol{x}), \boldsymbol{h}(\boldsymbol{x}), f(\boldsymbol{x})) \in C\),

\[ \alpha f^* \leq \boldsymbol{\mu}^T\boldsymbol{g}(\boldsymbol{x}) + \boldsymbol{\lambda}^T\boldsymbol{h}(\boldsymbol{x}) + \alpha f(\boldsymbol{x}), \forall \boldsymbol{x} \in D \]

Then we have to show \(\alpha \neq 0\) by Slater's condition.

Now assume \(\alpha = 0\)

By the inequality above, if \(\boldsymbol{x}\) is feasible, (due to \(\alpha = 0\) and \(\boldsymbol{\mu} \geq \boldsymbol{0}\))

\[ \underbrace{\boldsymbol{\mu}^T\boldsymbol{g}(\boldsymbol{x})}_{\leq 0} + \boldsymbol{\lambda}^T\underbrace{\boldsymbol{h}(\boldsymbol{x})}_{=0} \geq 0 \]

So \(\boldsymbol{\mu}^T\boldsymbol{g}(\boldsymbol{x}) = \boldsymbol{0}\)

Let \(\boldsymbol{x}_0\) satisfies Slater's condition, i.e. \(\boldsymbol{x}_0 \in \operatorname{int}D, \boldsymbol{h}(\boldsymbol{x}_0) = \boldsymbol{0}, \boldsymbol{g}(\boldsymbol{x}_0) < \boldsymbol{0}\). Easily we know that \(\boldsymbol{\mu} = \boldsymbol{0}\).

Also we know that \(\boldsymbol{\lambda}^T\boldsymbol{h}(\boldsymbol{x}) \geq 0\) for all \(\boldsymbol{x} \in D\).
\(\boldsymbol{x}_0 \in \operatorname{int}D\) infers there exists \(\delta > 0\), for all \(\boldsymbol{z} \in B(0, \delta), \boldsymbol{x}_0 + \boldsymbol{z} \in \operatorname{int}D\), so

\[ \boldsymbol{\lambda}^T\boldsymbol{Az} = \boldsymbol{\lambda}^T\left(\boldsymbol{h}(\boldsymbol{x}_0 + \boldsymbol{z}) - \underbrace{\boldsymbol{h}(\boldsymbol{x}_0)}_{=0}\right) \geq 0, \quad \forall \boldsymbol{z} \in B(0, \delta) \]

So \(\boldsymbol{A}^T\boldsymbol{\lambda} = \boldsymbol{0}\). Due to \(\boldsymbol{A}\) has full row rank, \(\boldsymbol{\lambda} = \boldsymbol{0}\).

By now we have got \((\boldsymbol{\mu}, \boldsymbol{\lambda}, \alpha) = \boldsymbol{0}\), contradiction!
So \(\alpha \neq 0\)

Then let \(\boldsymbol{\mu}^* = \boldsymbol{\mu} / \alpha, \boldsymbol{\lambda}^* = \boldsymbol{\lambda} / \alpha\), we have

\[ f^* \leq f(\boldsymbol{x}) + (\boldsymbol{\mu}^*)^T\boldsymbol{g}(\boldsymbol{x}) + (\boldsymbol{\lambda}^*)^T \boldsymbol{h(\boldsymbol{x})} = \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*) \]

Minimizing over \(\boldsymbol{x}\) we get

\[ f^* \leq \min_{\boldsymbol{x}}\mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*) = \phi(\boldsymbol{\lambda}^*, \boldsymbol{\mu}^*) \leq \phi^* \]

And by definition \(f^* \geq \phi^*\), so \(f^* = \phi^*\), strong duality holds!

If \(\operatorname{rand}\boldsymbol{A} \leq k\), we just abandon some duplicated constraints in \(\boldsymbol{h}\) and do the same above.

Dual formulation of (soft-margin SVM)

\[ \begin{aligned} &\min_{\boldsymbol{w}, b}\frac{1}{2}\Vert\boldsymbol{w}\Vert^2 + C \sum_{i = 1}^m \xi_i \\ &\ \mathrm{s.t.}\quad y_i(\boldsymbol{w}^T \boldsymbol{x}_i + b) \geq 1 - \xi_i\\ &\qquad \ \ \boldsymbol{\xi} \geq \boldsymbol{0}, C > 0 \end{aligned} \]

Due to is constrains are all affine and are all inequality, according to Slater's condition, strong duality holds.

The lagrangian is

\[ \begin{aligned} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\xi}, \boldsymbol{\mu}, \boldsymbol{\alpha}) &= \frac{1}{2}\Vert \boldsymbol{w}\Vert_2^2 + C \boldsymbol{1}^T\boldsymbol{\xi} + \sum_{i = 1}^n\mu_i (1 - \xi_i - y_i(\boldsymbol{x}_i^T\boldsymbol{w} + b)) - \boldsymbol{\alpha}^T\boldsymbol{\xi} \\ &= \frac{1}{2}\Vert \boldsymbol{w}\Vert_2^2 - \left(\sum_{i = 1}^ny_i \mu_i \boldsymbol{x}_i\right)^T \boldsymbol{w} + (C\boldsymbol{1} - \boldsymbol{\mu} - \boldsymbol{\alpha})^T \boldsymbol{\xi} - \left(\boldsymbol{\mu}^T\boldsymbol{y}\right)b + \boldsymbol{1}^T\boldsymbol{\mu} \end{aligned} \]

minimize it,

\[ \phi(\boldsymbol{\mu}, \boldsymbol{\alpha}) = \begin{cases} \boldsymbol{1}^T\boldsymbol{\mu} - \dfrac{1}{2}\sum_{i, j}\mu_i\mu_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j & \boldsymbol{\mu}^T\boldsymbol{y} = 0, C\boldsymbol{1} - \boldsymbol{\mu} - \boldsymbol{\alpha} = 0 \\ -\infty & \text{otherwise} \end{cases} \]

So the dual is

\[ \begin{aligned} \max_{\boldsymbol{\mu}}\ & \boldsymbol{1}^T\boldsymbol{\mu} - \dfrac{1}{2}\sum_{i, j}\mu_i\mu_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\ & \boldsymbol{\mu}^T\boldsymbol{y} = 0 \\ & \boldsymbol{0} \leq \boldsymbol{\mu} \leq C\boldsymbol{1} \end{aligned} \]

Revisit KKT condition

KKT condition holds iff
1. Strong duality holds, \(f^* = \phi^*\)
2. \(\boldsymbol{x}^*\) is the primal optimal solution, \(f^* = f(\boldsymbol{x}^*)\)
3. \((\boldsymbol{\lambda}^*, \boldsymbol{\mu}^*)\) is a dual optimal solution, \(\phi^* = \phi(\boldsymbol{\lambda}^*, \boldsymbol{\mu}^*)\)