# What are Entropy, Cross-Entropy, KL Divergence, JS Divergence, and f-Divergence?

# Entropy

$H(p) = -\sum_i p(x_i) \log_b p(x_i)$
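As a quick numerical sketch of this formula (the function name is illustrative; base $b = 2$ gives entropy in bits):

```python
import math

def entropy(p, b=2.0):
    """Shannon entropy H(p) = -sum_i p_i * log_b(p_i); 0*log(0) is taken as 0."""
    return -sum(pi * math.log(pi, b) for pi in p if pi > 0)

# A fair coin has exactly 1 bit of entropy; a biased coin has less.
print(entropy([0.5, 0.5]))   # → 1.0
print(entropy([0.9, 0.1]))   # < 1.0
```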

# Cross-Entropy

$H(p,q) = -\sum_i p(x_i) \log_b q(x_i)$
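The same kind of sketch for cross-entropy (again, names are illustrative; `p` and `q` are probability vectors over the same support):

```python
import math

def cross_entropy(p, q, b=2.0):
    """Cross-entropy H(p, q) = -sum_i p_i * log_b(q_i)."""
    return -sum(pi * math.log(qi, b) for pi, qi in zip(p, q) if pi > 0)

# Gibbs' inequality: H(p, q) >= H(p), with equality iff p == q.
p, q = [0.5, 0.5], [0.9, 0.1]
print(cross_entropy(p, p))  # → 1.0  (equals H(p) for a fair coin)
print(cross_entropy(p, q))  # larger than H(p) = 1.0
```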

# KL Divergence

The KL divergence is the gap between cross-entropy and entropy:

$D_{KL}(P \| Q) = H(p,q) - H(p) = \sum_i p(x_i) \log_b \frac{p(x_i)}{q(x_i)}$
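A minimal sketch checking the identity $D_{KL}(P \| Q) = H(p,q) - H(p)$ numerically (function names are illustrative):

```python
import math

def entropy(p, b=2.0):
    return -sum(pi * math.log(pi, b) for pi in p if pi > 0)

def cross_entropy(p, q, b=2.0):
    return -sum(pi * math.log(qi, b) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q, b=2.0):
    # D_KL(P || Q) = sum_i p_i * log_b(p_i / q_i)
    return sum(pi * math.log(pi / qi, b) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
# The identity D_KL(P || Q) = H(p, q) - H(p) holds:
print(abs(kl_divergence(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12)  # → True
print(kl_divergence(p, p))  # → 0.0  (zero iff the distributions agree)
```

Note that $D_{KL}$ is not symmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.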

# JS Divergence

The JS divergence is a symmetrized variant of the KL divergence:

$D_{JS}(p \| q)=\frac{1}{2} D_{KL}\left(p \| \frac{p+q}{2}\right)+\frac{1}{2} D_{KL}\left(q \| \frac{p+q}{2}\right)$

- **Range:** with a base-2 logarithm, $D_{JS} \in [0, 1]$; it is 0 when the two distributions are identical and 1 when their supports are disjoint (with the natural logarithm the upper bound is $\log 2$). Unlike KL, it is bounded, which makes it better behaved as a similarity measure.
- **Symmetry:** $D_{JS}(p \| q)=D_{JS}(q \| p)$, so it acts as a symmetric measure of distance between distributions.
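The two properties above can be verified directly on the definition (a sketch with illustrative names; base-2 logarithm, so the range is $[0, 1]$):

```python
import math

def kl(p, q, b=2.0):
    return sum(pi * math.log(pi / qi, b) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q, b=2.0):
    # D_JS = 0.5 * D_KL(p || m) + 0.5 * D_KL(q || m), with m = (p + q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m, b) + 0.5 * kl(q, m, b)

p, q = [1.0, 0.0], [0.0, 1.0]   # disjoint supports
print(js_divergence(p, q))       # → 1.0  (maximum, with base-2 log)
print(js_divergence(p, p))       # → 0.0  (identical distributions)
print(js_divergence(p, q) == js_divergence(q, p))  # → True  (symmetric)
```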

# f Divergence

The $f$-divergence is defined as:

$D_{f}(P \| Q)=\int_{x} q(x) f\left(\frac{p(x)}{q(x)}\right) d x$

where $f$ must satisfy:

1. $f$ is a convex function,
2. $f(1) = 0$

Different choices of $f$ yield different divergences:

Taking $f(t) = t \log t$ recovers the KL divergence:

$D_f(P \| Q) = \int_{x} q(x) \frac{p(x)}{q(x)} \log \left(\frac{p(x)}{q(x)}\right) d x=\int_{x} p(x) \log \left(\frac{p(x)}{q(x)}\right) d x = D_{KL}(P \| Q)$

Taking $f(t) = -\log t$ recovers the reverse KL divergence:

$D_{f}(P \| Q)=\int_{x} q(x)\left(-\log \left(\frac{p(x)}{q(x)}\right)\right) d x=\int_{x} q(x) \log \left(\frac{q(x)}{p(x)}\right) d x = D_{KL}(Q \| P)$

Taking $f(t) = (t-1)^2$ recovers the chi-squared ($\chi^2$) divergence:

$D_{f}(P \| Q)=\int_{x}q(x)\left(\frac{p(x)}{q(x)}-1\right)^{2} d x=\int_{x} \frac{(p(x)-q(x))^{2}}{q(x)} d x = \chi^2(P \| Q)$
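For discrete distributions the integral becomes a sum, so all of these cases can be sketched with one generic function that takes $f$ as a parameter (names are illustrative; natural log throughout):

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

p, q = [0.4, 0.6], [0.5, 0.5]

kl_fwd = f_divergence(p, q, lambda t: t * math.log(t))  # f(t) = t*log(t): forward KL
kl_rev = f_divergence(p, q, lambda t: -math.log(t))     # f(t) = -log(t): reverse KL
chi2   = f_divergence(p, q, lambda t: (t - 1) ** 2)     # f(t) = (t-1)^2: chi-squared

# The f(t) = t*log(t) case matches the direct KL computation:
direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(abs(kl_fwd - direct) < 1e-12)  # → True
```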