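Backpropagation (abbreviated BP), short for "backward propagation of errors", is a common method for training artificial neural networks, used together with an optimization method such as gradient descent. It computes the gradient of the loss function with respect to every weight in the network. That gradient is passed to the optimization method, which uses it to update the weights so as to minimize the loss function.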
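Suppose you have a network like this:
The first layer is the input layer, with two neurons $i_1$, $i_2$ and a bias (intercept) term $b_1$; the second layer is the hidden layer, with two neurons $h_1$, $h_2$ and a bias term $b_2$; the third layer is the output layer, $o_1$, $o_2$. The $w_i$ marked on each connection is the weight between layers, and the activation function is the sigmoid function by default.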
Now assign initial values, as shown in the figure below:
where
input data: $i_1=0.05$, $i_2=0.10$;
target outputs: $o_1=0.01$, $o_2=0.99$;
initial weights:
$w_1=0.15$, $w_2=0.20$, $w_3=0.25$, $w_4=0.30$;
$w_5=0.40$, $w_6=0.45$, $w_7=0.50$, $w_8=0.55$;
Goal: given the input data $i_1$, $i_2$ (0.05 and 0.10), make the network's output as close as possible to the target outputs $o_1$, $o_2$ (0.01 and 0.99).
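Compute the weighted input sum of neuron $h_1$:
$$net_{h1} = w_1 \times i_1 + w_2 \times i_2 + b_1 \times 1$$
$$net_{h1} = 0.15 \times 0.05 + 0.2 \times 0.1 + 0.35 \times 1 = 0.3775$$
Compute the output $out_{h1}$ of neuron $h_1$ (the activation function here is the sigmoid function):
$$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = 0.5932$$
In the same way, compute the output $out_{h2}$ of neuron $h_2$:
$$out_{h2} = 0.5968$$
The output neuron $o_1$ is computed the same way, taking the hidden-layer outputs as its inputs:
$$net_{o1} = w_5 \times out_{h1} + w_6 \times out_{h2} + b_2 \times 1$$
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.7514$$
Likewise, compute the output of neuron $o_2$:
$$out_{o2} = 0.7730$$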
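The forward pass above can be sketched in a few lines of Python. This is only an illustration, not the code from the linked notebook. Two things are assumptions on my part: the post never states $b_2$, so `b2 = 0.60` is inferred from the quoted outputs, and the wiring of $w_3$, $w_4$ (into $h_2$) and $w_7$, $w_8$ (into $o_2$) is inferred by analogy with the formulas for $h_1$ and $o_1$.

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

# Values given in the post
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1 = 0.35
b2 = 0.60  # assumption: not stated in the post, inferred from out_o1 = 0.7514

# Hidden layer (w3, w4 feeding h2 is an assumed wiring)
net_h1 = w1 * i1 + w2 * i2 + b1
out_h1 = sigmoid(net_h1)
net_h2 = w3 * i1 + w4 * i2 + b1
out_h2 = sigmoid(net_h2)

# Output layer (w7, w8 feeding o2 is an assumed wiring)
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
out_o1 = sigmoid(net_o1)
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o2 = sigmoid(net_o2)

print(net_h1, out_h1, out_h2, out_o1, out_o2)
```

The printed values should agree with the numbers in the text to about three decimal places; the post appears to truncate rather than round in a few spots.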
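Next comes the backpropagation calculation itself. First, the total error is the sum of the errors of the two outputs:
$$E_{total} = E_{o1} + E_{o2}$$
Compute the errors of $o_1$ and $o_2$ separately:
$$E_{o1} = \frac{1}{2} (target_{o1} - out_{o1})^2 = 0.2748$$
$$E_{o2} = \frac{1}{2} (target_{o2} - out_{o2})^2 = 0.0235$$
$$E_{total} = E_{o1} + E_{o2} = 0.2983$$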
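As a quick check of the arithmetic, the error terms can be recomputed directly from the rounded outputs quoted in the text. The helper name `squared_error` is made up for this sketch.

```python
def squared_error(target, output):
    """Half squared error for a single output neuron."""
    return 0.5 * (target - output) ** 2

# Rounded forward-pass outputs and targets taken from the text
out_o1, out_o2 = 0.7514, 0.7730
target_o1, target_o2 = 0.01, 0.99

e_o1 = squared_error(target_o1, out_o1)
e_o2 = squared_error(target_o2, out_o2)
e_total = e_o1 + e_o2

print(e_o1, e_o2, e_total)
```

Because the inputs here are already rounded, the last digit of the total may differ slightly from the value in the text.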
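Take the weight $w_5$ as an example. To find out how much $w_5$ contributes to the total error, take the partial derivative of the total error with respect to $w_5$. By the chain rule, this is a product of three factors:
$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5}$$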
The figure below shows more intuitively how the error propagates backward.
We compute each factor in turn.
Compute $\frac{\partial E_{total}}{\partial out_{o1}}$:
$$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$
$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = 0.7414$$
Compute $\frac{\partial out_{o1}}{\partial net_{o1}}$:
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}}$$
$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.1868$$
Compute $\frac{\partial net_{o1}}{\partial w_5}$:
$$net_{o1} = w_5 \times out_{h1} + w_6 \times out_{h2} + b_2 \times 1$$
$$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.5932$$
Finally, multiply the three together:
$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} = 0.082$$
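One way to convince yourself that the three-factor product is right is to compare it with a finite-difference estimate of the same derivative. The sketch below does that for $w_5$. It assumes `b2 = 0.60` and the $w_3$/$w_4$/$w_7$/$w_8$ wiring inferred from the numbers in the post, neither of which is stated explicitly; `total_error` is a helper name invented here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hidden-layer outputs do not depend on w5, so compute them once
out_h1 = sigmoid(0.15 * 0.05 + 0.20 * 0.10 + 0.35)
out_h2 = sigmoid(0.25 * 0.05 + 0.30 * 0.10 + 0.35)
B2 = 0.60  # assumption: inferred, not given in the post

def total_error(w5):
    """E_total as a function of w5 only; all other weights stay at their initial values."""
    out_o1 = sigmoid(w5 * out_h1 + 0.45 * out_h2 + B2)
    out_o2 = sigmoid(0.50 * out_h1 + 0.55 * out_h2 + B2)
    return 0.5 * (0.01 - out_o1) ** 2 + 0.5 * (0.99 - out_o2) ** 2

# Analytic gradient: the three chain-rule factors from the text
w5 = 0.40
out_o1 = sigmoid(w5 * out_h1 + 0.45 * out_h2 + B2)
dE_dout = -(0.01 - out_o1)
dout_dnet = out_o1 * (1.0 - out_o1)
dnet_dw5 = out_h1
analytic = dE_dout * dout_dnet * dnet_dw5

# Numerical gradient: central difference
eps = 1e-6
numeric = (total_error(w5 + eps) - total_error(w5 - eps)) / (2 * eps)

print(analytic, numeric)
```

If the two printed values agree to several decimal places, the chain-rule derivation is consistent with the forward pass.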
Looking at the formulas above, we find:
$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) \times out_{o1}(1 - out_{o1}) \times out_{h1}$$
For convenience, let $\delta_{o1}$ denote the error of the output layer:
$$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}}$$
$$\delta_{o1} = -(target_{o1} - out_{o1}) \times out_{o1}(1 - out_{o1})$$
$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} \times out_{h1}$$
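Update the value of $w_5$, where $\eta$ is the learning rate (the result below corresponds to $\eta = 0.5$):
$$w_5^+ = w_5 - \eta \times \frac{\partial E_{total}}{\partial w_5} = 0.3589$$
In the same way, update $w_6$, $w_7$, $w_8$:
$$w_6^+ = 0.4086$$
$$w_7^+ = 0.5113$$
$$w_8^+ = 0.5614$$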
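The four output-layer updates all follow the same pattern: one $\delta$ per output neuron, multiplied by the hidden output on the other end of the connection. A rough sketch, assuming `eta = 0.5` and `b2 = 0.60` (both inferred from the numbers in the post rather than stated) and assuming $w_7$, $w_8$ feed $o_2$:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
b1 = 0.35
b2 = 0.60   # assumption: inferred from the outputs in the post
eta = 0.5   # assumption: inferred from w5+ = 0.3589

w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# Forward pass
out_h1 = sigmoid(0.15 * i1 + 0.20 * i2 + b1)
out_h2 = sigmoid(0.25 * i1 + 0.30 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# Output-layer deltas: dE/dout * dout/dnet
delta_o1 = -(t1 - out_o1) * out_o1 * (1.0 - out_o1)
delta_o2 = -(t2 - out_o2) * out_o2 * (1.0 - out_o2)

# Gradient for each weight is delta times the hidden output it multiplies
w5_new = w5 - eta * delta_o1 * out_h1
w6_new = w6 - eta * delta_o1 * out_h2
w7_new = w7 - eta * delta_o2 * out_h1
w8_new = w8 - eta * delta_o2 * out_h2

print(w5_new, w6_new, w7_new, w8_new)
```

Note that $w_7$ and $w_8$ increase, because $out_{o2}$ is below its target and $\delta_{o2}$ is therefore negative.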
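We can compute the updates for $w_1$, $w_2$, $w_3$, $w_4$ in much the same way, but one thing changes.
When computing the partial derivative of the total error with respect to $w_5$ above, the path was:
$out_{o1}$ -> $net_{o1}$ -> $w_5$
For the weights feeding the hidden layer, the path is:
$out_{h1}$ -> $net_{h1}$ -> $w_1$
The difference is that $out_{h1}$ feeds both output neurons, so it receives error from both $E_{o1}$ and $E_{o2}$, and both contributions have to be computed.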
Compute $\frac{\partial E_{total}}{\partial out_{h1}}$:
$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$
First compute $\frac{\partial E_{o1}}{\partial out_{h1}}$:
$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h1}}$$
$$\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} = 0.1385$$
$$net_{o1} = w_5 \times out_{h1} + w_6 \times out_{h2} + b_2 \times 1$$
$$\frac{\partial net_{o1}}{\partial out_{h1}} = w_5 = 0.40$$
$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h1}} = 0.1385 \times 0.40 = 0.055$$
In the same way, we get:
$$\frac{\partial E_{o2}}{\partial out_{h1}} = -0.019$$
Adding the two gives the total:
$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} = 0.036$$
Next compute $\frac{\partial out_{h1}}{\partial net_{h1}}$:
$$out_{h1} = \frac{1}{1+e^{-net_{h1}}}$$
$$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1} \times (1 - out_{h1}) = 0.2413$$
Then compute $\frac{\partial net_{h1}}{\partial w_1}$:
$$net_{h1} = w_1 \times i_1 + w_2 \times i_2 + b_1 \times 1$$
$$\frac{\partial net_{h1}}{\partial w_1} = i_1 = 0.05$$
Finally, multiply the three together:
$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_1}$$
$$\frac{\partial E_{total}}{\partial w_1} = 0.036 \times 0.2413 \times 0.05 = 0.000438$$
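The hidden-layer gradient can be traced step by step in code as well. The sketch below follows the derivation for $w_1$, summing the error arriving from both output neurons. As before, `b2 = 0.60` and the assumption that $w_7$ connects $h_1$ to $o_2$ are inferred from the numbers in the post, not stated in it.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1 = 0.35
b2 = 0.60  # assumption: inferred, not given in the post

# Forward pass
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# dE_o1/dnet_o1 and dE_o2/dnet_o2
dEo1_dnet_o1 = -(t1 - out_o1) * out_o1 * (1.0 - out_o1)
dEo2_dnet_o2 = -(t2 - out_o2) * out_o2 * (1.0 - out_o2)

# out_h1 reaches o1 through w5 and o2 through w7, so add both paths
dEo1_dout_h1 = dEo1_dnet_o1 * w5
dEo2_dout_h1 = dEo2_dnet_o2 * w7
dE_dout_h1 = dEo1_dout_h1 + dEo2_dout_h1

# Remaining two chain-rule factors
dout_h1_dnet_h1 = out_h1 * (1.0 - out_h1)
dnet_h1_dw1 = i1

grad_w1 = dE_dout_h1 * dout_h1_dnet_h1 * dnet_h1_dw1
print(dEo1_dout_h1, dEo2_dout_h1, dE_dout_h1, grad_w1)
```

Multiplying the rounded intermediate values by hand gives roughly 0.000434; the 0.000438 in the text comes from carrying full precision through the calculation, which is what the code does.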
Now update the value of $w_1$:
$$w_1^+ = w_1 - \eta \times \frac{\partial E_{total}}{\partial w_1} = 0.1498$$
In the same way, update $w_2$, $w_3$, $w_4$:
$$w_2^+ = 0.1996$$
$$w_3^+ = 0.2498$$
$$w_4^+ = 0.2995$$
That completes one pass of error backpropagation. We then rerun the forward pass with the updated weights and keep iterating.
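Putting the pieces together, the iteration might look roughly like the sketch below. This is not the code from the linked notebook. It assumes `eta = 0.5` and `b2 = 0.60` (inferred from the numbers above), the weight wiring inferred earlier, and, like the walkthrough, it leaves the biases fixed. The function name `train_step` is invented for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, x, target, b1=0.35, b2=0.60, eta=0.5):
    """One forward and backward pass. Returns (updated weights, error before the update)."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    i1, i2 = x
    t1, t2 = target

    # Forward pass
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    error = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2

    # Output-layer deltas
    d_o1 = -(t1 - out_o1) * out_o1 * (1.0 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1.0 - out_o2)

    # Hidden-layer deltas, computed with the weights from BEFORE this update
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1.0 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1.0 - out_h2)

    # Gradients in the order w1..w8
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    new_w = [wi - eta * g for wi, g in zip(w, grads)]
    return new_w, error

x, target = (0.05, 0.10), (0.01, 0.99)
weights = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]

# A single step, for comparison with the hand calculation
first_step, first_error = train_step(weights, x, target)

# Keep iterating and record the error at each step
errors = []
for _ in range(1000):
    weights, err = train_step(weights, x, target)
    errors.append(err)

print(first_error, first_step)
print(errors[0], errors[-1])
```

The weights after the first step should match the hand-computed $w_1^+$ through $w_8^+$ to about four decimal places, and the recorded error should shrink as the loop runs.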
Full code (view on PC): http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app
Mo (website: momodel.cn) is an online AI modeling platform that supports Python and helps you quickly develop, train, and deploy AI applications. We look forward to having you join.
1
nical 2019-01-21 19:23:54 +08:00
Awesome, very helpful.
2
MoModel OP @nical Sorry, a lot of the formulas came out garbled. Please open http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app on a PC to view the source.