Table of Contents
- 1: Loss Functions
- 1.1: Mean Squared Error
- 1.2: Custom Loss Functions
- 1.3: Cross Entropy
- 2: Learning Rate
- 2.1: Gradient Descent
- 2.2: Exponentially Decaying Learning Rate
- 3: Moving Average
- 4: Regularization
- 4.1: Overfitting
- 4.2: Regularization
- 4.3: How the Regularization Term Is Computed
- 5: The Standard Template for Building a Neural Network
- 5.1: Generating the Dataset (generateds.py)
- 5.2: Forward Propagation (forward.py)
- 5.3: Backpropagation (backward.py)
Loss Functions
The loss function measures the gap between the predicted values and the known answers (labels).
It can be defined in several ways.
Mean Squared Error
The mean, over n samples, of the squared difference between the predicted value and the known answer.
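Written as a formula, over the n samples:
MSE(y_, y) = Σ(y_ - y)² / n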
TensorFlow function:
loss_mse=tf.reduce_mean(tf.square(y_-y))
Example:
import tensorflow as tf
import numpy as np

BATCH_SIZE = 8   # amount of data fed to the network in each training step
SEED = 23455     # random seed

# Generate random training data
rdm = np.random.RandomState(SEED)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            print("STEPS:", i)
            print("w1:", sess.run(w1))
    print("Final w1:", sess.run(w1))
Output:
STEPS: 0 w1: [[-0.80974597] [ 1.4852903 ]]
STEPS: 500 w1: [[-0.46074435] [ 1.641878 ]]
STEPS: 1000 w1: [[-0.21939856] [ 1.6984766 ]]
STEPS: 1500 w1: [[-0.04415595] [ 1.7003176 ]]
STEPS: 2000 w1: [[0.08942621] [1.673328 ]]
STEPS: 2500 w1: [[0.19583555] [1.6322677 ]]
STEPS: 3000 w1: [[0.28375748] [1.5854434 ]]
STEPS: 3500 w1: [[0.35848638] [1.5374471 ]]
STEPS: 4000 w1: [[0.4233252] [1.4907392]]
STEPS: 4500 w1: [[0.48040032] [1.4465573 ]]
STEPS: 5000 w1: [[0.5311361] [1.4054534]]
STEPS: 5500 w1: [[0.57653254] [1.367594 ]]
STEPS: 6000 w1: [[0.6173259] [1.3329402]]
STEPS: 6500 w1: [[0.65408474] [1.3013425 ]]
STEPS: 7000 w1: [[0.68726856] [1.2726018 ]]
STEPS: 7500 w1: [[0.7172598] [1.2465004]]
STEPS: 8000 w1: [[0.74438614] [1.2228196 ]]
STEPS: 8500 w1: [[0.7689325] [1.2013482]]
STEPS: 9000 w1: [[0.79115146] [1.1818888 ]]
STEPS: 9500 w1: [[0.81126714] [1.1642567 ]]
STEPS: 10000 w1: [[0.8294814] [1.1482829]]
STEPS: 10500 w1: [[0.84597576] [1.1338125 ]]
STEPS: 11000 w1: [[0.8609128] [1.1207061]]
STEPS: 11500 w1: [[0.87444043] [1.1088346 ]]
STEPS: 12000 w1: [[0.88669145] [1.0980824 ]]
STEPS: 12500 w1: [[0.8977863] [1.0883439]]
STEPS: 13000 w1: [[0.9078348] [1.0795243]]
STEPS: 13500 w1: [[0.91693527] [1.0715363 ]]
STEPS: 14000 w1: [[0.92517716] [1.0643018 ]]
STEPS: 14500 w1: [[0.93264157] [1.0577497 ]]
STEPS: 15000 w1: [[0.9394023] [1.0518153]]
STEPS: 15500 w1: [[0.9455251] [1.0464406]]
STEPS: 16000 w1: [[0.95107025] [1.0415728 ]]
STEPS: 16500 w1: [[0.9560928] [1.037164 ]]
STEPS: 17000 w1: [[0.96064115] [1.0331714 ]]
STEPS: 17500 w1: [[0.96476096] [1.0295546 ]]
STEPS: 18000 w1: [[0.9684917] [1.0262802]]
STEPS: 18500 w1: [[0.9718707] [1.0233142]]
STEPS: 19000 w1: [[0.974931 ] [1.0206276]]
STEPS: 19500 w1: [[0.9777026] [1.0181949]]
Final w1: [[0.98019385] [1.0159807 ]]
Custom Loss Functions
A loss function can also be tailored to the actual problem.
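For example, suppose over-prediction and under-prediction are not equally costly; a piecewise loss can then be built with tf.where and tf.greater. The sketch below is illustrative only: COST and PROFIT are assumed constants, and y, y_ are the prediction and label tensors from the MSE example above.

COST = 1    # assumed cost per unit of predicting too much
PROFIT = 9  # assumed loss per unit of predicting too little
# Charge COST where the prediction is too high, PROFIT where it is too low
loss_custom = tf.reduce_sum(tf.where(tf.greater(y, y_),
                                     COST * (y - y_),
                                     PROFIT * (y_ - y)))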
Cross Entropy
Cross entropy measures the distance between two probability distributions: the larger the cross entropy, the farther apart the two distributions; the smaller, the closer they are.
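Written as a formula, H(y_, y) = -Σ y_ · log(y). Two common ways to compute it in TF 1.x are sketched below; the first assumes y and y_ are probability distributions of the same shape, and the clipping bounds are illustrative. The second form is the one used in backward.py later in this article.

# Clip y so that log is never applied to 0
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))

# When y holds raw logits over classes and y_ is a one-hot label,
# softmax and cross entropy can be fused into a single op
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)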
Learning Rate
The learning rate controls the magnitude of each parameter update.
Gradient Descent
During training, the parameters are updated in the direction that makes the loss function decrease, i.e. along the negative gradient of the loss.
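Each step updates the parameter along the negative gradient:
w_new = w_old - learning_rate * ∂loss/∂w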
Below is an example with a learning rate of 0.2 (the loss function is (w+1)², so the target is w = -1, where loss = 0):
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s STEPS,w is %f, loss is %f" % (i, w_val, loss_val))
Output:
After 0 STEPS,w is 2.600000, loss is 12.959999
After 1 STEPS,w is 1.160000, loss is 4.665599
After 2 STEPS,w is 0.296000, loss is 1.679616
After 3 STEPS,w is -0.222400, loss is 0.604662
After 4 STEPS,w is -0.533440, loss is 0.217678
After 5 STEPS,w is -0.720064, loss is 0.078364
After 6 STEPS,w is -0.832038, loss is 0.028211
After 7 STEPS,w is -0.899223, loss is 0.010156
After 8 STEPS,w is -0.939534, loss is 0.003656
After 9 STEPS,w is -0.963720, loss is 0.001316
After 10 STEPS,w is -0.978232, loss is 0.000474
After 11 STEPS,w is -0.986939, loss is 0.000171
After 12 STEPS,w is -0.992164, loss is 0.000061
After 13 STEPS,w is -0.995298, loss is 0.000022
After 14 STEPS,w is -0.997179, loss is 0.000008
After 15 STEPS,w is -0.998307, loss is 0.000003
After 16 STEPS,w is -0.998984, loss is 0.000001
After 17 STEPS,w is -0.999391, loss is 0.000000
After 18 STEPS,w is -0.999634, loss is 0.000000
After 19 STEPS,w is -0.999781, loss is 0.000000
After 20 STEPS,w is -0.999868, loss is 0.000000
After 21 STEPS,w is -0.999921, loss is 0.000000
After 22 STEPS,w is -0.999953, loss is 0.000000
After 23 STEPS,w is -0.999972, loss is 0.000000
After 24 STEPS,w is -0.999983, loss is 0.000000
After 25 STEPS,w is -0.999990, loss is 0.000000
After 26 STEPS,w is -0.999994, loss is 0.000000
After 27 STEPS,w is -0.999996, loss is 0.000000
After 28 STEPS,w is -0.999998, loss is 0.000000
After 29 STEPS,w is -0.999999, loss is 0.000000
After 30 STEPS,w is -0.999999, loss is 0.000000
After 31 STEPS,w is -1.000000, loss is 0.000000
After 32 STEPS,w is -1.000000, loss is 0.000000
After 33 STEPS,w is -1.000000, loss is 0.000000
After 34 STEPS,w is -1.000000, loss is 0.000000
After 35 STEPS,w is -1.000000, loss is 0.000000
After 36 STEPS,w is -1.000000, loss is 0.000000
After 37 STEPS,w is -1.000000, loss is 0.000000
After 38 STEPS,w is -1.000000, loss is 0.000000
After 39 STEPS,w is -1.000000, loss is 0.000000
Below is an example with a learning rate of 1:
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(1).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s STEPS,w is %f, loss is %f" % (i, w_val, loss_val))
Output:
After 0 STEPS,w is -7.000000, loss is 36.000000
After 1 STEPS,w is 5.000000, loss is 36.000000
After 2 STEPS,w is -7.000000, loss is 36.000000
After 3 STEPS,w is 5.000000, loss is 36.000000
After 4 STEPS,w is -7.000000, loss is 36.000000
After 5 STEPS,w is 5.000000, loss is 36.000000
After 6 STEPS,w is -7.000000, loss is 36.000000
After 7 STEPS,w is 5.000000, loss is 36.000000
After 8 STEPS,w is -7.000000, loss is 36.000000
After 9 STEPS,w is 5.000000, loss is 36.000000
After 10 STEPS,w is -7.000000, loss is 36.000000
After 11 STEPS,w is 5.000000, loss is 36.000000
After 12 STEPS,w is -7.000000, loss is 36.000000
After 13 STEPS,w is 5.000000, loss is 36.000000
After 14 STEPS,w is -7.000000, loss is 36.000000
After 15 STEPS,w is 5.000000, loss is 36.000000
After 16 STEPS,w is -7.000000, loss is 36.000000
After 17 STEPS,w is 5.000000, loss is 36.000000
After 18 STEPS,w is -7.000000, loss is 36.000000
After 19 STEPS,w is 5.000000, loss is 36.000000
After 20 STEPS,w is -7.000000, loss is 36.000000
After 21 STEPS,w is 5.000000, loss is 36.000000
After 22 STEPS,w is -7.000000, loss is 36.000000
After 23 STEPS,w is 5.000000, loss is 36.000000
After 24 STEPS,w is -7.000000, loss is 36.000000
After 25 STEPS,w is 5.000000, loss is 36.000000
After 26 STEPS,w is -7.000000, loss is 36.000000
After 27 STEPS,w is 5.000000, loss is 36.000000
After 28 STEPS,w is -7.000000, loss is 36.000000
After 29 STEPS,w is 5.000000, loss is 36.000000
After 30 STEPS,w is -7.000000, loss is 36.000000
After 31 STEPS,w is 5.000000, loss is 36.000000
After 32 STEPS,w is -7.000000, loss is 36.000000
After 33 STEPS,w is 5.000000, loss is 36.000000
After 34 STEPS,w is -7.000000, loss is 36.000000
After 35 STEPS,w is 5.000000, loss is 36.000000
After 36 STEPS,w is -7.000000, loss is 36.000000
After 37 STEPS,w is 5.000000, loss is 36.000000
After 38 STEPS,w is -7.000000, loss is 36.000000
After 39 STEPS,w is 5.000000, loss is 36.000000
Below is an example with a learning rate of 0.001:
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s STEPS,w is %f, loss is %f" % (i, w_val, loss_val))
Output:
After 0 STEPS,w is 4.988000, loss is 35.856144
After 1 STEPS,w is 4.976024, loss is 35.712864
After 2 STEPS,w is 4.964072, loss is 35.570156
After 3 STEPS,w is 4.952144, loss is 35.428020
After 4 STEPS,w is 4.940240, loss is 35.286449
After 5 STEPS,w is 4.928360, loss is 35.145447
After 6 STEPS,w is 4.916503, loss is 35.005009
After 7 STEPS,w is 4.904670, loss is 34.865124
After 8 STEPS,w is 4.892860, loss is 34.725803
After 9 STEPS,w is 4.881075, loss is 34.587044
After 10 STEPS,w is 4.869313, loss is 34.448833
After 11 STEPS,w is 4.857574, loss is 34.311172
After 12 STEPS,w is 4.845859, loss is 34.174068
After 13 STEPS,w is 4.834167, loss is 34.037510
After 14 STEPS,w is 4.822499, loss is 33.901497
After 15 STEPS,w is 4.810854, loss is 33.766029
After 16 STEPS,w is 4.799233, loss is 33.631104
After 17 STEPS,w is 4.787634, loss is 33.496712
After 18 STEPS,w is 4.776059, loss is 33.362858
After 19 STEPS,w is 4.764507, loss is 33.229538
After 20 STEPS,w is 4.752978, loss is 33.096756
After 21 STEPS,w is 4.741472, loss is 32.964497
After 22 STEPS,w is 4.729989, loss is 32.832775
After 23 STEPS,w is 4.718529, loss is 32.701576
After 24 STEPS,w is 4.707092, loss is 32.570904
After 25 STEPS,w is 4.695678, loss is 32.440750
After 26 STEPS,w is 4.684287, loss is 32.311119
After 27 STEPS,w is 4.672918, loss is 32.182003
After 28 STEPS,w is 4.661572, loss is 32.053402
After 29 STEPS,w is 4.650249, loss is 31.925320
After 30 STEPS,w is 4.638949, loss is 31.797745
After 31 STEPS,w is 4.627671, loss is 31.670683
After 32 STEPS,w is 4.616416, loss is 31.544128
After 33 STEPS,w is 4.605183, loss is 31.418077
After 34 STEPS,w is 4.593973, loss is 31.292530
After 35 STEPS,w is 4.582785, loss is 31.167484
After 36 STEPS,w is 4.571619, loss is 31.042938
After 37 STEPS,w is 4.560476, loss is 30.918892
After 38 STEPS,w is 4.549355, loss is 30.795341
After 39 STEPS,w is 4.538256, loss is 30.672281
The three examples above show that
an overly large learning rate makes the parameters oscillate and fail to converge,
while an overly small learning rate makes them converge too slowly.
An appropriate learning rate therefore has to be chosen, which may take several experiments.
Exponentially Decaying Learning Rate
With an exponentially decaying learning rate, the learning rate is updated dynamically as the number of training steps grows.
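With staircase=True, tf.train.exponential_decay computes
learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)
where the exponent is rounded down to an integer, so the learning rate drops in discrete steps; without staircase the decay is smooth.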
import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning rate decay rate
LEARNING_RATE_STEP = 1      # how many batches are fed before the learning rate is updated
                            # (usually total number of samples / BATCH_SIZE)

# Counter for how many batches have been run; starts at 0 and is not trainable
global_step = tf.Variable(0, trainable=False)

# Define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

# Define the parameter to optimize, initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Define the loss function
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

# Create the session and train for 40 steps
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: global_step is %f,w is %f,learning rate is %f,loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val))
Output:
After 0 steps: global_step is 1.000000,w is 3.800000,learning rate is 0.099000,loss is 23.040001
After 1 steps: global_step is 2.000000,w is 2.849600,learning rate is 0.098010,loss is 14.819419
After 2 steps: global_step is 3.000000,w is 2.095001,learning rate is 0.097030,loss is 9.579033
After 3 steps: global_step is 4.000000,w is 1.494386,learning rate is 0.096060,loss is 6.221961
After 4 steps: global_step is 5.000000,w is 1.015167,learning rate is 0.095099,loss is 4.060896
After 5 steps: global_step is 6.000000,w is 0.631886,learning rate is 0.094148,loss is 2.663051
After 6 steps: global_step is 7.000000,w is 0.324608,learning rate is 0.093207,loss is 1.754587
After 7 steps: global_step is 8.000000,w is 0.077684,learning rate is 0.092274,loss is 1.161403
After 8 steps: global_step is 9.000000,w is -0.121202,learning rate is 0.091352,loss is 0.772287
After 9 steps: global_step is 10.000000,w is -0.281761,learning rate is 0.090438,loss is 0.515867
After 10 steps: global_step is 11.000000,w is -0.411674,learning rate is 0.089534,loss is 0.346128
After 11 steps: global_step is 12.000000,w is -0.517024,learning rate is 0.088638,loss is 0.233266
After 12 steps: global_step is 13.000000,w is -0.602644,learning rate is 0.087752,loss is 0.157891
After 13 steps: global_step is 14.000000,w is -0.672382,learning rate is 0.086875,loss is 0.107334
After 14 steps: global_step is 15.000000,w is -0.729305,learning rate is 0.086006,loss is 0.073276
After 15 steps: global_step is 16.000000,w is -0.775868,learning rate is 0.085146,loss is 0.050235
After 16 steps: global_step is 17.000000,w is -0.814036,learning rate is 0.084294,loss is 0.034583
After 17 steps: global_step is 18.000000,w is -0.845387,learning rate is 0.083451,loss is 0.023905
After 18 steps: global_step is 19.000000,w is -0.871193,learning rate is 0.082617,loss is 0.016591
After 19 steps: global_step is 20.000000,w is -0.892476,learning rate is 0.081791,loss is 0.011561
After 20 steps: global_step is 21.000000,w is -0.910065,learning rate is 0.080973,loss is 0.008088
After 21 steps: global_step is 22.000000,w is -0.924629,learning rate is 0.080163,loss is 0.005681
After 22 steps: global_step is 23.000000,w is -0.936713,learning rate is 0.079361,loss is 0.004005
After 23 steps: global_step is 24.000000,w is -0.946758,learning rate is 0.078568,loss is 0.002835
After 24 steps: global_step is 25.000000,w is -0.955125,learning rate is 0.077782,loss is 0.002014
After 25 steps: global_step is 26.000000,w is -0.962106,learning rate is 0.077004,loss is 0.001436
After 26 steps: global_step is 27.000000,w is -0.967942,learning rate is 0.076234,loss is 0.001028
After 27 steps: global_step is 28.000000,w is -0.972830,learning rate is 0.075472,loss is 0.000738
After 28 steps: global_step is 29.000000,w is -0.976931,learning rate is 0.074717,loss is 0.000532
After 29 steps: global_step is 30.000000,w is -0.980378,learning rate is 0.073970,loss is 0.000385
After 30 steps: global_step is 31.000000,w is -0.983281,learning rate is 0.073230,loss is 0.000280
After 31 steps: global_step is 32.000000,w is -0.985730,learning rate is 0.072498,loss is 0.000204
After 32 steps: global_step is 33.000000,w is -0.987799,learning rate is 0.071773,loss is 0.000149
After 33 steps: global_step is 34.000000,w is -0.989550,learning rate is 0.071055,loss is 0.000109
After 34 steps: global_step is 35.000000,w is -0.991035,learning rate is 0.070345,loss is 0.000080
After 35 steps: global_step is 36.000000,w is -0.992297,learning rate is 0.069641,loss is 0.000059
After 36 steps: global_step is 37.000000,w is -0.993369,learning rate is 0.068945,loss is 0.000044
After 37 steps: global_step is 38.000000,w is -0.994284,learning rate is 0.068255,loss is 0.000033
After 38 steps: global_step is 39.000000,w is -0.995064,learning rate is 0.067573,loss is 0.000024
After 39 steps: global_step is 40.000000,w is -0.995731,learning rate is 0.066897,loss is 0.000018
Moving Average
The moving average (shadow value) of each parameter is updated as:
shadow = decay_rate * shadow + (1 - decay_rate) * parameter
where decay_rate = min{ MOVING_AVERAGE_DECAY, (1 + number of steps) / (10 + number of steps) }
For a detailed explanation, see "Moving Averages 滑动平均的原理和直观感知".
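As a minimal sketch of the corresponding TF 1.x API (MOVING_AVERAGE_DECAY, global_step, and the trainable variable w are assumed to be defined elsewhere; the same pattern appears in backward.py below):

# Create the moving-average object from the decay rate and the step counter
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
# ema.apply() returns an op that updates the shadow value of every trainable variable
ema_op = ema.apply(tf.trainable_variables())
# ema.average(w) fetches the current shadow (moving-average) value of the parameter w
w_shadow = ema.average(w)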
Regularization
Overfitting
A model that achieves high accuracy on the training set but low accuracy when predicting or classifying new data has poor generalization ability; this is overfitting.
Regularization
Regularization adds a penalty on the weights w to the loss function, introducing a model-complexity term that suppresses noise fitting and reduces overfitting.
With regularization, the loss becomes the sum of two terms:
loss = loss(y, y_) + REGULARIZER * loss(w)
The first term is the gap between the predictions and the ground-truth labels (e.g. cross entropy or mean squared error); the second term is the regularization term, weighted by REGULARIZER.
How the Regularization Term Is Computed
L1 regularization
Expressed with a TensorFlow function:
loss(w)=tf.contrib.layers.l1_regularizer(REGULARIZER)(w)
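In formula form, the L1 term is the sum of the absolute values of the weights: loss_L1(w) = Σ|wᵢ|. L1 regularization tends to drive some weights to exactly zero, producing sparse parameters.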
L2 regularization
Expressed with a TensorFlow function:
loss(w)=tf.contrib.layers.l2_regularizer(REGULARIZER)(w)
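In formula form, the L2 term is the sum of the squares of the weights: loss_L2(w) = Σ|wᵢ|². L2 regularization keeps the weights small without forcing them to zero.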
Implementing regularization with TensorFlow:
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(REGULARIZER)(w))
loss = cem + tf.add_n(tf.get_collection('losses'))
The Standard Template for Building a Neural Network
Generating the Dataset (generateds.py)
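A minimal sketch of what generateds.py might contain, assuming a toy task: 300 random 2-D points, labeled 1 if they fall inside the circle x0² + x1² < 2 and 0 otherwise (the dataset, seed, and threshold are all illustrative assumptions):

import numpy as np

SEED = 2

def generateds():
    # Generate 300 random 2-D points as features
    rdm = np.random.RandomState(SEED)
    X = rdm.randn(300, 2)
    # Label each point: 1 inside the circle of radius sqrt(2), 0 outside
    Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]
    # Reshape into (300, 2) features and (300, 1) labels
    X = np.array(X).reshape(-1, 2)
    Y_ = np.array(Y_).reshape(-1, 1)
    return X, Y_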
Forward Propagation (forward.py)
Forward propagation builds the network, i.e. defines the network structure.
def forward(x, regularizer):
    # Define the weights w, biases b, and the forward pass; return the output y
    w = ...
    b = ...
    y = ...
    return y

def get_weight(shape, regularizer):
    w = tf.Variable(...)
    tf.add_to_collection('losses', tf.contrib.layers.l1_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(...)
    return b
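As one hypothetical way to fill in this template, a network with a single hidden layer of 11 nodes for 2-D input might look like the following; the layer sizes, initializers, and the ReLU activation are illustrative choices, and l2_regularizer is used here in place of the template's l1 (either works):

def get_weight(shape, regularizer):
    # Randomly initialize the weights and register their regularization loss
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    # Initialize the biases to a small constant
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b

def forward(x, regularizer):
    # Hidden layer: 2 inputs -> 11 nodes, ReLU activation
    w1 = get_weight([2, 11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    # Output layer: 11 nodes -> 1 output, no activation
    w2 = get_weight([11, 1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2
    return y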
Backpropagation (backward.py)
Backpropagation trains the network and optimizes the network parameters.
def backward():
    x = tf.placeholder(...)
    y_ = tf.placeholder(...)
    y = forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)

    # Mean squared error
    loss_mse = tf.reduce_mean(tf.square(y - y_))
    # Cross entropy
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)
    # Regularization: add the regularization losses collected in forward.py
    loss = cem + tf.add_n(tf.get_collection('losses'))  # or loss_mse + tf.add_n(...)

    # Exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        TOTAL_NUM_SAMPLES / BATCH_SIZE,  # placeholder: total samples in the dataset / BATCH_SIZE
        LEARNING_RATE_DECAY,
        staircase=True)

    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Moving average
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    ema_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        for i in range(STEPS):
            sess.run(train_op, feed_dict={x: ..., y_: ...})  # feed a batch of features and labels
            if i % PRINT_INTERVAL == 0:  # placeholder: print every PRINT_INTERVAL steps
                print(...)

if __name__ == '__main__':
    backward()