TensorFlow Study Notes (1)

Table of Contents
  1: Loss Functions
  1.1: Mean Squared Error
  1.2: Custom Loss
  1.3: Cross Entropy
  2: Learning Rate
  2.1: Gradient Descent
  2.2: Exponentially Decaying Learning Rate
  3: Moving Average
  4: Regularization
  4.1: Overfitting
  4.2: Regularization
  4.3: Regularization Methods
  5: Boilerplate for Building a Neural Network
  5.1: Generating the Dataset (generateds.py)
  5.2: Forward Propagation (forward.py)
  5.3: Backpropagation (backward.py)

Loss Functions

The loss function measures the gap between the network's predictions and the known answers (labels).

There are several common ways to define it.

Mean Squared Error

The mean squared error (MSE) over n samples is the average of the squared differences between the predictions and the labels.
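Written out, with y_ the label and y the prediction over n samples (the standard definition):

MSE(y_, y) = Σ(y_ - y)² / n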

TensorFlow function:

loss_mse=tf.reduce_mean(tf.square(y_-y))

Example:

import tensorflow as tf
import numpy as np

BATCH_SIZE=8    # number of samples fed to the network per training step
SEED=23455      # random seed

rdm=np.random.RandomState(SEED)    # generate random training data
X=rdm.rand(32,2)
Y_=[[x1+x2+(rdm.rand()/10.0-0.05)] for (x1,x2) in X]   # label = x1+x2 plus small noise in [-0.05, 0.05)

x=tf.placeholder(tf.float32,shape=(None,2))    # input features
y_=tf.placeholder(tf.float32,shape=(None,1))   # labels
w1=tf.Variable(tf.random_normal([2,1],stddev=1,seed=1))   # weights to learn
y=tf.matmul(x,w1)                              # forward propagation

loss_mse=tf.reduce_mean(tf.square(y_ - y))     # mean squared error
train_step=tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

with tf.Session() as sess:
    init_op=tf.global_variables_initializer()
    sess.run(init_op)
    STEPS=20000
    for i in range(STEPS):
        start=(i*BATCH_SIZE)%32
        end=start+BATCH_SIZE
        sess.run(train_step,feed_dict={x:X[start:end],y_:Y_[start:end]})
        if i%500==0:
            print("STEPS:",i)
            print("w1:",sess.run(w1))
    print("Final w1:",sess.run(w1))

Result:

STEPS: 0
w1: [[-0.80974597]
 [ 1.4852903 ]]
STEPS: 500
w1: [[-0.46074435]
 [ 1.641878  ]]
STEPS: 1000
w1: [[-0.21939856]
 [ 1.6984766 ]]
STEPS: 1500
w1: [[-0.04415595]
 [ 1.7003176 ]]
STEPS: 2000
w1: [[0.08942621]
 [1.673328  ]]
STEPS: 2500
w1: [[0.19583555]
 [1.6322677 ]]
STEPS: 3000
w1: [[0.28375748]
 [1.5854434 ]]
STEPS: 3500
w1: [[0.35848638]
 [1.5374471 ]]
STEPS: 4000
w1: [[0.4233252]
 [1.4907392]]
STEPS: 4500
w1: [[0.48040032]
 [1.4465573 ]]
STEPS: 5000
w1: [[0.5311361]
 [1.4054534]]
STEPS: 5500
w1: [[0.57653254]
 [1.367594  ]]
STEPS: 6000
w1: [[0.6173259]
 [1.3329402]]
STEPS: 6500
w1: [[0.65408474]
 [1.3013425 ]]
STEPS: 7000
w1: [[0.68726856]
 [1.2726018 ]]
STEPS: 7500
w1: [[0.7172598]
 [1.2465004]]
STEPS: 8000
w1: [[0.74438614]
 [1.2228196 ]]
STEPS: 8500
w1: [[0.7689325]
 [1.2013482]]
STEPS: 9000
w1: [[0.79115146]
 [1.1818888 ]]
STEPS: 9500
w1: [[0.81126714]
 [1.1642567 ]]
STEPS: 10000
w1: [[0.8294814]
 [1.1482829]]
STEPS: 10500
w1: [[0.84597576]
 [1.1338125 ]]
STEPS: 11000
w1: [[0.8609128]
 [1.1207061]]
STEPS: 11500
w1: [[0.87444043]
 [1.1088346 ]]
STEPS: 12000
w1: [[0.88669145]
 [1.0980824 ]]
STEPS: 12500
w1: [[0.8977863]
 [1.0883439]]
STEPS: 13000
w1: [[0.9078348]
 [1.0795243]]
STEPS: 13500
w1: [[0.91693527]
 [1.0715363 ]]
STEPS: 14000
w1: [[0.92517716]
 [1.0643018 ]]
STEPS: 14500
w1: [[0.93264157]
 [1.0577497 ]]
STEPS: 15000
w1: [[0.9394023]
 [1.0518153]]
STEPS: 15500
w1: [[0.9455251]
 [1.0464406]]
STEPS: 16000
w1: [[0.95107025]
 [1.0415728 ]]
STEPS: 16500
w1: [[0.9560928]
 [1.037164 ]]
STEPS: 17000
w1: [[0.96064115]
 [1.0331714 ]]
STEPS: 17500
w1: [[0.96476096]
 [1.0295546 ]]
STEPS: 18000
w1: [[0.9684917]
 [1.0262802]]
STEPS: 18500
w1: [[0.9718707]
 [1.0233142]]
STEPS: 19000
w1: [[0.974931 ]
 [1.0206276]]
STEPS: 19500
w1: [[0.9777026]
 [1.0181949]]
Final w1: [[0.98019385]
 [1.0159807 ]]

Custom Loss

A loss function can also be tailored to the actual problem, as in the sketch below.
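For example, suppose over-predicting and under-predicting carry different costs. A minimal sketch (the constants COST and PROFIT are illustrative, not from the original notes) uses tf.where to pick a different penalty per element:

import tensorflow as tf

COST = 1      # assumed cost of over-predicting one unit
PROFIT = 9    # assumed profit lost by under-predicting one unit

y  = tf.placeholder(tf.float32, shape=(None, 1))   # prediction (e.g. the network output)
y_ = tf.placeholder(tf.float32, shape=(None, 1))   # label

# piecewise loss: if y > y_ charge COST per unit of over-prediction,
# otherwise charge PROFIT per unit of under-prediction
loss_custom = tf.reduce_sum(
    tf.where(tf.greater(y, y_), COST * (y - y_), PROFIT * (y_ - y)))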

Cross Entropy

Cross entropy measures the distance between two probability distributions: the larger the cross entropy, the farther apart the two distributions are; the smaller it is, the closer they are.
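A minimal sketch of how cross entropy is typically written in TF 1.x (the shapes are illustrative); the second form lets TensorFlow apply softmax to raw logits and compute the cross entropy in one step, which is the form reused in section 5.3:

import tensorflow as tf

y  = tf.placeholder(tf.float32, shape=(None, 10))   # predicted distribution (or raw logits below)
y_ = tf.placeholder(tf.float32, shape=(None, 10))   # one-hot labels

# cross entropy computed from probabilities; clip to avoid log(0)
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))

# or: treat y as raw logits and let TensorFlow apply softmax internally
ce2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce2)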

Learning Rate

The learning rate controls how large each parameter update is.

Gradient Descent

During training, the parameters are updated along the direction in which the loss function's gradient descends.
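Concretely, each training step applies the standard gradient-descent update

w_new = w - learning_rate * (∂loss / ∂w)

so the step size is proportional to the learning rate.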

Below is an example with a learning rate of 0.2 (the loss function is (w+1)², so the optimal value is w = -1, where loss = 0):

import tensorflow as tf
w=tf.Variable(tf.constant(5,dtype=tf.float32))   # parameter to optimize, initial value 5
loss=tf.square(w+1)                               # loss function (w+1)^2
train_step=tf.train.GradientDescentOptimizer(0.2).minimize(loss)
with tf.Session() as sess:
    init_op=tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val=sess.run(w)
        loss_val=sess.run(loss)
        print("After %s STEPS,w is %f, loss is %f"%(i,w_val,loss_val))

Result:

After 0 STEPS,w is 2.600000, loss is 12.959999
After 1 STEPS,w is 1.160000, loss is 4.665599
After 2 STEPS,w is 0.296000, loss is 1.679616
After 3 STEPS,w is -0.222400, loss is 0.604662
After 4 STEPS,w is -0.533440, loss is 0.217678
After 5 STEPS,w is -0.720064, loss is 0.078364
After 6 STEPS,w is -0.832038, loss is 0.028211
After 7 STEPS,w is -0.899223, loss is 0.010156
After 8 STEPS,w is -0.939534, loss is 0.003656
After 9 STEPS,w is -0.963720, loss is 0.001316
After 10 STEPS,w is -0.978232, loss is 0.000474
After 11 STEPS,w is -0.986939, loss is 0.000171
After 12 STEPS,w is -0.992164, loss is 0.000061
After 13 STEPS,w is -0.995298, loss is 0.000022
After 14 STEPS,w is -0.997179, loss is 0.000008
After 15 STEPS,w is -0.998307, loss is 0.000003
After 16 STEPS,w is -0.998984, loss is 0.000001
After 17 STEPS,w is -0.999391, loss is 0.000000
After 18 STEPS,w is -0.999634, loss is 0.000000
After 19 STEPS,w is -0.999781, loss is 0.000000
After 20 STEPS,w is -0.999868, loss is 0.000000
After 21 STEPS,w is -0.999921, loss is 0.000000
After 22 STEPS,w is -0.999953, loss is 0.000000
After 23 STEPS,w is -0.999972, loss is 0.000000
After 24 STEPS,w is -0.999983, loss is 0.000000
After 25 STEPS,w is -0.999990, loss is 0.000000
After 26 STEPS,w is -0.999994, loss is 0.000000
After 27 STEPS,w is -0.999996, loss is 0.000000
After 28 STEPS,w is -0.999998, loss is 0.000000
After 29 STEPS,w is -0.999999, loss is 0.000000
After 30 STEPS,w is -0.999999, loss is 0.000000
After 31 STEPS,w is -1.000000, loss is 0.000000
After 32 STEPS,w is -1.000000, loss is 0.000000
After 33 STEPS,w is -1.000000, loss is 0.000000
After 34 STEPS,w is -1.000000, loss is 0.000000
After 35 STEPS,w is -1.000000, loss is 0.000000
After 36 STEPS,w is -1.000000, loss is 0.000000
After 37 STEPS,w is -1.000000, loss is 0.000000
After 38 STEPS,w is -1.000000, loss is 0.000000
After 39 STEPS,w is -1.000000, loss is 0.000000

Below is an example with a learning rate of 1:

import tensorflow as tf
w=tf.Variable(tf.constant(5,dtype=tf.float32))
loss=tf.square(w+1)
train_step=tf.train.GradientDescentOptimizer(1).minimize(loss)
with tf.Session() as sess:
    init_op=tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val=sess.run(w)
        loss_val=sess.run(loss)
        print("After %s STEPS,w is %f, loss is %f"%(i,w_val,loss_val))

Result:

After 0 STEPS,w is -7.000000, loss is 36.000000
After 1 STEPS,w is 5.000000, loss is 36.000000
After 2 STEPS,w is -7.000000, loss is 36.000000
After 3 STEPS,w is 5.000000, loss is 36.000000
After 4 STEPS,w is -7.000000, loss is 36.000000
After 5 STEPS,w is 5.000000, loss is 36.000000
After 6 STEPS,w is -7.000000, loss is 36.000000
After 7 STEPS,w is 5.000000, loss is 36.000000
After 8 STEPS,w is -7.000000, loss is 36.000000
After 9 STEPS,w is 5.000000, loss is 36.000000
After 10 STEPS,w is -7.000000, loss is 36.000000
After 11 STEPS,w is 5.000000, loss is 36.000000
After 12 STEPS,w is -7.000000, loss is 36.000000
After 13 STEPS,w is 5.000000, loss is 36.000000
After 14 STEPS,w is -7.000000, loss is 36.000000
After 15 STEPS,w is 5.000000, loss is 36.000000
After 16 STEPS,w is -7.000000, loss is 36.000000
After 17 STEPS,w is 5.000000, loss is 36.000000
After 18 STEPS,w is -7.000000, loss is 36.000000
After 19 STEPS,w is 5.000000, loss is 36.000000
After 20 STEPS,w is -7.000000, loss is 36.000000
After 21 STEPS,w is 5.000000, loss is 36.000000
After 22 STEPS,w is -7.000000, loss is 36.000000
After 23 STEPS,w is 5.000000, loss is 36.000000
After 24 STEPS,w is -7.000000, loss is 36.000000
After 25 STEPS,w is 5.000000, loss is 36.000000
After 26 STEPS,w is -7.000000, loss is 36.000000
After 27 STEPS,w is 5.000000, loss is 36.000000
After 28 STEPS,w is -7.000000, loss is 36.000000
After 29 STEPS,w is 5.000000, loss is 36.000000
After 30 STEPS,w is -7.000000, loss is 36.000000
After 31 STEPS,w is 5.000000, loss is 36.000000
After 32 STEPS,w is -7.000000, loss is 36.000000
After 33 STEPS,w is 5.000000, loss is 36.000000
After 34 STEPS,w is -7.000000, loss is 36.000000
After 35 STEPS,w is 5.000000, loss is 36.000000
After 36 STEPS,w is -7.000000, loss is 36.000000
After 37 STEPS,w is 5.000000, loss is 36.000000
After 38 STEPS,w is -7.000000, loss is 36.000000
After 39 STEPS,w is 5.000000, loss is 36.000000

Below is an example with a learning rate of 0.001:

import tensorflow as tf
w=tf.Variable(tf.constant(5,dtype=tf.float32))
loss=tf.square(w+1)
train_step=tf.train.GradientDescentOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    init_op=tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val=sess.run(w)
        loss_val=sess.run(loss)
        print("After %s STEPS,w is %f, loss is %f"%(i,w_val,loss_val))

Result:

After 0 STEPS,w is 4.988000, loss is 35.856144
After 1 STEPS,w is 4.976024, loss is 35.712864
After 2 STEPS,w is 4.964072, loss is 35.570156
After 3 STEPS,w is 4.952144, loss is 35.428020
After 4 STEPS,w is 4.940240, loss is 35.286449
After 5 STEPS,w is 4.928360, loss is 35.145447
After 6 STEPS,w is 4.916503, loss is 35.005009
After 7 STEPS,w is 4.904670, loss is 34.865124
After 8 STEPS,w is 4.892860, loss is 34.725803
After 9 STEPS,w is 4.881075, loss is 34.587044
After 10 STEPS,w is 4.869313, loss is 34.448833
After 11 STEPS,w is 4.857574, loss is 34.311172
After 12 STEPS,w is 4.845859, loss is 34.174068
After 13 STEPS,w is 4.834167, loss is 34.037510
After 14 STEPS,w is 4.822499, loss is 33.901497
After 15 STEPS,w is 4.810854, loss is 33.766029
After 16 STEPS,w is 4.799233, loss is 33.631104
After 17 STEPS,w is 4.787634, loss is 33.496712
After 18 STEPS,w is 4.776059, loss is 33.362858
After 19 STEPS,w is 4.764507, loss is 33.229538
After 20 STEPS,w is 4.752978, loss is 33.096756
After 21 STEPS,w is 4.741472, loss is 32.964497
After 22 STEPS,w is 4.729989, loss is 32.832775
After 23 STEPS,w is 4.718529, loss is 32.701576
After 24 STEPS,w is 4.707092, loss is 32.570904
After 25 STEPS,w is 4.695678, loss is 32.440750
After 26 STEPS,w is 4.684287, loss is 32.311119
After 27 STEPS,w is 4.672918, loss is 32.182003
After 28 STEPS,w is 4.661572, loss is 32.053402
After 29 STEPS,w is 4.650249, loss is 31.925320
After 30 STEPS,w is 4.638949, loss is 31.797745
After 31 STEPS,w is 4.627671, loss is 31.670683
After 32 STEPS,w is 4.616416, loss is 31.544128
After 33 STEPS,w is 4.605183, loss is 31.418077
After 34 STEPS,w is 4.593973, loss is 31.292530
After 35 STEPS,w is 4.582785, loss is 31.167484
After 36 STEPS,w is 4.571619, loss is 31.042938
After 37 STEPS,w is 4.560476, loss is 30.918892
After 38 STEPS,w is 4.549355, loss is 30.795341
After 39 STEPS,w is 4.538256, loss is 30.672281

The three examples above show that:

If the learning rate is too large, the parameter oscillates and never converges.

If the learning rate is too small, the parameter converges too slowly.

So an appropriate learning rate has to be chosen, which may take several experiments.

Exponentially Decaying Learning Rate

With exponential decay, the learning rate is updated dynamically as the number of training steps grows.
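Per the TensorFlow documentation for tf.train.exponential_decay, the rate used below follows

learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)

and with staircase=True the exponent is truncated to an integer, so the rate decays in discrete steps.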

import tensorflow as tf

LEARNING_RATE_BASE=0.1   # initial learning rate
LEARNING_RATE_DECAY=0.99 # learning-rate decay rate
LEARNING_RATE_STEP=1     # how many batches are fed before the rate is updated (usually total samples / BATCH_SIZE)

# counter for how many batches have been run; starts at 0 and is not trainable
global_step=tf.Variable(0,trainable=False)
# define the exponentially decaying learning rate
learning_rate=tf.train.exponential_decay(LEARNING_RATE_BASE,global_step,LEARNING_RATE_STEP,LEARNING_RATE_DECAY,staircase=True)
# define the parameter to optimize, initial value 5
w=tf.Variable(tf.constant(5,dtype=tf.float32))
# define the loss function
loss=tf.square(w+1)
train_step=tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=global_step)
# create a session and train for 40 steps
with tf.Session() as sess:
    init_op=tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val=sess.run(learning_rate)
        global_step_val=sess.run(global_step)
        w_val=sess.run(w)
        loss_val=sess.run(loss)
        print("After %s steps: global_step is %f,w is %f,learning rate is %f,loss is %f"%(i,global_step_val,w_val,learning_rate_val,loss_val))

Result:

After 0 steps: global_step is 1.000000,w is 3.800000,learning rate is 0.099000,loss is 23.040001
After 1 steps: global_step is 2.000000,w is 2.849600,learning rate is 0.098010,loss is 14.819419
After 2 steps: global_step is 3.000000,w is 2.095001,learning rate is 0.097030,loss is 9.579033
After 3 steps: global_step is 4.000000,w is 1.494386,learning rate is 0.096060,loss is 6.221961
After 4 steps: global_step is 5.000000,w is 1.015167,learning rate is 0.095099,loss is 4.060896
After 5 steps: global_step is 6.000000,w is 0.631886,learning rate is 0.094148,loss is 2.663051
After 6 steps: global_step is 7.000000,w is 0.324608,learning rate is 0.093207,loss is 1.754587
After 7 steps: global_step is 8.000000,w is 0.077684,learning rate is 0.092274,loss is 1.161403
After 8 steps: global_step is 9.000000,w is -0.121202,learning rate is 0.091352,loss is 0.772287
After 9 steps: global_step is 10.000000,w is -0.281761,learning rate is 0.090438,loss is 0.515867
After 10 steps: global_step is 11.000000,w is -0.411674,learning rate is 0.089534,loss is 0.346128
After 11 steps: global_step is 12.000000,w is -0.517024,learning rate is 0.088638,loss is 0.233266
After 12 steps: global_step is 13.000000,w is -0.602644,learning rate is 0.087752,loss is 0.157891
After 13 steps: global_step is 14.000000,w is -0.672382,learning rate is 0.086875,loss is 0.107334
After 14 steps: global_step is 15.000000,w is -0.729305,learning rate is 0.086006,loss is 0.073276
After 15 steps: global_step is 16.000000,w is -0.775868,learning rate is 0.085146,loss is 0.050235
After 16 steps: global_step is 17.000000,w is -0.814036,learning rate is 0.084294,loss is 0.034583
After 17 steps: global_step is 18.000000,w is -0.845387,learning rate is 0.083451,loss is 0.023905
After 18 steps: global_step is 19.000000,w is -0.871193,learning rate is 0.082617,loss is 0.016591
After 19 steps: global_step is 20.000000,w is -0.892476,learning rate is 0.081791,loss is 0.011561
After 20 steps: global_step is 21.000000,w is -0.910065,learning rate is 0.080973,loss is 0.008088
After 21 steps: global_step is 22.000000,w is -0.924629,learning rate is 0.080163,loss is 0.005681
After 22 steps: global_step is 23.000000,w is -0.936713,learning rate is 0.079361,loss is 0.004005
After 23 steps: global_step is 24.000000,w is -0.946758,learning rate is 0.078568,loss is 0.002835
After 24 steps: global_step is 25.000000,w is -0.955125,learning rate is 0.077782,loss is 0.002014
After 25 steps: global_step is 26.000000,w is -0.962106,learning rate is 0.077004,loss is 0.001436
After 26 steps: global_step is 27.000000,w is -0.967942,learning rate is 0.076234,loss is 0.001028
After 27 steps: global_step is 28.000000,w is -0.972830,learning rate is 0.075472,loss is 0.000738
After 28 steps: global_step is 29.000000,w is -0.976931,learning rate is 0.074717,loss is 0.000532
After 29 steps: global_step is 30.000000,w is -0.980378,learning rate is 0.073970,loss is 0.000385
After 30 steps: global_step is 31.000000,w is -0.983281,learning rate is 0.073230,loss is 0.000280
After 31 steps: global_step is 32.000000,w is -0.985730,learning rate is 0.072498,loss is 0.000204
After 32 steps: global_step is 33.000000,w is -0.987799,learning rate is 0.071773,loss is 0.000149
After 33 steps: global_step is 34.000000,w is -0.989550,learning rate is 0.071055,loss is 0.000109
After 34 steps: global_step is 35.000000,w is -0.991035,learning rate is 0.070345,loss is 0.000080
After 35 steps: global_step is 36.000000,w is -0.992297,learning rate is 0.069641,loss is 0.000059
After 36 steps: global_step is 37.000000,w is -0.993369,learning rate is 0.068945,loss is 0.000044
After 37 steps: global_step is 38.000000,w is -0.994284,learning rate is 0.068255,loss is 0.000033
After 38 steps: global_step is 39.000000,w is -0.995064,learning rate is 0.067573,loss is 0.000024
After 39 steps: global_step is 40.000000,w is -0.995731,learning rate is 0.066897,loss is 0.000018

Moving Average

A moving average (the "shadow" value) tracks each trainable parameter as training proceeds:

shadow = decay_rate * shadow + (1 - decay_rate) * parameter

where decay_rate = min{MOVING_AVERAGE_DECAY, (1 + step_count) / (10 + step_count)}

For a detailed explanation, see "Moving Averages 滑动平均的原理和直观感知".
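A minimal sketch of tf.train.ExponentialMovingAverage (the variable w1 and the value assigned to it are illustrative, not from the original notes):

import tensorflow as tf

MOVING_AVERAGE_DECAY = 0.99
w1 = tf.Variable(0, dtype=tf.float32)          # the parameter being tracked
global_step = tf.Variable(0, trainable=False)  # training-step counter

ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
ema_op = ema.apply([w1])                       # maintain a shadow value for w1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([w1, ema.average(w1)]))     # shadow starts equal to w1

    sess.run(tf.assign(w1, 1))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))     # shadow moves part of the way toward the new value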

Regularization

Overfitting

When a neural network achieves high accuracy on the training set but low accuracy when predicting or classifying new data, the model generalizes poorly; this is overfitting.

Regularization

Regularization adds a penalty on every weight w to the loss function, introducing a measure of model complexity; this suppresses noise in the model and reduces overfitting.

With regularization, the loss becomes the sum of two terms:

loss = loss(y, y_) + REGULARIZER * loss(w)

The first term is the gap between the predictions and the labels, e.g. cross entropy or mean squared error; the second term is the regularization penalty, weighted by the hyperparameter REGULARIZER.

Regularization Methods

L1 regularization
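The L1 penalty is the sum of the absolute values of the weights (standard definition):

lossL1(w) = Σ|wi|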

Expressed with a TensorFlow function:

loss(w)=tf.contrib.layers.l1_regularizer(REGULARIZER)(w)

L2 regularization
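The L2 penalty is the sum of the squares of the weights (standard definition):

lossL2(w) = Σ wi²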

Expressed with a TensorFlow function:

loss(w)=tf.contrib.layers.l2_regularizer(REGULARIZER)(w)

Implementing regularization with TensorFlow, by collecting each weight's penalty and summing the collection into the total loss:

tf.add_to_collection('losses',tf.contrib.layers.l2_regularizer(REGULARIZER)(w))
loss = cem + tf.add_n(tf.get_collection('losses'))
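For instance, a minimal self-contained sketch of this pattern with a single weight (the shape, the REGULARIZER value, and the constant standing in for cem are illustrative):

import tensorflow as tf

REGULARIZER = 0.01   # assumed regularization weight

w = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
# store this weight's L2 penalty in the 'losses' collection
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(REGULARIZER)(w))

cem = tf.constant(0.5)   # stand-in for the cross-entropy / MSE data term
# total loss = data term + sum of every collected penalty
loss = cem + tf.add_n(tf.get_collection('losses'))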

Boilerplate for Building a Neural Network

Generating the Dataset (generateds.py)
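The notes leave this section empty; below is a minimal sketch of what generateds.py could contain. The sample count, feature dimension, seed, and labelling rule are all illustrative assumptions, not from the original notes.

# generateds.py (illustrative sketch)
import numpy as np

SEED = 2   # assumed seed


def generateds():
    rdm = np.random.RandomState(SEED)
    X = rdm.randn(300, 2)                                   # 300 samples, 2 features each
    # assumed labelling rule: 1 if the point lies inside the circle x0^2 + x1^2 < 2
    Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]
    Y_ = np.vstack(Y_).reshape(-1, 1)                       # column vector of labels
    return X, Y_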

Forward Propagation (forward.py)

Forward propagation builds the network, i.e. defines its structure, typically following a template like this:

def forward(x,regularizer):
    w =          # weights, obtained from get_weight()
    b =          # biases, obtained from get_bias()
    y =          # network output, e.g. tf.matmul(x,w)+b
    return y

def get_weight(shape,regularizer):
    w = tf.Variable( )   # initialize a weight tensor of the given shape
    tf.add_to_collection('losses',tf.contrib.layers.l1_regularizer(regularizer)(w))   # collect its regularization penalty
    return w

def get_bias(shape):
    b = tf.Variable( )   # initialize a bias tensor of the given shape
    return b
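A hedged, concrete way to fill in this template for a small network with one hidden layer; the 2-11-1 layer sizes, the initializers, and the use of the L2 regularizer are assumptions for illustration:

import tensorflow as tf


def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape, stddev=0.1))   # assumed initializer
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w


def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))         # small constant bias
    return b


def forward(x, regularizer):
    w1 = get_weight([2, 11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)    # hidden layer with ReLU

    w2 = get_weight([11, 1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2                # output layer, no activation
    return y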

Backpropagation (backward.py)

Backpropagation trains the network and optimizes its parameters. The template below strings together the loss, learning-rate decay, and moving-average pieces from the earlier sections:

def backward():
    x=tf.placeholder( )      # input placeholder (fill in dtype and shape)
    y_=tf.placeholder( )     # label placeholder
    y=forward.forward(x,REGULARIZER)   # forward.py from section 5.2
    global_step=tf.Variable(0,trainable=False)

    # mean squared error
    loss_mse=tf.reduce_mean(tf.square(y-y_))

    # cross entropy
    ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_,1))
    cem=tf.reduce_mean(ce)

    # regularization: total loss = data term (cem here; loss_mse also works) + collected penalties
    loss=cem+tf.add_n(tf.get_collection('losses'))

    # exponentially decaying learning rate
    learning_rate=tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        total_samples/BATCH_SIZE,   # decay step: total number of samples in the dataset / BATCH_SIZE
        LEARNING_RATE_DECAY,
        staircase=True)

    train_step=tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=global_step)

    # moving average: group the training step and the shadow-value update into one op
    ema=tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,global_step)
    ema_op=ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step,ema_op]):
        train_op=tf.no_op(name='train')

    with tf.Session() as sess:
        init_op=tf.global_variables_initializer()
        sess.run(init_op)

        for i in range(STEPS):
            sess.run(train_op,feed_dict={x: ,y_: })   # feed a batch of inputs and labels
            if i % print_interval == 0:               # report progress every print_interval steps
                print()

if __name__=='__main__':
    backward()

 
