A Quick Introduction to Neural Networks

An Artificial Neural Network (ANN) is a computational model that is inspired by the way biological neural networks in the human brain process information. Artificial Neural Networks have generated a lot of excitement in Machine Learning research and industry, thanks to many breakthrough results in speech recognition, computer vision and text processing. In this blog post we will try to develop an understanding of a particular type of Artificial Neural Network called the Multi Layer Perceptron.

A Single Neuron

The basic unit of computation in a neural network is the neuron, often called a node or unit. It receives input from other nodes, or from an external source, and computes an output. Each input has an associated weight (w), which is assigned on the basis of its relative importance to the other inputs. The node applies a function f (defined below) to the weighted sum of its inputs, as shown in Figure 1 below:

Figure 1: a single neuron

The above network takes numerical inputs X1 and X2 and has weights w1 and w2 associated with those inputs. Additionally, there is another input 1 with weight b (called the Bias) associated with it. We will learn more about the role of the bias later.

The output Y from the neuron is computed as shown in Figure 1. The function f is non-linear and is called the Activation Function. The purpose of the activation function is to introduce non-linearity into the output of a neuron. This is important because most real world data is non-linear and we want neurons to learn these non-linear representations.
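
As a minimal sketch of this computation (the input and weight values below are made up purely for illustration, and the activation f used here is the ReLU, one of the options described in the next section):

```python
def neuron_output(inputs, weights, bias, f):
    # Y = f(w1*X1 + w2*X2 + ... + b), i.e. the activation applied to the weighted sum
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(weighted_sum)

# hypothetical inputs and weights, purely to show the mechanics
relu = lambda s: max(0.0, s)
print(neuron_output(inputs=[0.5, -1.2], weights=[0.8, 0.3], bias=0.1, f=relu))
```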

Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it [2]. There are several activation functions you may encounter in practice:

  • Sigmoid: takes a real-valued input and squashes it to range between 0 and 1

σ(x) = 1 / (1 + exp(−x))

  • tanh: takes a real-valued input and squashes it to the range [-1, 1]

tanh(x) = 2σ(2x) − 1

  • ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued input and thresholds it at zero (replaces negative values with zero)

f(x) = max(0, x)

The figures below [2] show each of the above activation functions.

Figure 2: different activation functions
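
A quick sketch of the three activation functions above, using NumPy (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                      # same as 2*sigmoid(2x) - 1, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # thresholds negative values at zero

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary sample inputs
print(sigmoid(x))
print(tanh(x))
print(relu(x))
```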

Importance of Bias: The main function of the Bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives). See [12] to learn more about the role of bias in a neuron.

Feedforward Neural Network

The feedforward neural network was the first and simplest type of artificial neural network devised [3]. It contains multiple neurons (nodes) arranged in layers. Nodes from adjacent layers have connections or edges between them. All these connections have weights associated with them.

An example of a feedforward neural network is shown in Figure 3.

Figure 3: an example of a feedforward neural network

A feedforward neural network can consist of three types of nodes:

  1. Input Nodes – The Input nodes provide information from the outside world to the network and are together referred to as the “Input Layer”. No computation is performed in any of the Input nodes – they just pass on the information to the hidden nodes.
  2. Hidden Nodes – The Hidden nodes have no direct connection with the outside world (hence the name “hidden”). They perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”. While a feedforward network will only have a single input layer and a single output layer, it can have zero or multiple Hidden Layers.
  3. Output Nodes – The Output nodes are collectively referred to as the “Output Layer” and are responsible for computations and transferring information from the network to the outside world.

In a feedforward network, the information moves in only one direction – forward – from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network [3] (this property of feed forward networks is different from Recurrent Neural Networks in which the connections between the nodes form a cycle).

Two examples of feedforward networks are given below:

  1. Single Layer Perceptron – This is the simplest feedforward neural network [4] and does not contain any hidden layer. You can learn more about Single Layer Perceptrons in [4], [5], [6], [7].
  2. Multi Layer Perceptron – A Multi Layer Perceptron has one or more hidden layers. We will only discuss Multi Layer Perceptrons below since they are more useful than Single Layer Perceptrons for practical applications today.

Multi Layer Perceptron

A Multi Layer Perceptron (MLP) contains one or more hidden layers (apart from one input and one output layer). While a single layer perceptron can only learn linear functions, a multi layer perceptron can also learn non-linear functions.

Figure 4 shows a multi layer perceptron with a single hidden layer. Note that all connections have weights associated with them, but only three weights (w0, w1, w2) are shown in the figure.

Input Layer: The Input layer has three nodes. The Bias node has a value of 1. The other two nodes take X1 and X2 as external inputs (which are numerical values depending upon the input dataset). As discussed above, no computation is performed in the Input layer, so the outputs from nodes in the Input layer are 1, X1 and X2 respectively, which are fed into the Hidden Layer.

Hidden Layer: The Hidden layer also has three nodes, with the Bias node having an output of 1. The output of the other two nodes in the Hidden layer depends on the outputs from the Input layer (1, X1, X2) as well as the weights associated with the connections (edges). Figure 4 shows the output calculation for one of the hidden nodes (highlighted). Similarly, the output from the other hidden node can be calculated. Remember that f refers to the activation function. These outputs are then fed to the nodes in the Output layer.

Figure 4: a multi layer perceptron having one hidden layer

Output Layer: The Output layer has two nodes which take inputs from the Hidden layer and perform computations similar to the one shown for the highlighted hidden node. The values calculated (Y1 and Y2) as a result of these computations act as outputs of the Multi Layer Perceptron.

Given a set of features X = (x1, x2, …) and a target y, a Multi Layer Perceptron can learn the relationship between the features and the target, for either classification or regression.
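
To make the structure in Figure 4 concrete, here is a rough sketch of the forward pass through this 2-input, 2-hidden-node, 2-output network. The figure does not fix a particular activation, so the sigmoid is assumed here, and all weight values are invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x1, x2, w_hidden, w_output):
    # hidden layer: each node sees the bias input 1, X1 and X2
    h = [sigmoid(w[0] * 1 + w[1] * x1 + w[2] * x2) for w in w_hidden]
    # output layer: each node sees the hidden bias 1 and the two hidden outputs
    return [sigmoid(w[0] * 1 + w[1] * h[0] + w[2] * h[1]) for w in w_output]

# hypothetical weights: three per node (bias weight first)
w_hidden = [[0.1, 0.4, -0.3], [-0.2, 0.2, 0.5]]
w_output = [[0.3, 0.7, -0.1], [0.0, -0.4, 0.6]]
print(mlp_forward(x1=1.0, x2=2.0, w_hidden=w_hidden, w_output=w_output))  # [Y1, Y2]
```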

Let's take an example to understand Multi Layer Perceptrons better. Suppose we have the following student-marks dataset:

[Table: training data with columns 'Hours Studied', 'Mid Term Marks' and 'Final Result' (1 = Pass, 0 = Fail)]

The two input columns show the number of hours the student has studied and the mid term marks obtained by the student. The Final Result column can take two values, 1 or 0, indicating whether the student passed the final term. For example, we can see that if the student studied 35 hours and obtained 67 marks in the mid term, he/she ended up passing the final term.

Now, suppose, we want to predict whether a student studying 25 hours and having 70 marks in the mid term will pass the final term.

[Table: query example with Hours Studied = 25, Mid Term Marks = 70, Final Result = ?]

This is a binary classification problem where a multi layer perceptron can learn from the given examples (training data) and make an informed prediction given a new data point. We will see below how a multi layer perceptron learns such relationships.
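
If you simply want to try such a model in practice, scikit-learn (reference [8]) provides a ready-made Multi Layer Perceptron; below is a hedged sketch of how it could be applied to this problem. Only the (35 hours, 67 marks) → Pass row is given explicitly above, so the remaining training rows here are invented for illustration:

```python
from sklearn.neural_network import MLPClassifier

# hypothetical student-marks data: [hours studied, mid term marks] -> 1 (pass) / 0 (fail)
X_train = [[35, 67], [12, 75], [16, 89], [45, 56], [10, 90], [38, 80], [5, 45]]
y_train = [1, 0, 1, 1, 0, 1, 0]

clf = MLPClassifier(hidden_layer_sizes=(2,), activation='logistic',
                    solver='lbfgs', max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

# the query from the text: 25 hours studied, 70 mid term marks
print(clf.predict_proba([[25, 70]]))  # columns correspond to [P(fail), P(pass)]
```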

Training our MLP: The Back-Propagation Algorithm

The process by which a Multi Layer Perceptron learns is called the Backpropagation algorithm. I would recommend reading this Quora answer by Hemanth Kumar (quoted below) which explains Backpropagation clearly.

Backward Propagation of Errors, often abbreviated as BackProp is one of the several ways in which an artificial neural network (ANN) can be trained. It is a supervised training scheme, which means, it learns from labeled training data (there is a supervisor, to guide its learning).

To put it in simple terms, BackProp is like “learning from mistakes”. The supervisor corrects the ANN whenever it makes mistakes.

An ANN consists of nodes in different layers; input layer, intermediate hidden layer(s) and the output layer. The connections between nodes of adjacent layers have “weights” associated with them. The goal of learning is to assign correct weights for these edges. Given an input vector, these weights determine what the output vector is.

In supervised learning, the training set is labeled. This means, for some given inputs, we know the desired/expected output (label).

BackProp Algorithm:
Initially all the edge weights are randomly assigned. For every input in the training dataset, the ANN is activated and its output is observed. This output is compared with the desired output that we already know, and the error is “propagated” back to the previous layer. This error is noted and the weights are “adjusted” accordingly. This process is repeated until the output error is below a predetermined threshold.

Once the above algorithm terminates, we have a “learned” ANN which, we consider is ready to work with “new” inputs. This ANN is said to have learned from several examples (labeled data) and from its mistakes (error propagation).
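
The loop described above can be illustrated with a deliberately tiny, runnable example. The "network" below is just a single weight w applied to a single input (y = w * x), so this is only a sketch of the repeat-until-the-error-is-small idea, not of multi-layer Backpropagation itself:

```python
x, target = 2.0, 1.0
w = 0.9                                  # normally the weight would be randomly assigned
learning_rate = 0.1

while True:
    output = w * x                       # activate the "network" and observe its output
    error = output - target              # compare with the desired output
    if error ** 2 < 1e-6:                # stop once the error is below a threshold
        break
    gradient = 2 * error * x             # the error propagated back to the weight
    w -= learning_rate * gradient        # adjust the weight accordingly

print(w, w * x)                          # w has moved toward 0.5, so w * x is close to 1
```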

Now that we have an idea of how Backpropagation works, let's come back to our student-marks dataset shown above.

The Multi Layer Perceptron shown in Figure 5 (adapted from Sebastian Raschka’s excellent visual explanation of the backpropagation algorithm) has two nodes in the input layer (apart from the Bias node) which take the inputs ‘Hours Studied’ and ‘Mid Term Marks’. It also has a hidden layer with two nodes (apart from the Bias node). The output layer has two nodes as well – the upper node outputs the probability of ‘Pass’ while the lower node outputs the probability of ‘Fail’.

In classification tasks, we generally use a Softmax function as the Activation Function in the Output layer of the Multi Layer Perceptron to ensure that the outputs are probabilities and they add up to 1. The Softmax function takes a vector of arbitrary real-valued scores and squashes it to a vector of values between zero and one that sum to one. So, in this case,

Probability (Pass) + Probability (Fail) = 1
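
A small sketch of the Softmax function as described above (the example scores are arbitrary):

```python
import numpy as np

def softmax(scores):
    # subtract the max for numerical stability; the result is positive and sums to 1
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0])))  # e.g. [Probability(Pass), Probability(Fail)]
```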

Step 1: Forward Propagation

All weights in the network are randomly assigned. Let's consider the hidden layer node marked V in Figure 5 below. Assume the weights of the connections from the inputs to that node are w1, w2 and w3 (as shown).

The network then takes the first training example as input (we know that for inputs 35 and 67, the probability of Pass is 1).

  • Input to the network = [35, 67]
  • Desired output from the network (target) = [1, 0]

Then the output V from the node in consideration can be calculated as below (f is an activation function such as the sigmoid):

V = f(1*w1 + 35*w2 + 67*w3)
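
Plugging in some hypothetical values for the randomly assigned weights makes this concrete (the sigmoid is used as f here, and the weight values are made up):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2, w3 = 0.1, 0.02, -0.01            # hypothetical randomly assigned weights
V = sigmoid(1 * w1 + 35 * w2 + 67 * w3)  # f applied to the weighted sum of 1, 35 and 67
print(V)
```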

Similarly, the output from the other node in the hidden layer is also calculated. The outputs of the two nodes in the hidden layer act as inputs to the two nodes in the output layer. This enables us to calculate output probabilities from the two nodes in the output layer.

Suppose the output probabilities from the two nodes in the output layer are 0.4 and 0.6 respectively (since the weights are randomly assigned, outputs will also be random). We can see that the calculated probabilities (0.4 and 0.6) are very far from the desired probabilities (1 and 0 respectively), hence the network in Figure 5 is said to have an ‘Incorrect Output’.

Figure 5: forward propagation step in a multi layer perceptron

Step 2: Back Propagation and Weight Update

We calculate the total error at the output nodes and propagate these errors back through the network using Backpropagation to calculate the gradients. Then we use an optimization method such as Gradient Descent to ‘adjust’ all weights in the network with the aim of reducing the error at the output layer. This is shown in Figure 6 below (ignore the mathematical equations in the figure for now).

Suppose that the new weights associated with the node in consideration are w4, w5 and w6 (after Backpropagation and adjusting weights).

Figure 6: backward propagation and weight update step in a multi layer perceptron
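
The Gradient Descent update itself is just a small step against each gradient; a minimal sketch, assuming some placeholder gradient values (in practice these come out of the Backpropagation pass):

```python
# hypothetical current weights of the node in consideration and their error gradients
weights   = [0.10, 0.02, -0.01]      # w1, w2, w3 before the update
gradients = [0.05, -0.30, 0.20]      # dE/dw for each weight (placeholder values)
learning_rate = 0.1

# gradient descent: move each weight a small step against its gradient
new_weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
print(new_weights)                   # these play the role of w4, w5, w6 in Figure 6
```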

If we now input the same example to the network again, the network should perform better than before since the weights have been adjusted to reduce the error in prediction. As shown in Figure 7, the errors at the output nodes now reduce to [0.2, -0.2] as compared to [0.6, -0.4] earlier. This means that our network has learnt to correctly classify our first training example.

Figure 7: the MLP network now performs better on the same input

We repeat this process with all other training examples in our dataset. Then, our network is said to have learnt those examples.

If we now want to predict whether a student studying 25 hours and having 70 marks in the mid term will pass the final term, we go through the forward propagation step and find the output probabilities for Pass and Fail.
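
For readers who would like to see all of the above steps in one place, here is a rough end-to-end sketch in NumPy of a 2-2-2 network like the one in Figure 5: forward propagation, Backpropagation of the error, and Gradient Descent weight updates, followed by a prediction for the (25 hours, 70 marks) query. The training rows other than (35, 67) → Pass are invented for illustration, the inputs are scaled down to keep the sigmoid well-behaved, and the specific choices below (learning rate, number of iterations, cross-entropy error) are assumptions, not details from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy student-marks data; only the (35, 67) -> Pass row is given in the text above
X = np.array([[35, 67], [12, 75], [16, 89], [45, 56], [10, 90]], dtype=float) / 100.0
T = np.array([[1, 0], [0, 1], [1, 0], [1, 0], [0, 1]], dtype=float)  # [Pass, Fail] targets

# one hidden layer with two nodes, as in Figure 5; weights start out random
W1, b1 = rng.normal(scale=0.5, size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(scale=0.5, size=(2, 2)), np.zeros(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

learning_rate = 0.5
for step in range(5000):
    # Step 1: forward propagation
    H = sigmoid(X @ W1 + b1)             # hidden layer outputs
    Y = softmax(H @ W2 + b2)             # output probabilities [P(Pass), P(Fail)]

    # Step 2: back propagation (gradients of the cross-entropy error w.r.t. the weights)
    dZ2 = (Y - T) / len(X)
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # gradient descent weight update
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2

# forward propagation for the query: 25 hours studied, 70 mid term marks
query = np.array([[25, 70]], dtype=float) / 100.0
print(softmax(sigmoid(query @ W1 + b1) @ W2 + b2))  # [P(Pass), P(Fail)]
```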

I have avoided mathematical equations and explanation of concepts such as ‘Gradient Descent’ here and have rather tried to develop an intuition for the algorithm. For a more mathematically involved discussion of the Backpropagation algorithm, refer to this link.

3D Visualization of a Multi Layer Perceptron

Adam Harley has created a 3D visualization of a Multi Layer Perceptron which has already been trained (using Backpropagation) on the MNIST Database of handwritten digits.

The network takes 784 numeric pixel values as inputs from a 28 x 28 image of a handwritten digit (it has 784 nodes in the Input Layer corresponding to pixels). The network has 300 nodes in the first hidden layer, 100 nodes in the second hidden layer, and 10 nodes in the output layer (corresponding to the 10 digits) [15].

Although the network described here is much larger (uses more hidden layers and nodes) compared to the one we discussed in the previous section, all computations in the forward propagation step and backpropagation step are done in the same way (at each node) as discussed before.
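
In code, the only thing that changes relative to our small example is the size of the weight matrices; a sketch of the layer shapes for the network described in [15]:

```python
import numpy as np

layer_sizes = [784, 300, 100, 10]   # input pixels, two hidden layers, ten digit classes
weights = [np.zeros((m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]
print([w.shape for w in weights])   # [(784, 300), (300, 100), (100, 10)]
```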

Figure 8 shows the network when the input is the digit ‘5’.

Figure 8: visualizing the network for an input of ‘5’

A node which has a higher output value than the others is represented by a brighter color. In the Input layer, the bright nodes are those which receive higher numerical pixel values as input. Notice how in the Output layer, the only bright node corresponds to the digit 5 (its output probability is close to 1, higher than that of the other nine nodes, whose output probabilities are close to 0). This indicates that the MLP has correctly classified the input digit. I highly recommend playing around with this visualization and observing the connections between nodes of different layers.

Deep Neural Networks

The following questions are a good starting point for exploring how Multi Layer Perceptrons relate to deeper architectures:

  1. What is the difference between deep learning and usual machine learning?
  2. What is the difference between a neural network and a deep neural network?
  3. How is deep learning different from multilayer perceptron?

Conclusion

I have skipped important details of some of the concepts discussed in this post to facilitate understanding. I would recommend going through Part 1, Part 2, Part 3 and the Case Study from Stanford’s Neural Network tutorial for a thorough understanding of Multi Layer Perceptrons.

Let me know in the comments below if you have any questions or suggestions!

References

  1. Artificial Neuron Models
  2. Neural Networks Part 1: Setting up the Architecture (Stanford CNN Tutorial)
  3. Wikipedia article on Feed Forward Neural Network
  4. Wikipedia article on Perceptron 
  5. Single-layer Neural Networks (Perceptrons) 
  6. Single Layer Perceptrons 
  7. Weighted Networks – The Perceptron
  8. Neural network models (supervised) (scikit learn documentation)
  9. What does the hidden layer in a neural network compute?
  10. How to choose the number of hidden layers and nodes in a feedforward neural network? 
  11. Crash Introduction to Artificial Neural Networks
  12. Why the BIAS is necessary in ANN? Should we have separate BIAS for each layer?
  13. Basic Neural Network Tutorial – Theory
  14. Neural Networks Demystified (Video Series): Part 1, Welch Labs @ MLconf SF
  15. A. W. Harley, “An Interactive Node-Link Visualization of Convolutional Neural Networks,” in ISVC, pages 867-877, 2015 (link)

 


Reposted from ujjwalkarn; original article: https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/
