0. What this article covers

This article introduces:
1. Neurons, the basic building block of a neural network, and a common activation function used inside them: the sigmoid;
2. How the neurons in a neural network are connected and interact;
3. A dataset with height and weight as inputs (features) and gender as the output (label) -- in other words, a training set;
4. Loss functions and the mean squared error (MSE) loss; training a network is equivalent to minimizing the loss;
5. Computing partial derivatives with backpropagation;
6. Training the network with stochastic gradient descent (SGD).
Complete code:

import numpy as np

def sigmoid(x):
  # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
  fx = sigmoid(x)
  return fx * (1 - fx)

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)

  *** DISCLAIMER ***:
  The code below is intended to be simple and educational, NOT optimal.
  Real neural net code looks nothing like this. DO NOT use this code.
  Instead, read/run it to understand how this specific network works.
  '''
  def __init__(self):
    # Weights
    self.w1 = np.random.normal()
    self.w2 = np.random.normal()
    self.w3 = np.random.normal()
    self.w4 = np.random.normal()
    self.w5 = np.random.normal()
    self.w6 = np.random.normal()

    # Biases
    self.b1 = np.random.normal()
    self.b2 = np.random.normal()
    self.b3 = np.random.normal()

  def feedforward(self, x):
    # x is a numpy array with 2 elements.
    h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
    h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
    o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
    return o1

  def train(self, data, all_y_trues):
    '''
    - data is a (n x 2) numpy array, n = # of samples in the dataset.
    - all_y_trues is a numpy array with n elements.
      Elements in all_y_trues correspond to those in data.
    '''
    learn_rate = 0.1
    epochs = 1000  # number of times to loop through the entire dataset

    for epoch in range(epochs):
      for x, y_true in zip(data, all_y_trues):
        # --- Do a feedforward (we'll need these values later)
        sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
        h1 = sigmoid(sum_h1)

        sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
        h2 = sigmoid(sum_h2)

        sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
        o1 = sigmoid(sum_o1)
        y_pred = o1

        # --- Calculate partial derivatives.
        # --- Naming: p_L_p_w1 stands for "partial L partial w1"
        p_L_p_ypred = -2 * (y_true - y_pred)

        # Neuron o1
        p_ypred_p_w5 = h1 * deriv_sigmoid(sum_o1)
        p_ypred_p_w6 = h2 * deriv_sigmoid(sum_o1)
        p_ypred_p_b3 = deriv_sigmoid(sum_o1)

        p_ypred_p_h1 = self.w5 * deriv_sigmoid(sum_o1)
        p_ypred_p_h2 = self.w6 * deriv_sigmoid(sum_o1)

        # Neuron h1
        p_h1_p_w1 = x[0] * deriv_sigmoid(sum_h1)
        p_h1_p_w2 = x[1] * deriv_sigmoid(sum_h1)
        p_h1_p_b1 = deriv_sigmoid(sum_h1)

        # Neuron h2
        p_h2_p_w3 = x[0] * deriv_sigmoid(sum_h2)
        p_h2_p_w4 = x[1] * deriv_sigmoid(sum_h2)
        p_h2_p_b2 = deriv_sigmoid(sum_h2)

        # --- Update weights and biases
        # Neuron h1
        self.w1 -= learn_rate * p_L_p_ypred * p_ypred_p_h1 * p_h1_p_w1
        self.w2 -= learn_rate * p_L_p_ypred * p_ypred_p_h1 * p_h1_p_w2
        self.b1 -= learn_rate * p_L_p_ypred * p_ypred_p_h1 * p_h1_p_b1

        # Neuron h2
        self.w3 -= learn_rate * p_L_p_ypred * p_ypred_p_h2 * p_h2_p_w3
        self.w4 -= learn_rate * p_L_p_ypred * p_ypred_p_h2 * p_h2_p_w4
        self.b2 -= learn_rate * p_L_p_ypred * p_ypred_p_h2 * p_h2_p_b2

        # Neuron o1
        self.w5 -= learn_rate * p_L_p_ypred * p_ypred_p_w5
        self.w6 -= learn_rate * p_L_p_ypred * p_ypred_p_w6
        self.b3 -= learn_rate * p_L_p_ypred * p_ypred_p_b3

      # --- Calculate total loss at the end of each epoch
      if epoch % 10 == 0:
        y_preds = np.apply_along_axis(self.feedforward, 1, data)
        loss = mse_loss(all_y_trues, y_preds)
        print("Epoch %d loss: %.3f" % (epoch, loss))

# Define dataset
data = np.array([
  [-2, -1],   # Alice
  [25, 6],    # Bob
  [17, 4],    # Charlie
  [-15, -6],  # Diana
])
all_y_trues = np.array([
  1,  # Alice
  0,  # Bob
  0,  # Charlie
  1,  # Diana
])

# Train our neural network!
network = OurNeuralNetwork()
network.train(data, all_y_trues)

# Make some predictions
emily = np.array([-7, -3])  # 128 pounds, 63 inches
frank = np.array([20, 2])   # 155 pounds, 68 inches
print("Emily: %.3f" % network.feedforward(emily))  # 0.951 - F
print("Frank: %.3f" % network.feedforward(frank))  # 0.039 - M

I. The basic building block -- the neuron

Before talking about neural networks, let's look at the neuron, the basic unit of a neural network. A neuron takes some inputs, does some math with them, and produces one output. Here is an example of a 2-input neuron:

1. How a single neuron works

Inside this neuron the inputs go through three math operations. Briefly: inputs --> multiply by the weights, sum the products, feed the sum into an activation function --> output.

(1) First, each input is multiplied by a weight w (the brown boxes in the figure):
x1 --> x1 × w1
x2 --> x2 × w2

(2) Next, the two results are added together, plus a bias b (the green box in the figure):
(x1 × w1) + (x2 × w2) + b

(3) Finally, the sum is passed through an activation function f (the yellow box in the figure) to get the output:
y = f(x1 × w1 + x2 × w2 + b)

The job of the activation function is to turn an unbounded input into an output with a predictable form.

2. A commonly used activation function: the sigmoid

The sigmoid function only outputs numbers between 0 and 1: you can think of it as compressing the range (-∞, +∞) into (0, 1). Larger positive inputs give outputs closer to 1; larger negative inputs give outputs closer to 0.

For example, suppose the neuron above has the following weights and bias:
w = [0, 1]
b = 4

w = [0, 1] is just the vector form of w1 = 0, w2 = 1. Given the input x = [2, 3], the neuron's output can be written as a dot product:
w · x + b = (w1 × x1) + (w2 × x2) + b = 0×2 + 1×3 + 4 = 7
y = f(w · x + b) = f(7) = 0.999

The Python code for this single-neuron computation:

import numpy as np  # NumPy, a powerful numerical library for Python

def sigmoid(x):
  # Our activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

class Neuron:
  def __init__(self, weights, bias):
    # A neuron holds two kinds of parameters: weights and a bias
    self.weights = weights
    self.bias = bias

  def feedforward(self, inputs):
    # Multiply the inputs by the weights, add the bias, then apply the activation function
    total = np.dot(self.weights, inputs) + self.bias
    return sigmoid(total)

weights = np.array([0, 1])  # w1 = 0, w2 = 1
bias = 4                    # b = 4
n = Neuron(weights, bias)   # create and initialize a neuron

x = np.array([2, 3])        # inputs: x1 = 2, x2 = 3
print(n.feedforward(x))     # 0.9990889488055994

II. Neural networks

A neural network is nothing more than a bunch of neurons connected together. Here is a simple example of a neural network:

This network has 2 inputs (x1 and x2), a hidden layer with 2 neurons (h1 and h2), and an output layer with 1 neuron (o1).

A hidden layer is any layer between the input layer and the output layer; a neural network can have multiple hidden layers.

The process of passing the inputs forward through the neurons to get an output is called feedforward.

Suppose every neuron in the network above has the same weights w = [0, 1] and bias b = 0 (in reality, the weight on each connection would usually be different), and every activation function is the sigmoid. What output do we get for the input x = [2, 3]?

h1 = h2 = f(w · x + b) = f((0 × 2) + (1 × 3) + 0) = f(3) = 0.9526
o1 = f(w · [h1, h2] + b) = f((0 × h1) + (1 × h2) + 0) = f(0.9526) = 0.7216

import numpy as np

# ... code from the previous section (sigmoid and the Neuron class) goes here

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)
  Each neuron has the same weights and bias:
    - w = [0, 1]
    - b = 0
  '''
  def __init__(self):
    weights = np.array([0, 1])
    bias = 0

    # The Neuron class here is from the previous section
    self.h1 = Neuron(weights, bias)
    self.h2 = Neuron(weights, bias)
    self.o1 = Neuron(weights, bias)

  def feedforward(self, x):
    out_h1 = self.h1.feedforward(x)
    out_h2 = self.h2.feedforward(x)

    # The inputs for o1 are the outputs from h1 and h2
    out_o1 = self.o1.feedforward(np.array([out_h1, out_h2]))

    return out_o1

network = OurNeuralNetwork()
x = np.array([2, 3])
print(network.feedforward(x))  # 0.7216325609518421
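As the Emily and Frank comments in the listing suggest, the inputs are not raw measurements: each weight has 135 pounds subtracted and each height has 66 inches subtracted (128 - 135 = -7, 63 - 66 = -3, and so on). Below is a minimal sketch, not part of the original code, of how a new person's measurements could be shifted the same way before calling feedforward; the helper name shift_inputs is our own.

import numpy as np

def shift_inputs(weight_lb, height_in):
    # Hypothetical helper: apply the same fixed shift the dataset uses
    # (weight - 135 pounds, height - 66 inches).
    return np.array([weight_lb - 135.0, height_in - 66.0])

print(shift_inputs(128, 63))  # [-7. -3.]  -> Emily's input in the listing above
print(shift_inputs(155, 68))  # [20.  2.]  -> Frank's input in the listing above

# Usage with a trained network (assuming `network` from the listing above):
# print(network.feedforward(shift_inputs(128, 63)))  # ~0.951 -> predicted female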
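The same feedforward pass can also be written with matrix-vector products instead of individual Neuron objects. Here is a minimal sketch of that idea under the same assumptions (every neuron uses w = [0, 1] and b = 0); the names W_hidden, b_hidden, W_output, and b_output are our own and not from the article.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# The same 2-2-1 network as above, written with matrix-vector products.
W_hidden = np.array([[0.0, 1.0],    # weights of h1
                     [0.0, 1.0]])   # weights of h2
b_hidden = np.array([0.0, 0.0])
W_output = np.array([0.0, 1.0])     # weights of o1
b_output = 0.0

x = np.array([2.0, 3.0])
h = sigmoid(W_hidden @ x + b_hidden)    # hidden layer outputs [h1, h2] = [0.9526, 0.9526]
o1 = sigmoid(W_output @ h + b_output)   # output neuron
print(o1)  # ~0.7216, matching the Neuron-based version above

Expressing a whole layer as one matrix multiply is how implementations typically handle layers with many neurons.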
III. Training a neural network

Now that we know how to build a neural network, let's learn how to train one. Training is really just an optimization problem.

Suppose we have a dataset with the weight, height, and gender of four people:

Name      Weight (lb)   Height (in)   Gender
Alice     133           65            F
Bob       160           72            M
Charlie   152           70            M
Diana     120           60            F

Our goal is to train a network that predicts someone's gender from their weight and height.

To keep things simple, we subtract a fixed amount from each weight and height (the shifted values used in the code correspond to weight - 135 and height - 66), and we encode gender as a number: female = 1, male = 0.

Before training the network we need a way to quantify how good it is, so that we can try to make it better. That measure is the loss. For example, we can define the loss as the mean squared error (MSE):

MSE = (1/n) × Σ (y_true - y_pred)²

where:
- n is the number of samples, which is 4 in the dataset above;
- y represents a person's gender: 1 for female, 0 for male;
- y_true is the true value of the variable and y_pred is the predicted value.

As the name suggests, the mean squared error is the average of the squared errors over all samples, and we take it as our loss function. Better predictions mean lower loss, so training a neural network means minimizing the loss.

If the network always output 0, i.e., it predicted everyone to be male, the loss would be:

MSE = (1/4) × ((1-0)² + (0-0)² + (0-0)² + (1-0)²) = 0.5

The code to compute the loss:

import numpy as np

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])

print(mse_loss(y_true, y_pred))  # 0.5

IV. Training a neural network (2) -- reducing the loss

This network is not good enough yet; we have to keep optimizing it to make the loss as small as possible. We know that changing the network's weights and biases changes its predictions, but how exactly should we change them?

For simplicity, let's shrink the dataset down to just Alice. The loss then reduces to Alice's squared error alone:

L = (1 - y_pred)²   (since y_true = 1 for Alice)

The prediction y_pred is computed from the network's weights and biases (y_pred is just o1, which depends on w1, ..., w6 and b1, b2, b3), so the loss is really a multivariable function of all the weights and biases:

L(w1, w2, w3, w4, w5, w6, b1, b2, b3)

(Heads up: the next part assumes some basic multivariable calculus, such as partial derivatives and the chain rule.)

1. Example: if we tweak w1, does the loss increase or decrease?

To answer this we need to know whether the partial derivative ?L/?w1 is positive or negative. By the chain rule:

?L/?w1 = (?L/?y_pred) × (?y_pred/?w1)

Since L = (1 - y_pred)², the first factor is:

?L/?y_pred = -2 × (1 - y_pred)

Next we need the relationship between y_pred and w1. We already know how the neurons h1, h2, and o1 compute their outputs:

h1 = f(w1·x1 + w2·x2 + b1)
h2 = f(w3·x1 + w4·x2 + b2)
y_pred = o1 = f(w5·h1 + w6·h2 + b3)

Only neuron h1 involves the weight w1, so we apply the chain rule again:

?y_pred/?w1 = (?y_pred/?h1) × (?h1/?w1)
?y_pred/?h1 = w5 × f'(w5·h1 + w6·h2 + b3)

and then compute ?h1/?w1:

?h1/?w1 = x1 × f'(w1·x1 + w2·x2 + b1)

We used the derivative of the sigmoid, f'(x), twice above. It is easy to derive:

f(x) = 1 / (1 + e^(-x))
f'(x) = f(x) × (1 - f(x))

Putting it all together, the full chain-rule expression is:

?L/?w1 = (?L/?y_pred) × (?y_pred/?h1) × (?h1/?w1)

This system of computing partial derivatives by working backwards is known as backpropagation.

That is a lot of symbols, so let's plug in actual numbers. Take Alice's data x = [-2, -1] with y_true = 1, and suppose all the weights are 1 and all the biases are 0. Then h1, h2, and o1 are:

h1 = f(w1·x1 + w2·x2 + b1) = f(-2 - 1 + 0) = f(-3) = 0.0474
h2 = f(w3·x1 + w4·x2 + b2) = f(-3) = 0.0474
o1 = f(w5·h1 + w6·h2 + b3) = f(0.0474 + 0.0474 + 0) = f(0.0948) = 0.524

The network outputs y_pred = 0.524, which doesn't strongly favor either female (1) or male (0). The prediction is still poor.

Now let's compute the partial derivative ?L/?w1 for the current network:

?L/?y_pred = -2 × (1 - y_pred) = -2 × (1 - 0.524) = -0.952
?y_pred/?h1 = w5 × f'(0.0948) = 1 × 0.524 × (1 - 0.524) = 0.249
?h1/?w1 = x1 × f'(-3) = -2 × 0.0474 × (1 - 0.0474) = -0.0904
?L/?w1 = (-0.952) × 0.249 × (-0.0904) ≈ 0.0214

This tells us that if we increase w1, the loss L increases by a tiny amount.
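As a sanity check on the chain-rule result, the hand-derived ?L/?w1 can be compared against a numerical estimate obtained by nudging w1 and re-running the forward pass. This is a minimal sketch under the same assumptions as the worked example (all weights 1, all biases 0, Alice's input [-2, -1], y_true = 1); it is our own check, not part of the original article.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)
    return fx * (1 - fx)

# Worked example: Alice's input, y_true = 1, all weights = 1, all biases = 0.
x, y_true = np.array([-2.0, -1.0]), 1.0

def loss(w1):
    # Forward pass with only w1 varied; the other weights stay at 1 and the biases at 0.
    h1 = sigmoid(w1 * x[0] + 1.0 * x[1] + 0.0)
    h2 = sigmoid(1.0 * x[0] + 1.0 * x[1] + 0.0)
    o1 = sigmoid(1.0 * h1 + 1.0 * h2 + 0.0)
    return (y_true - o1) ** 2

# Analytic gradient from the chain rule derived above.
sum_h1 = 1.0 * x[0] + 1.0 * x[1]            # = -3
sum_o1 = sigmoid(sum_h1) + sigmoid(sum_h1)  # = 0.0948
y_pred = sigmoid(sum_o1)                    # = 0.524
d_L_d_ypred = -2 * (y_true - y_pred)
d_ypred_d_h1 = 1.0 * deriv_sigmoid(sum_o1)
d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
analytic = d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1

# Finite-difference estimate around w1 = 1.
eps = 1e-5
numeric = (loss(1.0 + eps) - loss(1.0 - eps)) / (2 * eps)

print(analytic, numeric)  # both approximately 0.0214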
2. Stochastic gradient descent

We will now train the network with an optimization algorithm called stochastic gradient descent (SGD). The calculations above give us everything we need; but how do we actually use them? SGD defines how to change each weight and bias, for example:

w1 <- w1 - η × ?L/?w1

η is a constant called the learning rate, which controls how fast we train. Subtracting η × ?L/?w1 from w1 gives the new value of w1.

When ?L/?w1 is positive, w1 decreases; when ?L/?w1 is negative, w1 increases.

If we update every weight and bias in the network this way, the loss slowly decreases and the network improves.

The training procedure is:

(1) Choose one sample from the dataset;
(2) Compute the partial derivative of the loss with respect to every weight and bias;
(3) Update each weight and bias with the update rule above;
(4) Go back to step (1).

We implement this process with the following Python code:

import numpy as np

def sigmoid(x):
  # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
  fx = sigmoid(x)
  return fx * (1 - fx)

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)

  *** DISCLAIMER ***:
  The code below is intended to be simple and educational, NOT optimal.
  Real neural net code looks nothing like this. DO NOT use this code.
  Instead, read/run it to understand how this specific network works.
  '''
  def __init__(self):
    # Weights
    self.w1 = np.random.normal()
    self.w2 = np.random.normal()
    self.w3 = np.random.normal()
    self.w4 = np.random.normal()
    self.w5 = np.random.normal()
    self.w6 = np.random.normal()

    # Biases
    self.b1 = np.random.normal()
    self.b2 = np.random.normal()
    self.b3 = np.random.normal()

  def feedforward(self, x):
    # x is a numpy array with 2 elements.
    h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
    h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
    o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
    return o1

  def train(self, data, all_y_trues):
    '''
    - data is a (n x 2) numpy array, n = # of samples in the dataset.
    - all_y_trues is a numpy array with n elements.
      Elements in all_y_trues correspond to those in data.
    '''
    learn_rate = 0.1
    epochs = 1000  # number of times to loop through the entire dataset

    for epoch in range(epochs):
      for x, y_true in zip(data, all_y_trues):
        # --- Do a feedforward (we'll need these values later)
        sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
        h1 = sigmoid(sum_h1)

        sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
        h2 = sigmoid(sum_h2)

        sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
        o1 = sigmoid(sum_o1)
        y_pred = o1

        # --- Calculate partial derivatives.
        # --- Naming: d_L_d_w1 represents "partial L / partial w1"
        d_L_d_ypred = -2 * (y_true - y_pred)

        # Neuron o1
        d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
        d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
        d_ypred_d_b3 = deriv_sigmoid(sum_o1)

        d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
        d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)

        # Neuron h1
        d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
        d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
        d_h1_d_b1 = deriv_sigmoid(sum_h1)

        # Neuron h2
        d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
        d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
        d_h2_d_b2 = deriv_sigmoid(sum_h2)

        # --- Update weights and biases
        # Neuron h1
        self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
        self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
        self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

        # Neuron h2
        self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
        self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
        self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

        # Neuron o1
        self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
        self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
        self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3

      # --- Calculate total loss at the end of each epoch
      if epoch % 10 == 0:
        y_preds = np.apply_along_axis(self.feedforward, 1, data)
        loss = mse_loss(all_y_trues, y_preds)
        print("Epoch %d loss: %.3f" % (epoch, loss))

# Define dataset
data = np.array([
  [-2, -1],   # Alice
  [25, 6],    # Bob
  [17, 4],    # Charlie
  [-15, -6],  # Diana
])
all_y_trues = np.array([
  1,  # Alice
  0,  # Bob
  0,  # Charlie
  1,  # Diana
])

# Train our neural network!
network = OurNeuralNetwork()
network.train(data, all_y_trues)

As training progresses, the loss steadily decreases.

Now we can use the trained network to predict each person's gender:

# Make some predictions
emily = np.array([-7, -3])  # 128 pounds, 63 inches
frank = np.array([20, 2])   # 155 pounds, 68 inches
print("Emily: %.3f" % network.feedforward(emily))  # 0.951 - F
print("Frank: %.3f" % network.feedforward(frank))  # 0.039 - M

More

This tutorial is only the first step of a long journey. There is much more to learn:

1. Build neural networks with bigger, better machine learning libraries such as TensorFlow, Keras, and PyTorch (see the sketch after this list);
2. Play with neural networks in your browser: https://playground./
3. Learn about activation functions other than the sigmoid: https:///activations/
4. Learn about optimizers other than SGD: https:///optimizers/
5. Learn about convolutional neural networks (CNNs);
6. Learn about recurrent neural networks (RNNs).
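As an illustration of the first item above, here is a minimal sketch, our own and not from the article, of roughly what the same 2-2-1 network could look like in Keras. The hyperparameters mirror the hand-written loop, but Keras updates on batches rather than one sample at a time, so the training behavior is not guaranteed to match exactly.

import numpy as np
from tensorflow import keras

# Same dataset as above (weight - 135 lb, height - 66 in; female = 1, male = 0).
data = np.array([[-2, -1], [25, 6], [17, 4], [-15, -6]], dtype=float)
all_y_trues = np.array([1, 0, 0, 1], dtype=float)

# A 2-input network with one 2-neuron sigmoid hidden layer and one sigmoid output neuron.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(2, activation="sigmoid"),   # hidden layer (h1, h2)
    keras.layers.Dense(1, activation="sigmoid"),   # output neuron (o1)
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.fit(data, all_y_trues, epochs=1000, verbose=0)

# Predict for Emily (-7, -3) and Frank (20, 2), as in the hand-written version.
print(model.predict(np.array([[-7, -3], [20, 2]]), verbose=0))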
These are all topics Victor has set up for himself to cover. He says he "may" write about them in the future; hopefully he gets around to all of them. If you want to get started with neural networks, consider subscribing to his blog.

About the author

Victor Zhou is a 2019 computer science graduate of Princeton who has accepted a software engineering offer from Facebook and starts this August. He has built a JS compiler, two browser games, and a hate-speech detection library.

His blog: https:///

References:
https://www.toutiao.com/a6668491982555316750/?tt_from=weixin&utm_campaign=client_share&wxshare_count=1&timestamp=1553137961&app=news_article&utm_source=weixin&utm_medium=toutiao_ios&group_id=6668491982555316750
https:///blog/intro-to-neural-networks/