Anna-Lena Popkes is a graduate student in computer science at the University of Bonn, Germany, whose work focuses on machine learning and neural networks.
Translated by 林椿眄 | Produced by 人工智能頭條
Introduction: Python is often described as the language closest to AI. Anna-Lena Popkes recently shared on GitHub her notes on implementing seven machine learning algorithms in Python (3.6 and above), complete with full code. None of the implementations rely on other machine learning libraries. The notes are intended to give readers a basic understanding of the algorithms and their underlying structure, not to provide the most efficient implementations.
Softmax regression is also known as multinomial or multi-class logistic regression.
Given:
- a dataset $\{(\boldsymbol{x}^{(1)}, y^{(1)}), \dots, (\boldsymbol{x}^{(m)}, y^{(m)})\}$
- where each $\boldsymbol{x}^{(i)}$ is a $d$-dimensional feature vector
- and $y^{(i)}$ is the target class of $\boldsymbol{x}^{(i)}$; with $K$ classes, $y^{(i)} \in \{0, 1, \dots, K-1\}$
A softmax regression model has the following properties:
- a separate, real-valued weight vector $\boldsymbol{w}_k$ for each class $k$; the weight vectors are typically stored as rows of a weight matrix $\boldsymbol{W}$
- a separate, real-valued bias $b_k$ for each class
- the softmax function as its activation function
- the cross-entropy loss function
Training a softmax regression model consists of several steps. First (step 0), the model parameters are initialized. The remaining steps are then repeated until a specified number of training iterations is reached or the parameters have converged.
Step 0: Initialize the weight matrix and bias values with zeros (or small random values).
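For concreteness, here is a minimal NumPy sketch of this initialization. The sizes n_features = 2 and n_classes = 4 are illustrative (chosen to match the blob dataset used further below), and the names W and b are not part of the original notes:

import numpy as np

n_features, n_classes = 2, 4           # illustrative sizes
W = np.zeros((n_classes, n_features))  # one weight vector per class, stored as a row
b = np.zeros((1, n_classes))           # one bias per class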
Step 1: For each class $k$, compute a linear combination of the input features and the class's weight vector, i.e. compute a score for every class for each training example. For class $k$ and input vector $\boldsymbol{x}^{(i)}$ the score is:

$score_k(\boldsymbol{x}^{(i)}) = \boldsymbol{w}_k \cdot \boldsymbol{x}^{(i)} + b_k$

where $\boldsymbol{w}_k$ is the weight vector of class $k$ and $\cdot$ denotes the dot product.
Using vectorization and broadcasting, we can compute the scores for all classes and all training examples at once:

$\boldsymbol{scores} = \boldsymbol{X} \cdot \boldsymbol{W}^T + \boldsymbol{b}$

where $\boldsymbol{X}$ is a matrix of shape $(n_{samples}, n_{features})$ holding all training examples, and $\boldsymbol{W}$ is a matrix of shape $(n_{classes}, n_{features})$ holding the weight vector of each class.
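Continuing the illustrative sketch from step 0, the vectorized score computation might look like this (X here is just a small random matrix standing in for the training data):

X = np.random.randn(5, n_features)  # 5 illustrative samples
scores = np.dot(X, W.T) + b         # shape (n_samples, n_classes); broadcasting adds b to every row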
Step 2: Apply the softmax activation function to turn the scores into probabilities. The probability that input vector $\boldsymbol{x}^{(i)}$ belongs to class $k$ is:

$\hat{p}_k(\boldsymbol{x}^{(i)}) = \dfrac{\exp\big(score_k(\boldsymbol{x}^{(i)})\big)}{\sum_{j=1}^{K} \exp\big(score_j(\boldsymbol{x}^{(i)})\big)}$
Again, vectorization lets us compute the probabilities for all classes and training examples at once. The class predicted by the model for $\boldsymbol{x}^{(i)}$ is simply the class with the highest probability.
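A small sketch of the softmax step, continuing the names from above. Note that subtracting the row-wise maximum is a common numerical-stability trick added here only for illustration; the full implementation below applies np.exp to the raw scores directly:

def softmax(scores):
    # subtract the row-wise max for numerical stability (does not change the result)
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

probs = softmax(scores)               # each row sums to 1
y_predict = np.argmax(probs, axis=1)  # class with the highest probability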
Step 3: Compute the loss over the whole training set.
We want the model to predict a high probability for the target class and low probabilities for the other classes. This is achieved with the cross-entropy loss function:

$J(\boldsymbol{W}, b) = -\dfrac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\big(\hat{p}_k^{(i)}\big)$

In this formula the target labels are one-hot encoded, so $y_k^{(i)}$ is 1 if the target class of $\boldsymbol{x}^{(i)}$ is $k$, and 0 otherwise.
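A sketch of the one-hot encoding and the loss computation, continuing from the snippets above (the integer labels y are made up purely for illustration):

n_samples = probs.shape[0]
y = np.random.randint(0, n_classes, size=n_samples)  # illustrative integer labels

y_one_hot = np.zeros((n_samples, n_classes))
y_one_hot[np.arange(n_samples), y] = 1               # one-hot encode the targets

loss = -(1 / n_samples) * np.sum(y_one_hot * np.log(probs))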
Step 4: Compute the gradient of the loss function with respect to each weight vector and bias.
A detailed explanation of this derivation can be found here (http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/).
The general formula for class $k$ is:

$\nabla_{\boldsymbol{w}_k} J(\boldsymbol{W}, b) = \dfrac{1}{m} \sum_{i=1}^{m} \boldsymbol{x}^{(i)} \big(\hat{p}_k^{(i)} - y_k^{(i)}\big)$

For the bias terms, the inputs $\boldsymbol{x}^{(i)}$ are simply replaced by 1.
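In vectorized form, these gradients reduce to the two lines below; essentially the same expressions appear inside the training loop of the full implementation further down:

# gradient w.r.t. the weights: average of x^(i) * (p_k - y_k) over all samples
dW = (1 / n_samples) * np.dot(X.T, probs - y_one_hot)    # shape (n_features, n_classes)
# gradient w.r.t. the biases: the inputs are replaced by 1, so we just average the errors
db = (1 / n_samples) * np.sum(probs - y_one_hot, axis=0)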
Step 5: For each class $k$, update the weights and bias:

$\boldsymbol{w}_k = \boldsymbol{w}_k - \eta \, \nabla_{\boldsymbol{w}_k} J, \qquad b_k = b_k - \eta \, \nabla_{b_k} J$
where $\eta$ is the learning rate.
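Completing the sketch, the update step with an illustrative learning rate of 0.1:

learning_rate = 0.1
W = W - learning_rate * dW.T  # dW.T matches W's (n_classes, n_features) layout
b = b - learning_rate * db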
In [1]:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

np.random.seed(13)
Dataset
In [2]:
X, y_true = make_blobs(centers=4, n_samples=5000)

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y_true)
plt.title('Dataset')
plt.xlabel('First feature')
plt.ylabel('Second feature')
plt.show()
In [3]:
# reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]
# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y_true)

print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')
Shape X_train: (3750, 2)
Shape y_train: (3750, 1)
Shape X_test: (1250, 2)
Shape y_test: (1250, 1)
Softmax regression class
class SoftmaxRegressor:

    def __init__(self):
        pass

    def train(self, X, y_true, n_classes, n_iters=10, learning_rate=0.1):
        '''
        Trains a multinomial logistic regression model on a given set of training data
        '''
        self.n_samples, n_features = X.shape
        self.n_classes = n_classes

        self.weights = np.random.rand(self.n_classes, n_features)
        self.bias = np.zeros((1, self.n_classes))
        all_losses = []

        for i in range(n_iters):
            scores = self.compute_scores(X)
            probs = self.softmax(scores)
            y_predict = np.argmax(probs, axis=1)[:, np.newaxis]
            y_one_hot = self.one_hot(y_true)

            loss = self.cross_entropy(y_one_hot, probs)
            all_losses.append(loss)

            dw = (1 / self.n_samples) * np.dot(X.T, (probs - y_one_hot))
            db = (1 / self.n_samples) * np.sum(probs - y_one_hot, axis=0)

            self.weights = self.weights - learning_rate * dw.T
            self.bias = self.bias - learning_rate * db

            if i % 100 == 0:
                print(f'Iteration number: {i}, loss: {np.round(loss, 4)}')

        return self.weights, self.bias, all_losses

    def predict(self, X):
        '''
        Predict class labels for samples in X.

        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            numpy array of shape (n_samples, 1) with predicted classes
        '''
        scores = self.compute_scores(X)
        probs = self.softmax(scores)
        return np.argmax(probs, axis=1)[:, np.newaxis]

    def softmax(self, scores):
        '''
        Transforms matrix of predicted scores to matrix of probabilities

        Args:
            scores: numpy array of shape (n_samples, n_classes) with unnormalized scores
        Returns:
            softmax: numpy array of shape (n_samples, n_classes) with probabilities
        '''
        exp = np.exp(scores)
        sum_exp = np.sum(np.exp(scores), axis=1, keepdims=True)
        softmax = exp / sum_exp

        return softmax

    def compute_scores(self, X):
        '''
        Computes class scores for samples in X

        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            scores: numpy array of shape (n_samples, n_classes)
        '''
        return np.dot(X, self.weights.T) + self.bias

    def cross_entropy(self, y_true, scores):
        loss = - (1 / self.n_samples) * np.sum(y_true * np.log(scores))
        return loss

    def one_hot(self, y):
        '''
        Transforms vector y of labels to one-hot encoded matrix
        '''
        one_hot = np.zeros((self.n_samples, self.n_classes))
        one_hot[np.arange(self.n_samples), y.T] = 1
        return one_hot
Initialize and train the model
regressor = SoftmaxRegressor()
w_trained, b_trained, loss = regressor.train(X_train, y_train, learning_rate=0.1, n_iters=800, n_classes=4)

fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(800), loss)
plt.title('Development of loss during training')
plt.xlabel('Number of iterations')
plt.ylabel('Loss')
plt.show()

Iteration number: 0, loss: 1.393
Iteration number: 100, loss: 0.2051
Iteration number: 200, loss: 0.1605
Iteration number: 300, loss: 0.1371
Iteration number: 400, loss: 0.121
Iteration number: 500, loss: 0.1087
Iteration number: 600, loss: 0.0989
Iteration number: 700, loss: 0.0909
Testing the model
n_test_samples, _ = X_test.shape
y_predict = regressor.predict(X_test)
print(f'Classification accuracy on test set: {(np.sum(y_predict == y_test)/n_test_samples) * 100}%')
Classification accuracy on test set: 99.03999999999999%
Original notes: https://github.com/zotroneneis/machine_learning_basics