A Simple 2D CNN for MNIST Digit Recognition

Author: AI研习社-译站 (AI Yanxishe translation desk)

2018-07-23 10:04

Lead: Convolutional neural networks (CNNs) are the current state-of-the-art architecture for image classification. Whether the task is face recognition, autonomous driving, or object detection, CNNs are widely used.

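Leiphone (雷锋网) AI Yanxishe editor's note: this article is a technical blog post translated by the Leiphone subtitle team. The original title is "A simple 2D CNN for MNIST digit recognition", and the author is Sambit Mahapatra.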

Translation | 王祎    Proofreading | 霍雷刚    Editing | 孔令双

Convolutional neural networks (CNNs) are the current state-of-the-art architecture for image classification. Whether the task is face recognition, autonomous driving, or object detection, CNNs are widely used. In this article, we design a simple 2D convolutional neural network (CNN) model for the well-known MNIST digit recognition task, built with keras on a tensorflow backend. The overall workflow is as follows:

1. Prepare the data

2. Build and compile the model

3. Train and evaluate the model

4. Save the model to disk for later use

The dataset is the MNIST dataset mentioned above. MNIST (Modified National Institute of Standards and Technology dataset) is a large collection of handwritten digits (0 to 9). It contains 70,000 images of size 28x28, split into 60,000 training images and 10,000 test images. The first step is to load the dataset, which is easy to do through the keras API.

import keras
from keras.datasets import mnist
#load mnist dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data() #everytime loading data won't be so easy :)

Here, X_train contains the 60,000 training images of size 28x28, and y_train contains their labels. Similarly, X_test contains the 10,000 test images of size 28x28, and y_test contains their labels. Let's visualize a few training samples to get a sense of what the deep learning model is supposed to do.

import matplotlib.pyplot as plt
fig = plt.figure()
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.tight_layout()
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.title("Digit: {}".format(y_train[i]))
    plt.xticks([])
    plt.yticks([])
fig

[Figure: 3x3 grid of the first nine training images, each titled with its digit label]

As shown above, the top-left image of a "5" is stored in X_train[0], and y_train[0] holds its label, 5. Our deep learning model should be able to predict the digit that was written from the handwritten image alone. To prepare the data, we now need to process the images a little: reshape them and normalize the pixel values.

from keras import backend as K
#input image dimensions (MNIST images are 28x28)
img_rows, img_cols = 28, 28
#reshaping
#this assumes our data format
#For 3D data, "channels_last" assumes (conv_dim1, conv_dim2, conv_dim3, channels) while
#"channels_first" assumes (channels, conv_dim1, conv_dim2, conv_dim3).
#Here the images are 2D with a single grayscale channel.
if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
#convert to float32 and scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape) #X_train shape: (60000, 28, 28, 1)
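
The reshape and scaling steps can be tried without downloading MNIST. The following is a minimal sketch that uses a small random array as a stand-in for X_train (the name `fake_images` and its size are made up for illustration) and assumes the channels_last layout.

```python
import numpy as np

# Stand-in for X_train: 5 fake grayscale images of 28x28 uint8 pixels.
# (Hypothetical data, used only to illustrate shapes and value ranges.)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

img_rows, img_cols = 28, 28

# channels_last layout: (samples, rows, cols, channels)
x = fake_images.reshape(fake_images.shape[0], img_rows, img_cols, 1)

# Convert to float32 and scale pixel values from [0, 255] to [0, 1]
x = x.astype('float32') / 255

print(x.shape)  # (5, 28, 28, 1)
print(x.dtype)  # float32
```

The extra trailing dimension is the single grayscale channel that Conv2D expects; a color image would have 3 there.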

After processing the image data, the labels in y_train and y_test need to be converted to a categorical (one-hot) format. For example, for model building the label 3 should be converted to the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

import keras
#set number of categories
num_category = 10
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_category)
y_test = keras.utils.to_categorical(y_test, num_category)
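
To make the encoding concrete, here is a minimal NumPy sketch of the same idea. It is not how keras implements to_categorical; `one_hot` is a hypothetical helper written only for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Row i gets a 1 in column labels[i]
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Applying np.argmax along axis 1 recovers the original integer labels.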

Build and compile the model

Once the data is ready to be fed to the model, we need to define the model architecture and compile it with an optimizer, a loss function, and a performance metric.

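The architecture used here is two convolutional layers followed by a max-pooling layer, then a fully connected layer and a softmax layer. Each convolutional layer uses multiple filters to extract different types of features. Intuitively, one filter might help detect straight lines in the image, another might help detect circles, and so on. The technical details of each layer will be explained in later posts. For a better understanding of what each layer does, see http://cs231n.github.io/convolutional-networks/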
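
To illustrate the intuition that a filter responds to a particular pattern, the sketch below slides a hand-written 3x3 filter over a tiny image that contains one vertical line. This is only an illustration: in a trained CNN the filter values are learned rather than written by hand, and `conv2d_valid` is a hypothetical helper, not a keras function.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation, as CNN layers do)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 5x5 image containing a single vertical line in the middle column
image = np.zeros((5, 5))
image[:, 2] = 1.0

# Hand-written 3x3 filter that responds to vertical lines
vertical_line_filter = np.array([[-1, 2, -1],
                                 [-1, 2, -1],
                                 [-1, 2, -1]], dtype=float)

response = conv2d_valid(image, vertical_line_filter)
print(response)  # every row is [-3.  6. -3.]
```

The response peaks where the filter is centered on the line and is negative next to it, which is the sense in which a filter "detects" a feature.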

After the max-pooling layer and after the fully connected layer, we add dropout to the model as regularization, to reduce overfitting.
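
As a rough sketch of what dropout does during training, the NumPy snippet below zeroes a random fraction of the activations and rescales the survivors so the expected sum stays the same (so-called inverted dropout). The helper `dropout` is hypothetical and only illustrates the idea; at inference time dropout is switched off and activations pass through unchanged.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 8))
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 or 2.0
```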

##model building
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
model = Sequential()
#convolutional layer with rectified linear unit activation
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
#32 convolution filters used each of size 3x3
#again
model.add(Conv2D(64, (3, 3), activation='relu'))
#64 convolution filters used each of size 3x3
#downsample the feature maps via max pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
#randomly drop 25% of the activations during training to reduce overfitting
model.add(Dropout(0.25))
#flatten the feature maps into a vector for the dense layers
model.add(Flatten())
#fully connected layer
model.add(Dense(128, activation='relu'))
#one more dropout (50%) for regularization
model.add(Dropout(0.5))
#softmax output: one probability per digit class
model.add(Dense(num_category, activation='softmax'))
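
The layer sizes can be checked by hand. The sketch below works through the arithmetic, assuming the layer defaults used in the model (no padding, stride 1 for the convolutions, and a pooling stride equal to the pool size); `conv_output_size` is a hypothetical helper. Calling model.summary() should report the same numbers if those defaults hold.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for a convolution or pooling without padding."""
    return (size - kernel) // stride + 1

side = 28
side = conv_output_size(side, 3)            # Conv2D(32, 3x3)  -> 26
side = conv_output_size(side, 3)            # Conv2D(64, 3x3)  -> 24
side = conv_output_size(side, 2, stride=2)  # MaxPooling2D 2x2 -> 12
flattened = side * side * 64                # Flatten          -> 9216

# Trainable parameters: weights plus one bias per output unit or filter
params = {
    'conv1': 3 * 3 * 1 * 32 + 32,
    'conv2': 3 * 3 * 32 * 64 + 64,
    'dense1': flattened * 128 + 128,
    'dense2': 128 * 10 + 10,
}
print(flattened)             # 9216
print(sum(params.values()))  # 1199882
```

Almost all of the parameters sit in the first fully connected layer, which is one reason dropout is applied around it.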

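Once the architecture is defined, the model needs to be compiled. Since this is a multi-class classification problem, we use categorical_crossentropy as the loss function. Since all labels carry similar weight, we prefer accuracy as the performance metric. AdaDelta, a commonly used gradient descent method, is used to optimize the model parameters.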

#Adadelta is a gradient descent variant with an adaptive learning rate; Adam and Adagrad are common alternatives
#categorical cross-entropy since we have multiple classes (10)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
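
For readers who want to see what the loss measures, here is a minimal NumPy sketch of softmax followed by categorical cross-entropy. It uses 3 classes instead of 10 to keep the numbers readable, the inputs are made up, and the functions are simplified illustrations rather than the keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the log of the probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Made-up raw scores for two samples and their one-hot labels
logits = np.array([[1.0, 2.0, 5.0],
                   [3.0, 0.5, 0.5]])
y_true = np.array([[0, 0, 1],
                   [0, 1, 0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=-1))  # each row sums to 1
print(loss)                # the second sample is penalized much more
```

The first sample puts most of its probability on the correct class, so its loss is small; the second favors a wrong class, so its loss is large.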

Train and evaluate the model

After defining and compiling the model, we train it on the training set so that it learns to recognize handwritten digits. Here we fit the model with X_train and y_train.

batch_size = 128
num_epoch = 10
#model training
model_log = model.fit(X_train, y_train,
                      batch_size=batch_size,
                      epochs=num_epoch,
                      verbose=1,
                      validation_data=(X_test, y_test))

Here, one epoch is one forward and backward pass over all training examples, and batch_size is the number of training examples used in a single forward/backward pass. The training output looks like this:

[Figure: screenshot of the training output log]
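
The relationship between epochs, batch size, and the number of weight updates is simple arithmetic. The sketch below assumes the final, smaller batch of each epoch is still used for an update.

```python
import math

num_train = 60000   # training images in MNIST
batch_size = 128
num_epoch = 10

# One weight update per batch; the last batch of an epoch may be smaller
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch_size = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * num_epoch

print(steps_per_epoch)  # 469
print(last_batch_size)  # 96
print(total_updates)    # 4690
```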

Now let's evaluate the performance of the trained model.

score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0]) #Test loss: 0.0296396646054
print('Test accuracy:', score[1]) #Test accuracy: 0.9904
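
The accuracy figure is the fraction of images whose most probable predicted class matches the true label. A minimal sketch with made-up predictions for 4 samples and 3 classes (the real model outputs 10 probabilities per image):

```python
import numpy as np

# Hypothetical softmax outputs, one row per sample
y_pred = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.2, 0.5, 0.3]])
# One-hot ground-truth labels
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

predicted_classes = np.argmax(y_pred, axis=1)
true_classes = np.argmax(y_true, axis=1)
accuracy = np.mean(predicted_classes == true_classes)

print(predicted_classes)  # [1 0 2 1]
print(accuracy)           # 0.75
```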

The test accuracy is above 99%, which means the model was trained successfully. Looking at the full training log, the loss and accuracy on both the training data and the test data converge as the number of epochs increases and eventually level off.

import os
# plotting the metrics
fig = plt.figure()
plt.subplot(2,1,1)
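#history keys are 'acc'/'val_acc' in the Keras version used here; newer Keras versions use 'accuracy'/'val_accuracy'
plt.plot(model_log.history['acc'])
plt.plot(model_log.history['val_acc'])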
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='lower right')
plt.subplot(2,1,2)
plt.plot(model_log.history['loss'])
plt.plot(model_log.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.tight_layout()
fig

[Figure: model accuracy (top) and model loss (bottom) per epoch, for the training and test sets]

Save the model to disk for later use

Now the trained model needs to be serialized. The model architecture is saved to a JSON file, and the weights are saved to an HDF5 file.

#Save the model
# serialize model to JSON
model_digit_json = model.to_json()
with open("model_digit.json", "w") as json_file:
    json_file.write(model_digit_json)
# serialize weights to HDF5
model.save_weights("model_digit.h5")
print("Saved model to disk")
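
The article does not show the loading side, so the following is only a sketch of how the two files might be read back, assuming the same Keras version that wrote them. `load_digit_model` is a hypothetical helper, and the compile settings simply repeat the ones used earlier; newer Keras releases may expect different file naming or formats.

```python
def load_digit_model(json_path="model_digit.json", weights_path="model_digit.h5"):
    """Rebuild the architecture from JSON, then load the saved weights into it."""
    # Imported inside the function so the sketch can be defined without keras installed.
    from keras.models import model_from_json

    with open(json_path, "r") as json_file:
        model = model_from_json(json_file.read())
    model.load_weights(weights_path)
    # The JSON file stores only the architecture, so compile again before evaluate().
    model.compile(loss="categorical_crossentropy",
                  optimizer="adadelta",
                  metrics=["accuracy"])
    return model
```

With the files from the previous step in the working directory, `load_digit_model().evaluate(X_test, y_test, verbose=0)` would be expected to reproduce the test score reported earlier.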

Once saved, the model can be reused and easily ported to other environments. In upcoming posts, we will show how to deploy this model in production.

Enjoy deep learning!