Build an RNN in TensorFlow from Scratch (Complete Code)!

Author: 三川

2017-04-28 16:23

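Introduction: I will explain how to build an RNN in TensorFlow.

What is an RNN?

A recurrent neural network, or RNN, can be used whenever data is processed in sequence and the order of the data points matters. More importantly, the sequence can be of arbitrary length.

The most straightforward example is perhaps a time series of numbers, where the task is to predict the next value from the previous ones. At each time-step, the input to the RNN is the current value together with a state vector that represents what the network has "seen" at earlier time-steps. This state vector is the RNN's encoded memory, and it is initially set to zero.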
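As a rough illustration of this idea (not the network built later in this article), the NumPy sketch below pushes two sequences that contain the same values in a different order through a tiny recurrent update. The weights `w_in` and `w_state` and the helper `run_rnn` are made-up names and values used only here. The state starts at zero and ends up different for the two orderings, which is what makes an RNN sensitive to order.

```python
import numpy as np

# Hypothetical fixed weights, chosen only for illustration.
w_in = 0.5      # weight applied to the current input value
w_state = 0.9   # weight applied to the previous state


def run_rnn(sequence):
    """Run a scalar recurrent update over a sequence of any length."""
    state = 0.0  # the state is initialized to zero
    for value in sequence:
        # The new state depends on the current input and the previous state.
        state = np.tanh(w_in * value + w_state * state)
    return state


state_a = run_rnn([1, 0, 0, 1])
state_b = run_rnn([0, 1, 1, 0])  # same values, different order
print(state_a, state_b)
```

The loop body does not depend on the length of the sequence, which is why the same update can be applied to sequences of any length.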


Schematic of an RNN processing sequential data.

Setup

We will build a simple Echo-RNN that remembers its input and echoes it a few time-steps later. First we set some constants that we will need; their meaning is explained below.

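from __future__ import print_function, division
import numpy as np
import tensorflow as tf  # graph-mode API (tf.placeholder, tf.Session), as in TensorFlow 1.x
import matplotlib.pyplot as plt

num_epochs = 100                # number of times fresh data is generated and trained on
total_series_length = 50000     # number of points in the generated series
truncated_backprop_length = 15  # time-steps per batch; errors are back-propagated this far
state_size = 4                  # length of the RNN state vector
num_classes = 2                 # the series is binary
echo_step = 3                   # the output echoes the input this many steps later
batch_size = 5                  # number of rows the series is split into
num_batches = total_series_length//batch_size//truncated_backprop_length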
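To make the arithmetic behind num_batches concrete, here is a small sketch using the constants above. The names `row_length` and `unused_columns` are introduced only for this example and do not appear in the article's code.

```python
total_series_length = 50000
truncated_backprop_length = 15
batch_size = 5

# Each of the batch_size rows holds an equal share of the series.
row_length = total_series_length // batch_size                 # 10000
# Number of full windows of truncated_backprop_length columns per row.
num_batches = row_length // truncated_backprop_length          # 666
# Columns at the end of each row that no window reaches.
unused_columns = row_length - num_batches * truncated_backprop_length  # 10

print(row_length, num_batches, unused_columns)
```

With these values the last 10 columns of every row are never used for training, which is harmless here because fresh data is generated for each epoch.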

Generating data

Now we generate the training data. The input is essentially a random binary vector. The output is the "echo" of the input, shifted echo_step steps to the right.

def generateData():
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0

    x = x.reshape((batch_size, -1))  # The first index changing slowest, subseries as rows
    y = y.reshape((batch_size, -1))

    return (x, y)

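Note the data reshaping step, which packs the data into a matrix with batch_size rows. A neural network is trained by approximating the gradient of the loss function with respect to the neuron weights, and each approximation uses only a small subset of the data, known as a mini-batch. Reshaping puts the whole dataset into a matrix that is then sliced into these mini-batches.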
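The shift and the reshape are easier to follow on a tiny series. The sketch below applies the same NumPy calls to eight hand-picked values, with batch_size reduced to 2 purely to keep the output readable.

```python
import numpy as np

echo_step = 3
batch_size = 2  # smaller than in the article, to keep the output readable

x = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y = np.roll(x, echo_step)   # shift right; the last values wrap to the front
y[0:echo_step] = 0          # overwrite the wrapped-around values

print(y)                             # [0 0 0 1 0 1 1 0]
print(x.reshape((batch_size, -1)))   # rows: [1 0 1 1] and [0 0 1 0]
```

From index echo_step onward, each label y[i] equals the input x[i - echo_step].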

Schematic of the reshaped data matrix. The curved arrows mark adjacent time-steps that ended up on different rows. Light gray represents 0 and dark gray represents 1.

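Building the computational graph

TensorFlow works by first building a computational graph that specifies which operations will be performed. The inputs and outputs of the graph are typically multidimensional arrays, also known as tensors. The graph, or parts of it, can then be executed iteratively, on the CPU, on a GPU, or even on a remote server.

Variables and placeholders

The two basic TensorFlow data structures used in this tutorial are variables and placeholders. On each run the batch data is fed to the placeholders, which are the "starting nodes" of the computational graph. The RNN state output by the previous run is also supplied through a placeholder.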

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

init_state = tf.placeholder(tf.float32, [batch_size, state_size])

The weights and biases of the network are declared as TensorFlow variables, which makes them persist across runs and lets them be updated incrementally for each batch.

W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)

W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)), dtype=tf.float32)

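The figure below shows the input data matrix, with the current batch, batchX_placeholder, inside the dashed rectangle. As we will see later, this "batch window" slides truncated_backprop_length steps to the right on each run, hence the arrow. In the example below, batch_size = 3, truncated_backprop_length = 3, and total_series_length = 36. These numbers are for visualization only; the values in the code are different. The series order index is shown as a number on a few of the data points.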
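The sliding window can be sketched with NumPy slicing, using the small numbers from the figure. The matrix `order` is a stand-in introduced here: instead of data it holds each point's 1-based position in the series, so the printout shows which time-steps every window covers.

```python
import numpy as np

batch_size = 3
truncated_backprop_length = 3
total_series_length = 36

# Fill the matrix with each point's 1-based position in the series.
order = np.arange(1, total_series_length + 1).reshape((batch_size, -1))

for batch_idx in range(2):
    start_idx = batch_idx * truncated_backprop_length
    end_idx = start_idx + truncated_backprop_length
    window = order[:, start_idx:end_idx]
    print("batch", batch_idx)
    print(window)
```

The second window contains positions 4 to 6, 16 to 18, and 28 to 30, one group per row.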


Unpacking

Now we build the part of the graph that resembles the actual RNN computation. First we want to split the batch data into adjacent time-steps.

# Unpack columns
inputs_series = tf.unstack(batchX_placeholder, axis=1)  # tf.unpack was renamed to tf.unstack in TensorFlow 1.0
labels_series = tf.unstack(batchY_placeholder, axis=1)

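As the figure below shows, this is done by unpacking the columns (axis = 1) of the batch into a Python list. The RNN trains on several parts of the time series simultaneously: steps 4 to 6, 16 to 18, and 28 to 30 in the current batch example. The variable names follow the pattern "plural"_"series" to emphasize that each one is a list representing a time series with multiple entries at each step.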
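A NumPy stand-in for this unpacking, assuming the 3 x 3 batch from the example, looks like the sketch below. The name `columns_series` is made up for the illustration; in the article's graph the same role is played by inputs_series.

```python
import numpy as np

# The current batch from the example, written as 1-based series positions.
batch = np.array([[4, 5, 6],
                  [16, 17, 18],
                  [28, 29, 30]])

# One list entry per time-step; each entry holds one value per row of the batch.
columns_series = [batch[:, t] for t in range(batch.shape[1])]

for t, column in enumerate(columns_series):
    print("time-step", t, column)
```

Each list entry has batch_size values, so one step of the RNN advances all rows of the batch at once.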


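Schematic of the current batch split into columns. The number on each data point is its order index, and the arrows indicate adjacent time-steps.

In our time series, training happens at three places simultaneously, which requires keeping three instances of the state during the forward pass. This has already been accounted for: the init_state placeholder has batch_size rows.

Forward pass

Next, we build the part of the graph that does the actual RNN computation.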

# Forward pass
current_state = init_state
states_series = []
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size, 1])
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)  # Increasing number of columns; values first, then axis (TensorFlow 1.0 and later)

    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)  # Broadcasted addition
    states_series.append(next_state)
    current_state = next_state

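Note the concatenation on the sixth line of the block (the tf.concat call). What we actually want to compute is the sum of two affine transforms, current_input * Wa + current_state * Wb, shown in the figure below. By concatenating the two tensors, you need only one matrix multiplication. The addition of the bias b is broadcast across all samples in the batch.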
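The equivalence can be checked numerically. In the NumPy sketch below, `Wa` and `Wb` are taken to be the first row and the remaining rows of W; that split is an assumption that follows from concatenating the input column before the state columns, and the random values are placeholders.

```python
import numpy as np

batch_size, state_size = 5, 4
rng = np.random.RandomState(0)

current_input = rng.rand(batch_size, 1)
current_state = rng.rand(batch_size, state_size)
W = rng.rand(state_size + 1, state_size)

# Split W into the row that multiplies the input and the rows that multiply the state.
Wa = W[:1, :]   # shape (1, state_size)
Wb = W[1:, :]   # shape (state_size, state_size)

concatenated = np.concatenate([current_input, current_state], axis=1)
single_matmul = np.dot(concatenated, W)
two_matmuls = np.dot(current_input, Wa) + np.dot(current_state, Wb)

print(np.allclose(single_matmul, two_matmuls))  # True
```

Both routes give the same (batch_size, state_size) result; the concatenated form just needs one multiplication per time-step.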


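Schematic of the matrix computation on the eighth line of the code example above; the non-linear tanh transform is omitted.

You may wonder about the name of the variable truncated_backprop_length. When an RNN is trained, it is effectively treated as a special case of a deep neural network in which the same weights recur in every layer. These layers are not unrolled all the way back to the beginning of the series, since that would be too computationally expensive, so the unrolling is truncated to a limited number of time-steps. In the schematics above, the error is back-propagated three steps in the batch.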
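Truncation limits how far the error travels backward, but the state still flows forward from one window to the next. The NumPy sketch below covers only that forward half and computes no gradients: it runs a tanh recurrence over a short series in one go and again in windows of truncated_backprop_length steps, handing the state from window to window. The helper `forward` and the random weights are assumptions made for the illustration.

```python
import numpy as np

state_size = 4
truncated_backprop_length = 3
rng = np.random.RandomState(1)
W = rng.rand(state_size + 1, state_size)
b = np.zeros((1, state_size))
series = rng.randint(0, 2, size=12).astype(float)


def forward(inputs, state):
    """Apply the tanh recurrence to each input in turn and return the final state."""
    for value in inputs:
        stacked = np.concatenate([np.array([[value]]), state], axis=1)
        state = np.tanh(np.dot(stacked, W) + b)
    return state


zero_state = np.zeros((1, state_size))
full_state = forward(series, zero_state)

# Process the same series window by window, handing the state to the next window.
chunked_state = zero_state
for start in range(0, len(series), truncated_backprop_length):
    chunk = series[start:start + truncated_backprop_length]
    chunked_state = forward(chunk, chunked_state)

print(np.allclose(full_state, chunked_state))  # True
```

The forward result is identical either way; what truncation changes is that, during training, gradients are computed only within each window.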

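Calculating the loss

This is the final part of the graph: a fully connected softmax layer from the state to the output, which makes the classes one-hot encoded, followed by the calculation of the loss for the batch.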

logits_series = [tf.matmul(state, W2) + b2 for state in states_series]  # Broadcasted addition
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

# TensorFlow 1.0 and later require keyword arguments for this call.
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)

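The last line adds the training operation. TensorFlow performs back-propagation automatically: the computational graph is executed once for each mini-batch, and the network weights are updated incrementally.

Note the API call sparse_softmax_cross_entropy_with_logits, which computes the softmax internally and then the cross-entropy. In our example the classes are mutually exclusive (each label is either 0 or 1), which is why the "sparse" variant is used. It takes integer class labels of shape [batch_size] and logits of shape [batch_size, num_classes]. See the TensorFlow API documentation for details.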
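What that call computes can be approximated in a few lines of NumPy. This is a sketch of the same calculation, not TensorFlow's implementation; the function name below is borrowed for readability and the example logits are invented.

```python
import numpy as np


def sparse_softmax_cross_entropy(logits, labels):
    """Cross-entropy of softmax(logits) against integer class labels, one loss per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


logits = np.array([[0.0, 0.0],     # undecided between class 0 and class 1
                   [4.0, -4.0]])   # strongly predicts class 0
labels = np.array([1, 0])          # integer class indices, not one-hot vectors

losses = sparse_softmax_cross_entropy(logits, labels)
print(losses)
```

The undecided row costs ln 2 (about 0.693), while the confident and correct row costs almost nothing.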

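Visualizing the training

There is a visualization function so that we can see what is going on in the network as it trains. It continuously plots the loss curve and shows the training input, the training output, and the network's current predictions for the different sample series in a training batch.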

def plot(loss_list, predictions_series, batchX, batchY):
    plt.subplot(2, 3, 1)
    plt.cla()
    plt.plot(loss_list)

    for batch_series_idx in range(5):
        one_hot_output_series = np.array(predictions_series)[:, batch_series_idx, :]
        single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

        plt.subplot(2, 3, batch_series_idx + 2)
        plt.cla()
        plt.axis([0, truncated_backprop_length, 0, 2])
        left_offset = range(truncated_backprop_length)
        plt.bar(left_offset, batchX[batch_series_idx, :], width=1, color="blue")
        plt.bar(left_offset, batchY[batch_series_idx, :] * 0.5, width=1, color="red")
        plt.bar(left_offset, single_output_series * 0.3, width=1, color="green")

    plt.draw()
    plt.pause(0.0001)

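Running a training session

It is time to put everything together and train the network. In TensorFlow the graph is executed in a session. New data is generated on each epoch (not the usual way to do it, but it works here because everything is predictable).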

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
    plt.ion()
    plt.figure()
    plt.show()
    loss_list = []

    for epoch_idx in range(num_epochs):
        x,y = generateData()
        _current_state = np.zeros((batch_size, state_size))

        print("New data, epoch", epoch_idx)

        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:,start_idx:end_idx]
            batchY = y[:,start_idx:end_idx]

            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder:batchX,
                    batchY_placeholder:batchY,
                    init_state:_current_state
                })

            loss_list.append(_total_loss)

            if batch_idx%100 == 0:
                print("Step",batch_idx, "Loss", _total_loss)
                plot(loss_list, _predictions_series, batchX, batchY)

plt.ioff()
plt.show()

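You can see that we move truncated_backprop_length steps forward on each iteration (lines 15 to 19 of the block), although a different stride is possible. The downside of this approach is that truncated_backprop_length must be considerably larger than the time dependencies (three steps in our case) for a batch to encapsulate the relevant training data. Otherwise there may be many "misses", as the figure below shows.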


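Time series of squares. The raised black square is the echo output, which is activated three steps after the echo input (black square). The sliding batch window also moves three steps at a time, which in this example means that no batch encapsulates the dependency, so the network cannot train.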
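A simplified count makes the "misses" concrete. The hypothetical helper `fraction_inside_window` below reports what share of the positions in one window have their echoed input inside the same window, which is the only case where the error can be back-propagated from the echo to its cause. It deliberately ignores the state carried forward between windows.

```python
from __future__ import division


def fraction_inside_window(window_length, echo_step):
    """Fraction of positions in a window whose echoed input lies in the same window."""
    inside = sum(1 for t in range(window_length) if t - echo_step >= 0)
    return inside / window_length


print(fraction_inside_window(15, 3))  # 0.8
print(fraction_inside_window(3, 3))   # 0.0
```

With the article's settings (a window of 15 steps and an echo of 3) most dependencies fall inside a window, whereas the window of 3 steps shown in the figure captures none.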

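Keep in mind that this is just a simple example to explain how an RNN works; the same functionality could easily be programmed in a few lines of code. The network will be able to learn the echo behavior exactly, so there is no need for test data.

The program updates the plot as training progresses; see the legend below. Blue bars denote the training input signal (binary), red bars show the echo in the training output, and green bars are the echoes generated by the network. The different bar plots show different sample series in the current batch.

Our algorithm learns the task fairly quickly. The plot in the top-left corner shows the output of the loss function, but what are the spikes in the curve? Think about it for a moment; the answer is below.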


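Visualization of the loss, the input and output training data (blue, red), and the prediction (green).

The spikes appear because we start a new epoch and generate new data. Since the matrix is reshaped, the first element of each row is adjacent to the last element of the previous row. On every row except the first, the first few elements have dependencies that are not included in the state, so the network always performs poorly on the first batch.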
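The row-boundary effect can be seen on a tiny hand-made series. The sketch below reuses the shift-and-reshape steps of generateData with batch_size reduced to 2; the twelve input values are arbitrary.

```python
import numpy as np

echo_step = 3
batch_size = 2

x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
y = np.roll(x, echo_step)
y[0:echo_step] = 0

x_rows = x.reshape((batch_size, -1))
y_rows = y.reshape((batch_size, -1))

# The first labels of the second row echo the last inputs of the first row.
print(y_rows[1, :echo_step])    # [1 0 0]
print(x_rows[0, -echo_step:])   # [1 0 0]
```

At the start of an epoch the state of the second row is all zeros, so it holds no trace of those inputs and the first few labels of that row cannot be predicted.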

The whole program

Below is the complete runnable program; just copy, paste, and run it.

from __future__ import print_function, division
import numpy as np
import tensorflow as tf  # graph-mode API (tf.placeholder, tf.Session), as in TensorFlow 1.x
import matplotlib.pyplot as plt

num_epochs = 100
total_series_length = 50000
truncated_backprop_length = 15
state_size = 4
num_classes = 2
echo_step = 3
batch_size = 5
num_batches = total_series_length//batch_size//truncated_backprop_length

def generateData():
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0

    x = x.reshape((batch_size, -1))  # The first index changing slowest, subseries as rows
    y = y.reshape((batch_size, -1))

    return (x, y)

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

init_state = tf.placeholder(tf.float32, [batch_size, state_size])

W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)

W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)), dtype=tf.float32)

# Unpack columns (tf.unpack was renamed to tf.unstack in TensorFlow 1.0)
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)

# Forward pass
current_state = init_state
states_series = []
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size, 1])
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)  # Increasing number of columns

    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)  # Broadcasted addition
    states_series.append(next_state)
    current_state = next_state

logits_series = [tf.matmul(state, W2) + b2 for state in states_series]  # Broadcasted addition
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

# TensorFlow 1.0 and later require keyword arguments for this call.
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)

def plot(loss_list, predictions_series, batchX, batchY):
    plt.subplot(2, 3, 1)
    plt.cla()
    plt.plot(loss_list)

    for batch_series_idx in range(5):
        one_hot_output_series = np.array(predictions_series)[:, batch_series_idx, :]
        single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

        plt.subplot(2, 3, batch_series_idx + 2)
        plt.cla()
        plt.axis([0, truncated_backprop_length, 0, 2])
        left_offset = range(truncated_backprop_length)
        plt.bar(left_offset, batchX[batch_series_idx, :], width=1, color="blue")
        plt.bar(left_offset, batchY[batch_series_idx, :] * 0.5, width=1, color="red")
        plt.bar(left_offset, single_output_series * 0.3, width=1, color="green")

    plt.draw()
    plt.pause(0.0001)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
    plt.ion()
    plt.figure()
    plt.show()
    loss_list = []

    for epoch_idx in range(num_epochs):
        x,y = generateData()
        _current_state = np.zeros((batch_size, state_size))

        print("New data, epoch", epoch_idx)

        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:,start_idx:end_idx]
            batchY = y[:,start_idx:end_idx]

            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder:batchX,
                    batchY_placeholder:batchY,
                    init_state:_current_state
                })

            loss_list.append(_total_loss)

            if batch_idx%100 == 0:
                print("Step",batch_idx, "Loss", _total_loss)
                plot(loss_list, _predictions_series, batchX, batchY)

plt.ioff()
plt.show()

Via Medium; original author Erik Hallström; compiled by 雷鋒網 (Leiphone).
