How exactly does LSTMCell from TensorFlow operate?


I am trying to reproduce the results generated by TensorFlow's LSTMCell to be sure that I understand what it does.

Here is my TensorFlow code:

import tensorflow as tf
import numpy as np

num_units = 3
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)

timesteps = 7
num_input = 4
X = tf.placeholder("float", [None, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

x_val = np.random.normal(size = (1, 7, num_input))

res = sess.run(outputs, feed_dict = {X: x_val})

for e in res:
    print(e)

Here is its output:

[[-0.13285545 -0.13569424 -0.23993783]]
[[-0.04818152  0.05927373  0.2558436 ]]
[[-0.13818116 -0.13837864 -0.15348436]]
[[-0.232219    0.08512601  0.05254192]]
[[-0.20371495 -0.14795329 -0.2261929 ]]
[[-0.10371902 -0.0263292  -0.0914975 ]]
[[0.00286371 0.16377522 0.059478  ]]

And here is my own implementation:

n_steps, _ = X.shape
h = np.zeros(shape = self.hid_dim)
c = np.zeros(shape = self.hid_dim)

for i in range(n_steps):
    x = X[i,:]

    vec = np.concatenate([x, h])
    #vec = np.concatenate([h, x])
    gs = np.dot(vec, self.kernel) + self.bias

    g1 = gs[0*self.hid_dim : 1*self.hid_dim]
    g2 = gs[1*self.hid_dim : 2*self.hid_dim]
    g3 = gs[2*self.hid_dim : 3*self.hid_dim]
    g4 = gs[3*self.hid_dim : 4*self.hid_dim]

    I = vsigmoid(g1)   # input gate
    N = np.tanh(g2)    # new input
    F = vsigmoid(g3)   # forget gate
    O = vsigmoid(g4)   # output gate

    c = c*F + I*N
    h = O * np.tanh(c)

    print(h)

And here is its output:

[-0.13285543 -0.13569425 -0.23993781]
[-0.01461723  0.08060743  0.30876374]
[-0.13142865 -0.14921292 -0.16898363]
[-0.09892188  0.11739943  0.08772941]
[-0.15569218 -0.15165766 -0.21918869]
[-0.0480604  -0.00918626 -0.06084118]
[0.0963612  0.1876516  0.11888081]

As you can see, I was able to reproduce the first hidden vector, but the second one and all the following ones differ. What am I missing?

 


TensorFlow uses the glorot_uniform() function to initialize the LSTM kernel, which samples weights from a random uniform distribution. We need to fix the kernel's value to get reproducible results:

import tensorflow as tf
import numpy as np

np.random.seed(0)
timesteps = 7
num_input = 4
x_val = np.random.normal(size = (1, timesteps, num_input))

num_units = 3

def glorot_uniform(shape):
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    return np.random.uniform(low=-limit, high=limit, size=shape)

kernel_init = glorot_uniform((num_input + num_units, 4 * num_units))

Here is my implementation of the LSTMCell (well, actually it's just TensorFlow's code, slightly rewritten):
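Before the code itself, this is the recurrence it implements, as transcribed from the call() method below (the single matmul is split into the four gate pre-activations in the order i, j, f, o; $\sigma$ is the logistic sigmoid, $\odot$ is elementwise multiplication):

$$[\,i \mid j \mid f \mid o\,] = [\,x_t,\ h_{t-1}\,]\,W + b$$
$$c_t = \sigma(f + \text{forget\_bias}) \odot c_{t-1} + \sigma(i) \odot \tanh(j)$$
$$h_t = \sigma(o) \odot \tanh(c_t)$$

Note the forget_bias (1.0 by default) added inside the forget gate's sigmoid. Your implementation omits it, and because c starts at zero the forget gate has no effect at the first step, which is exactly why only your first hidden vector matched.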

def sigmoid(x):
    return 1. / (1 + np.exp(-x))

class LSTMCell():
    """Long short-term memory unit (LSTM) recurrent network cell.
    """
    def __init__(self, num_units, initializer=glorot_uniform,
                 forget_bias=1.0, activation=np.tanh):
        """Initialize the parameters for an LSTM cell.
        Args:
          num_units: int, The number of units in the LSTM cell.
          initializer: The initializer to use for the kernel matrix. Default: glorot_uniform
          forget_bias: Biases of the forget gate are initialized by default to 1
            in order to reduce the scale of forgetting at the beginning of
            the training.
          activation: Activation function of the inner states.  Default: np.tanh.
        """
        # Inputs must be 2-dimensional.
        self._num_units = num_units
        self._forget_bias = forget_bias
        self._activation = activation
        self._initializer = initializer

    def build(self, inputs_shape):
        input_depth = inputs_shape[-1]
        h_depth = self._num_units
        self._kernel = self._initializer(shape=(input_depth + h_depth, 4 * self._num_units))
        self._bias = np.zeros(shape=(4 * self._num_units))

    def call(self, inputs, state):
        """Run one step of LSTM.
        Args:
          inputs: input numpy array, must be 2-D, `[batch, input_size]`.
          state:  a tuple of numpy arrays, both `2-D`, with column sizes `c_state` and
            `m_state`.
        Returns:
          A tuple containing:
          - A `2-D, [batch, output_dim]`, numpy array representing the output of the
            LSTM after reading `inputs` when previous state was `state`.
            Here output_dim is equal to num_units.
          - Numpy array(s) representing the new state of LSTM after reading `inputs` when
            the previous state was `state`.  Same type and shape(s) as `state`.
        """
        num_proj = self._num_units
        (c_prev, m_prev) = state

        input_size = inputs.shape[-1]

        # i = input_gate, j = new_input, f = forget_gate, o = output_gate
        lstm_matrix = np.hstack([inputs, m_prev]).dot(self._kernel)
        lstm_matrix += self._bias

        i, j, f, o = np.split(lstm_matrix, indices_or_sections=4, axis=0)
        # Diagonal connections
        c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) *
             self._activation(j))

        m = sigmoid(o) * self._activation(c)

        new_state = (c, m)
        return m, new_state

X = x_val.reshape(x_val.shape[1:])

cell = LSTMCell(num_units, initializer=lambda shape: kernel_init)
cell.build(X.shape)

state = (np.zeros(num_units), np.zeros(num_units))
for i in range(timesteps):
    x = X[i,:]
    output, state = cell.call(x, state)
    print(output)

Produces output:

[-0.21386017 -0.08401277 -0.25431477]
[-0.22243588 -0.25817422 -0.1612211 ]
[-0.2282134  -0.14207162 -0.35017249]
[-0.23286737 -0.17129192 -0.2706512 ]
[-0.11768674 -0.20717363 -0.13339118]
[-0.0599215  -0.17756104 -0.2028935 ]
[ 0.11437953 -0.19484555  0.05371994]

Meanwhile, your TensorFlow code, if you replace the line that constructs the LSTMCell with

lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units, initializer = tf.constant_initializer(kernel_init)) 

returns:

[[-0.2138602  -0.08401276 -0.25431478]]
[[-0.22243595 -0.25817424 -0.16122109]]
[[-0.22821338 -0.1420716  -0.35017252]]
[[-0.23286738 -0.1712919  -0.27065122]]
[[-0.1176867  -0.2071736  -0.13339119]]
[[-0.05992149 -0.177561   -0.2028935 ]]
[[ 0.11437953 -0.19484554  0.05371996]]
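The remaining differences are just float32 round-off. If you want to check the agreement programmatically rather than by eye, here is a minimal sketch (it assumes res from the TensorFlow snippet and cell, X, num_units, timesteps from the NumPy snippet are all still in scope):

# Re-run the NumPy cell, collecting the outputs instead of printing them
state = (np.zeros(num_units), np.zeros(num_units))
np_outputs = []
for i in range(timesteps):
    output, state = cell.call(X[i,:], state)
    np_outputs.append(output)

# Each element of res has shape (1, num_units); squeeze the batch dimension
for tf_out, np_out in zip(res, np_outputs):
    assert np.allclose(tf_out.squeeze(), np_out, atol=1e-6)
print("all timesteps match")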
