https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912
The Recurrent Neural Network (RNN) allows
information to persist from one step to the next. Like a
state machine, part of the input at each step is fed back from
the previous step's output. Because RNNs operate over sequences of vectors, they
can implement models over variable-length sequences with not only one-to-one, but also
one-to-many, many-to-one, or many-to-many mappings in various shapes. There
are no constraints on the lengths of sequences, since the transformation is
fixed and can be applied as many times as we like.
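To make the variable-length point concrete, here is a minimal sketch (not from the article) in which a toy step function stands in for the real RNN step shown later; the same fixed transformation is applied to sequences of different lengths, and keeping every output gives a many-to-many mapping while keeping only the last gives many-to-one:

import numpy as np

# toy stand-in for an RNN step; the real update appears later in the article
def step(h, x):
    return np.tanh(h + x)

for sequence in ([1.0, 2.0], [0.5, -0.3, 0.9, 0.2]):  # two different lengths
    h = 0.0                    # state is reset at the start of each sequence
    outputs = []
    for x in sequence:
        h = step(h, x)         # the same fixed transformation at every step
        outputs.append(h)      # keeping every output: many-to-many
    final = outputs[-1]        # keeping only the last output: many-to-one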
An RNN is implemented with three weight matrices: W_xh, applied to the current
input; W_hh, applied to the previous hidden state; and W_hy, applied to the hidden
state to produce the output. The matrices are first initialized with random values and
then updated over many training samples, each a sequence of inputs
and desired outputs, so that the network learns to produce the desired output at each step of the
sequence. Once the RNN is trained, the weight matrices keep their learned
values; the hidden state is reset at the start of each sequence and updated
during each step. Example code:
import numpy as np

class RNN:
    # ...
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
The np.tanh (hyperbolic tangent) function implements a non-linearity
that squashes the activations to the range [-1, 1]. The input x is multiplied
by the W_xh matrix using numpy's dot product; this is added
to the dot product of W_hh and the previous hidden state, and the sum is squashed
to produce the new hidden state. Finally, the hidden state is projected through
the W_hy matrix to produce the output, which is returned.
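As a rough sketch of how the pieces fit together, the elided parts of the class might be filled in and the network run over a sequence as below. The constructor, the layer sizes, and the 0.01 scaling are assumptions for illustration, not taken from the article; only the forward pass is shown, and training the weights is not covered here.

import numpy as np

class RNN:
    def __init__(self, input_size, hidden_size, output_size):
        # hypothetical constructor: weights start random and would be adjusted by training
        self.W_xh = 0.01 * np.random.randn(hidden_size, input_size)
        self.W_hh = 0.01 * np.random.randn(hidden_size, hidden_size)
        self.W_hy = 0.01 * np.random.randn(output_size, hidden_size)
        self.h = np.zeros(hidden_size)   # hidden state, reset for each new sequence

    def step(self, x):
        # same update as the step function above
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        return np.dot(self.W_hy, self.h)

rnn = RNN(input_size=8, hidden_size=16, output_size=8)
sequence = [np.random.randn(8) for _ in range(5)]   # a sequence of five input vectors
rnn.h = np.zeros(16)                                # reset the hidden state for this sequence
outputs = [rnn.step(x) for x in sequence]           # one output vector per step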