https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912
The Recurrent Neural Network (RNN) allows
information to persist from one step to the next. Like a
state machine, part of the input at each step is fed back from
the previous step's output. Because RNNs operate over sequences of vectors, they
can implement models over variable-length sequences with not only one-to-one, but also
one-to-many, many-to-one, or many-to-many mappings in various shapes. There
are no constraints on the lengths of sequences, since the transformation is
fixed and can be applied as many times as we like.
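To make the variable-length point concrete, here is a minimal sketch (not from the article) in which a toy step function stands in for the real RNN step shown later; the same fixed transformation is applied to sequences of different lengths, and keeping every output gives a many-to-many mapping while keeping only the last gives many-to-one:

import numpy as np

# toy stand-in for an RNN step; the real update appears later in the article
def step(h, x):
    return np.tanh(h + x)

for sequence in ([1.0, 2.0], [0.5, -0.3, 0.9, 0.2]):  # two different lengths
    h = 0.0                    # state is reset at the start of each sequence
    outputs = []
    for x in sequence:
        h = step(h, x)         # the same fixed transformation at every step
        outputs.append(h)      # keeping every output: many-to-many
    final = outputs[-1]        # keeping only the last output: many-to-one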
An RNN is implemented with three weight matrices: W_xh, applied to the current
input; W_hh, applied to the previous hidden state; and W_hy, applied to the hidden
state to produce the output. The matrices are first initialized with random values and
then updated over many training samples, each a sequence of inputs
and desired outputs, so that the network learns to produce the desired output at each step of the
sequence. Once the RNN is trained, the weight matrices keep their learned
values; the hidden state is reset at the start of each sequence and updated
during each step. Example code:
import numpy as np

class RNN:
    # ...
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
The np.tanh (hyperbolic tangent) function implements a non-linearity
that squashes the activations to the range [-1, 1]. The input x is multiplied
by the W_xh matrix using numpy's dot product; this is added
to the dot product of W_hh and the previous hidden state, and the sum is squashed
to produce the new hidden state. Finally, the hidden state is projected through
the W_hy matrix to produce the output, which is returned.
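As a rough sketch of how the pieces fit together, the elided parts of the class might be filled in and the network run over a sequence as below. The constructor, the layer sizes, and the 0.01 scaling are assumptions for illustration, not taken from the article; only the forward pass is shown, and training the weights is not covered here.

import numpy as np

class RNN:
    def __init__(self, input_size, hidden_size, output_size):
        # hypothetical constructor: weights start random and would be adjusted by training
        self.W_xh = 0.01 * np.random.randn(hidden_size, input_size)
        self.W_hh = 0.01 * np.random.randn(hidden_size, hidden_size)
        self.W_hy = 0.01 * np.random.randn(output_size, hidden_size)
        self.h = np.zeros(hidden_size)   # hidden state, reset for each new sequence

    def step(self, x):
        # same update as the step function above
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        return np.dot(self.W_hy, self.h)

rnn = RNN(input_size=8, hidden_size=16, output_size=8)
sequence = [np.random.randn(8) for _ in range(5)]   # a sequence of five input vectors
rnn.h = np.zeros(16)                                # reset the hidden state for this sequence
outputs = [rnn.step(x) for x in sequence]           # one output vector per step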