Long-short term memory in time-series data

Time-series data such as those in the stock market is usually dependent on the previous n historical data points.

Recurrent Neural Network (RNN) is applied to sequence data to capture the intra-sequence dependence. Long-short term memoy network (LSTM) is a variant of RNN, capturing long-term impact on short-term signal behaviour.

Key difference between a simple RNN and LSTM:

Simple RNN: later output nodes of the network are less sensitive to the input at time t=1: gradient vanishes.
LSTM: Preseves gradient by implementing “forget” and “input” gates.

LSTM holds the following components in each layer:

Inputs: Previous ouptut ($ h_{t-1} $) and current input ($ x_{t} $)
Forget gate:
- System: $ \sigma $ decides whehter to throw away information from the current cell state $ C_t $
- Ouput: $f_t =\sigma( W_f \times [h_{t-1}, x_t] + b_f) $ A number between 0 and 1 for each number in cell state $ C_{t-1} $.
Input gate:
- System 1: $ \sigma $ decides which values will be updated
  - Output: $ i_t = \sigma(W_i \times [h_{t-1}, x_t]+b_i) $
- System 2: $ tanh $ creates a vector of new candidate values
  - Output: $ \tilde{C_t} = tanh(W_C \times [h_{t-1}, x_t] + b_C) $
- Output: $ i_t \times \tilde{C_t} $
Summation of Forget and Input gates:
- $ C_t = f_t \times C_{t-1} + i_t \times \tilde{C_t} $
Final Process:
- System 1: $ \sigma $ decides what parts of the cell state $ C_t $ will be outputed
  - Output: $ o_t = \sigma(W_o \times [h_{t-1}, x_t] + b_o) $
- System 2: $ tanh $ to generate a value between -1 and 1, multiply by $ o_t $
- Output: $ h_t = o_t \times tanh(C_t) $