I faced the same issue when I first dabbled in these research papers. I struggled mightily to parse them and grasp what the underlying technique was. The ON-LSTM model delivers impressive performance on sequences longer than 3.

Natural language has a hierarchical structure: larger units or constituents are composed of smaller units or constituents (phrases). While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of units.

This paper proposes to add such an inductive bias by ordering the neurons.

The researchers aim to integrate a tree structure into a neural network language model.

The motivation is to improve generalization through a better inductive bias and, at the same time, potentially reduce the need for large amounts of training data.

Finally, the mathematical models involved are presented and demonstrated.

A tree-structured model can achieve quite strong performance on this dataset. This process can reduce parameter counts by more than 90% without affecting accuracy. It also decreases the size and energy consumption of a trained network, making inference more efficient.

The various types of neural networks are explained and demonstrated, the benefits of artificial neural networks are described, and a detailed historical background is provided. The connection between artificial and real neurons is also investigated and explained.

The structure of natural language is hierarchical. So the researchers propose making the gate for each neuron dependent on the others by enforcing the order in which neurons are updated. ON-LSTM includes a new gating mechanism and a new activation function, cumax(). Combining the cumax() function with the LSTM yields the new model, ON-LSTM.
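To make cumax() a little more concrete, here is a minimal sketch in plain Python. It assumes the definition used in the ON-LSTM paper, where cumax is the cumulative sum of a softmax, and it builds two gate vectors in the spirit of the paper's master forget and master input gates. The pre-activation values, the vector size of 5, and the variable names are made up for illustration; a real model would compute the pre-activations from the input and the previous hidden state using learned weights.

```python
import math
from itertools import accumulate


def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(xs):
    # Cumulative sum of a softmax: the values never decrease along the
    # neuron ordering and the last one is (numerically) 1.
    return list(accumulate(softmax(xs)))


# Hypothetical pre-activations for a layer of 5 ordered neurons
# (illustrative numbers only, not taken from the paper).
logits_forget = [0.1, 2.0, 0.3, -1.0, 0.5]
logits_input = [-0.5, 0.2, 1.5, 0.0, 0.3]

# Non-decreasing along the ordering: neurons late in the ordering get
# values close to 1, neurons early in the ordering get values close to 0.
master_forget = cumax(logits_forget)

# Non-increasing along the ordering: the mirror image of the gate above.
master_input = [1.0 - v for v in cumax(logits_input)]

print("master forget gate:", [round(v, 3) for v in master_forget])
print("master input gate: ", [round(v, 3) for v in master_input])
```

Because each gate vector is monotonic along the neuron ordering, the value for one neuron constrains the values for the neurons after it. This is one way to read the statement that the gate for each neuron depends on the others. The sketch leaves out how these gates are combined with the standard LSTM gates.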


