gluonnlp.loss

Gluon NLP Toolkit provides tools for easily setting up task-specific losses.

Activation Regularizers

We now provide activation regularization and temporal activation regularization, as defined in the following work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
ActivationRegularizationLoss Computes Activation Regularization Loss.
TemporalActivationRegularizationLoss Computes Temporal Activation Regularization Loss.
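
The following is a minimal sketch of how the two regularizers can be combined with a standard cross-entropy objective when training an RNN language model, in the spirit of the cited work. All shapes, tensors, and coefficient values (alpha, beta) below are arbitrary placeholders rather than outputs of a real model.

import mxnet as mx
import gluonnlp as nlp
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss

# Placeholder dimensions: T time steps, batch size N, C hidden units, vocabulary size V.
T, N, C, V = 5, 2, 4, 10

# Placeholder per-layer RNN outputs in (T, N, C) layout, one tensor per stacked layer.
outputs = [mx.nd.random.normal(shape=(T, N, C)) for _ in range(2)]
logits = mx.nd.random.normal(shape=(T, N, V))                          # decoder scores
targets = mx.nd.random.randint(0, V, shape=(T, N)).astype('float32')   # next-word targets

ce = SoftmaxCrossEntropyLoss()
ar = nlp.loss.ActivationRegularizationLoss(alpha=2.0)
tar = nlp.loss.TemporalActivationRegularizationLoss(beta=1.0)

# Total training loss: cross entropy plus the two activation regularizers.
# In the cited work, AR is applied to the dropout-masked RNN outputs and TAR to the
# raw outputs; the same placeholder list is reused for both here for brevity.
total_loss = ce(logits, targets).mean() + ar(*outputs).mean() + tar(*outputs).mean()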

API Reference

NLP loss.

class gluonnlp.loss.ActivationRegularizationLoss(alpha=0, weight=None, batch_axis=None, **kwargs)[source]

Computes Activation Regularization Loss. (alias: AR)

The formulation is as below:

\[L = \alpha L_2(h_t)\]

where \(L_2(\cdot) = {||\cdot||}_2\), \(h_t\) is the output of the RNN at timestep \(t\), and \(\alpha\) is the scaling coefficient.

The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters:
  • alpha (float, default 0) – The scaling coefficient of the regularization.
  • weight (float or None) – Global scalar weight for loss.
  • batch_axis (int, default 0) – The axis that represents mini-batch.
hybrid_forward(F, *states)[source]
Parameters: states (list) – the stacked outputs from the RNN, consisting of the output from each time step, in TNC layout.
Returns: loss – loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
Return type: NDArray
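
Example (a minimal, hypothetical sketch; the layer count, shapes, and alpha value below are arbitrary):

import mxnet as mx
import gluonnlp as nlp

# Placeholder outputs from a 2-layer RNN: each element is (T, N, C),
# i.e. 5 time steps, batch of 2, 4 hidden units.
states = [mx.nd.random.normal(shape=(5, 2, 4)) for _ in range(2)]

ar = nlp.loss.ActivationRegularizationLoss(alpha=2.0)
loss = ar(*states)   # loss tensor; non-batch dimensions are averaged out (see above)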
class gluonnlp.loss.TemporalActivationRegularizationLoss(beta=0, weight=None, batch_axis=None, **kwargs)[source]

Computes Temporal Activation Regularization Loss. (alias: TAR)

The formulation is as below:

\[L = \beta L_2(h_t-h_{t+1})\]

where \(L_2(\cdot) = {||\cdot||}_2\), \(h_t\) is the output of the RNN at timestep \(t\), \(h_{t+1}\) is the output of the RNN at timestep \(t+1\), and \(\beta\) is the scaling coefficient.

The implementation follows the work:

@article{merity2017revisiting,
  title={Revisiting Activation Regularization for Language RNNs},
  author={Merity, Stephen and McCann, Bryan and Socher, Richard},
  journal={arXiv preprint arXiv:1708.01009},
  year={2017}
}
Parameters:
  • beta (float, default 0) – The scaling coefficient of the regularization.
  • weight (float or None) – Global scalar weight for loss.
  • batch_axis (int, default 0) – The axis that represents mini-batch.
hybrid_forward(F, *states)[source]
Parameters: states (list) – the stacked outputs from the RNN, consisting of the output from each time step, in TNC layout.
Returns: loss – loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
Return type: NDArray
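
Example (a minimal, hypothetical sketch along the same lines as above; TAR penalizes the difference between consecutive time steps, so the inputs must span at least two time steps):

import mxnet as mx
import gluonnlp as nlp

# Placeholder outputs from a 2-layer RNN in (T, N, C) layout.
states = [mx.nd.random.normal(shape=(5, 2, 4)) for _ in range(2)]

tar = nlp.loss.TemporalActivationRegularizationLoss(beta=1.0)
loss = tar(*states)  # penalizes h_t - h_{t+1} across the time axis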