Start & End Tokens in LSTM when making predictions

I see examples of LSTM sequence to sequence generation models which use start and end tokens for each sequence.

I would like to understand: when making predictions with such a model on an arbitrary sequence, is it required to include start and end tokens in the input?

Topic: lstm, tensorflow, rnn, nlp

Category: Data Science


It depends on what you use the LSTM for.

For sequence labeling or sequence classification, the special tokens are not necessary. However, there might be a slight benefit to informing the network of where a sentence begins and ends, especially if the initial LSTM state is fixed and learned.

For autoregressive sequence-to-sequence models, the special tokens are crucial. The beginning-of-sentence token tells the decoder to start decoding: it provides the very first input from which the first real token is predicted. The end-of-sentence token tells the decoding algorithm to stop generating tokens.
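The role of the two tokens in decoding can be sketched with a minimal greedy loop. The `decoder_step` interface and the token ids below are assumptions for illustration; real seq2seq code (e.g. in TensorFlow) follows the same pattern with an LSTM cell and an argmax over the vocabulary.

```python
BOS, EOS = 0, 1  # assumed ids for the beginning/end-of-sentence tokens

def greedy_decode(decoder_step, initial_state, max_len=50):
    """Generate token ids until EOS is produced or max_len is reached.

    decoder_step(token_id, state) -> (next_token_id, next_state) is a
    hypothetical stand-in for one LSTM decoder step followed by an
    argmax over the output vocabulary.
    """
    token, state = BOS, initial_state  # decoding always starts from BOS
    output = []
    for _ in range(max_len):
        token, state = decoder_step(token, state)
        if token == EOS:  # EOS is the signal to stop generating
            break
        output.append(token)
    return output

# Toy decoder: emits 5, 6, 7, then EOS, ignoring its input token.
def toy_step(token, state):
    sequence = [5, 6, 7, EOS]
    return sequence[state], state + 1

print(greedy_decode(toy_step, initial_state=0))  # [5, 6, 7]
```

Note that EOS never appears in the returned output; it only controls when the loop terminates, which is why a generated sequence can have any length up to `max_len`.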
