Understanding dimensions of Keras LSTM target

I'm learning about Keras and LSTMs and came across this tutorial, but I don't understand the dimensions of the target variable. Quoting the article below:

The training y data in this case is the input x words advanced one time step – in other words, at each time step the model is trying to predict the very next word in the sequence. However, it does this at every time step – hence the output layer has the same number of time steps as the input layer.

To make this a bit clearer, consider the following sentence:

“The cat sat on the mat, and ate his hat. Then he jumped up and spat”

If num_steps is set to 5, the data consumed as the input data for a given sample would be “The cat sat on the”. In this case, because we are predicting the very next word in the sequence via our model, for each time step, the matching output y or target data would be “cat sat on the mat”.
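To check my understanding of that one-step offset, here is a tiny sketch of my own (not from the article), using list indices in place of real word IDs:

words = "The cat sat on the mat , and ate his hat .".split()
data = list(range(len(words)))      # stand-in integer IDs, one per word
num_steps = 5

x = data[0:num_steps]               # IDs for "The cat sat on the"
y = data[1:num_steps + 1]           # IDs for "cat sat on the mat"
print([words[i] for i in x])        # ['The', 'cat', 'sat', 'on', 'the']
print([words[i] for i in y])        # ['cat', 'sat', 'on', 'the', 'mat']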

The article then shows the following code to generate the batches:

import numpy as np
from keras.utils import to_categorical

def generate(self):
    # x holds integer word IDs; y holds one-hot targets shifted one step forward
    x = np.zeros((self.batch_size, self.num_steps))
    y = np.zeros((self.batch_size, self.num_steps, self.vocabulary))
    while True:
        for i in range(self.batch_size):
            if self.current_idx + self.num_steps >= len(self.data):
                # reset the index back to the start of the data set
                self.current_idx = 0
            x[i, :] = self.data[self.current_idx:self.current_idx + self.num_steps]
            temp_y = self.data[self.current_idx + 1:self.current_idx + self.num_steps + 1]
            # convert all of temp_y into a one-hot representation
            y[i, :, :] = to_categorical(temp_y, num_classes=self.vocabulary)
            self.current_idx += self.skip_step
        yield x, y
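For context, here is how I understand this generator being consumed during training; the wrapper class name KerasBatchGenerator and the numbers are my own assumptions, not something taken from the article:

# Hypothetical wrapper class holding data, batch_size, num_steps,
# vocabulary and skip_step; generate() above is one of its methods.
train_gen = KerasBatchGenerator(train_data, num_steps=5, batch_size=20,
                                vocabulary=10000, skip_step=5)
# Keras 2.x: fit_generator pulls (x, y) batches from the infinite generator
model.fit_generator(train_gen.generate(),
                    steps_per_epoch=len(train_data) // (20 * 5),
                    epochs=10)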

I understand that the model is trying to predict the next word, and I understand that the dimensions of x are (self.batch_size, self.num_steps), but I don't understand the y dimensions: (self.batch_size, self.num_steps, self.vocabulary). Why not just (self.batch_size, self.vocabulary)? I might have misunderstood the article, but I thought each record in the batch corresponded to a different time step, or at least that's what I would have assumed for any other, non-LSTM type of model. Does Keras automatically unroll the y output because it involves an LSTM?

Topic: lstm, keras, machine-learning

Category: Data Science


If num_steps is set to 5, the data consumed as the input data for a given sample would be “The cat sat on the”. In this case, because we are predicting the very next word in the sequence via our model, for each time step, the matching output y or target data would be “cat sat on the mat”.

In this example, y is a sequence of words with length num_steps starting at the very next word. The input and output both have length num_steps.

Keras returns the full output sequence when return_sequences is True (see https://keras.io/layers/recurrent/#lstm):

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.

In general, the input to an LSTM has dimensions (batch, time, features), and the batch dimension is independent of time. The output can be either the full sequence, with one vector per time step, or just the last time step's output; in Keras this is controlled by return_sequences. Because this model predicts a word at every time step, it needs a vocabulary-sized one-hot target at each of the num_steps positions, which is why y has shape (batch_size, num_steps, vocabulary) rather than (batch_size, vocabulary).
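As a concrete illustration (a minimal sketch with made-up layer sizes, not code from the article), a model built with return_sequences=True plus a TimeDistributed dense layer produces exactly the (batch_size, num_steps, vocabulary) shape that y has:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

vocabulary, num_steps, hidden = 1000, 5, 64      # made-up sizes

model = Sequential()
model.add(Embedding(vocabulary, hidden, input_length=num_steps))
model.add(LSTM(hidden, return_sequences=True))   # emit output at every time step
model.add(TimeDistributed(Dense(vocabulary, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam')

x = np.zeros((2, num_steps))                     # (batch, time) of word IDs
print(model.predict(x).shape)                    # (2, 5, 1000): one distribution per step

If return_sequences were False, the LSTM would instead emit only its final output of shape (batch, units), and a plain Dense layer on top would give the (batch_size, vocabulary) shape the question expected.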
