Does it make sense to train the model on the whole data?

Suppose I am training an LSTM model on stock price data.

In the first iteration, say I trained it on 80% of the data, then tested it on the remaining 20% and got an RMSE value.

Now, after this, does it make sense to train it again on the whole data before predicting a value?

For example, I have data for AAPL from 2010 to today; I trained on 2010 to 2020, tested on 2020 till today, and got the RMSE values.

Now, before predicting the next day's value, does it make sense to train it again on the whole data set, i.e. from 2010 till today?

What I have observed in testing is that the earlier predictions have less error than the later ones, so I thought maybe I should train on the whole data set before predicting the next day's or week's value, given that I already know the model's accuracy from testing on the earlier samples.

Does this sound reasonable, or does it have drawbacks that I am not aware of?

Topic: lstm, training, regression, time-series

Category: Data Science


Yes. First you select the best model and measure its performance using the training and test data, then you fit that best model on the full data. We try to use as much data as possible for better results. See this Stack Exchange answer where this is explained: https://stats.stackexchange.com/a/366288

See also this article explaining why we use train/validation/test splits and then use the full data to train the final model: https://machinelearningmastery.com/train-final-machine-learning-model/

The one trade-off to be aware of: once you refit on all of the data, there is no held-out set left for that final model, so the test RMSE from the earlier split becomes your estimate of its performance.
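Here is a minimal sketch of that workflow, assuming a Keras LSTM on a univariate close-price series. The synthetic `prices` array, window size, layer sizes, and epoch counts are all placeholders, not recommendations; scaling and hyperparameter tuning are omitted for brevity:

```python
# Sketch: (1) estimate error on a chronological train/test split,
# (2) refit the same architecture on ALL data, (3) predict the next day.
import numpy as np
from tensorflow import keras

def make_windows(series, window=30):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

def build_model(window=30):
    model = keras.Sequential([
        keras.layers.Input(shape=(window, 1)),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

prices = np.cumsum(np.random.randn(2500)) + 100.0  # stand-in for AAPL closes

X, y = make_windows(prices)
split = int(0.8 * len(X))  # chronological split: never shuffle time series
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Step 1: measure out-of-sample error on the held-out 20%.
model = build_model()
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
preds = model.predict(X_test, verbose=0).ravel()
rmse = float(np.sqrt(np.mean((preds - y_test) ** 2)))
print(f"test RMSE: {rmse:.3f}")

# Step 2: refit the same architecture on the full data set, then predict tomorrow.
final_model = build_model()
final_model.fit(X, y, epochs=10, batch_size=32, verbose=0)
last_window = prices[-30:][np.newaxis, :, np.newaxis]  # shape (1, window, 1)
next_day = float(final_model.predict(last_window, verbose=0)[0, 0])
print(f"next-day prediction: {next_day:.2f}")
```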

I hope this helps.
