How to build a regression model on residuals
Let's say you have a regression model that performs well, but the end result is not quite there yet.
One idea I came across for improving model performance is to build a second model on top of the first model's residuals (y - y_pred). This second model is supposed to learn for which observations the first model is overpredicting or underpredicting. Its output would then be combined with the first model's prediction to improve the final prediction.
Has anyone tried to implement this for a regression task? Are there any resources I have missed that implement this successfully?
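To make the idea concrete, here is a minimal sketch of how I picture it. Everything in it is an assumption on my part: the synthetic data, the choice of `LinearRegression` as the first model and `GradientBoostingRegressor` as the second, and the use of out-of-fold predictions to compute the residuals. The second model takes the original features as input and the residuals as its target, and the final prediction is the sum of the two outputs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic data for illustration only (not from a real problem).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
y = y + 20.0 * np.sin(X[:, 0])  # add a nonlinear part a linear model cannot fit
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First model. Residuals are computed from out-of-fold predictions so that
# they are not artificially small from the model having seen the same rows.
first = LinearRegression()
oof_pred = cross_val_predict(first, X_train, y_train, cv=5)
residuals = y_train - oof_pred  # y - y_pred
first.fit(X_train, y_train)

# Second model: features in, residuals of the first model as the target.
second = GradientBoostingRegressor(random_state=0)
second.fit(X_train, residuals)

# Combine: base prediction plus the predicted correction.
base_pred = first.predict(X_test)
combined_pred = base_pred + second.predict(X_test)

mse_base = mean_squared_error(y_test, base_pred)
mse_combined = mean_squared_error(y_test, combined_pred)
print(f"first model only: {mse_base:.2f}, with residual model: {mse_combined:.2f}")
```

Whether the combined prediction actually beats the first model alone has to be checked on held-out data, as in the last lines.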
Here are some of the questions I have:
- Should the second model be trained only on the residuals, or on the residuals plus the features?
- What are the best practices for implementing this in an sklearn pipeline?
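For the pipeline question, one shape I could imagine is wrapping both stages in a single estimator so that it can sit at the end of a `Pipeline`. This is only a sketch under my own assumptions: `ResidualRegressor` is a hypothetical class name I made up (it is not part of scikit-learn), and `Ridge`, `RandomForestRegressor`, and the synthetic data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ResidualRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: a base model plus a model fitted on its residuals."""

    def __init__(self, base_estimator, residual_estimator, cv=5):
        self.base_estimator = base_estimator
        self.residual_estimator = residual_estimator
        self.cv = cv

    def fit(self, X, y):
        y = np.asarray(y)
        # Out-of-fold predictions of the base model give honest residuals.
        oof_pred = cross_val_predict(clone(self.base_estimator), X, y, cv=self.cv)
        self.base_estimator_ = clone(self.base_estimator).fit(X, y)
        # The residual model sees the same features; its target is y - y_pred.
        self.residual_estimator_ = clone(self.residual_estimator).fit(X, y - oof_pred)
        return self

    def predict(self, X):
        # Final prediction: base prediction plus predicted correction.
        return self.base_estimator_.predict(X) + self.residual_estimator_.predict(X)


# Synthetic data for illustration only.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

model = make_pipeline(
    StandardScaler(),
    ResidualRegressor(Ridge(), RandomForestRegressor(n_estimators=50, random_state=0)),
)
model.fit(X, y)
preds = model.predict(X[:5])
```

Keeping both stages inside one estimator means the preprocessing steps and both models are refitted together on each split during cross-validation, which should keep the residual model from seeing information from the validation fold.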
Topic regression scikit-learn machine-learning
Category Data Science