Preventing a regression CNN from fitting to the mean when the dataset has only a few outliers
I am trying to train a CNN for regression on a dataset where most of the points lie around a similar output value. There are, however, a few outliers that are very important but underrepresented, so the trained network tends to predict all output values close to the mean of the whole dataset (underfitting). This leads to a somewhat small overall error (and good precision) because the vast majority of points lie in that range, but the error is much higher for points even slightly outside the "normal" case.
But since this regressor would be most useful for predicting the output of outliers (a quality-control use case), it is currently pretty much useless.
Is there a way to prevent this kind of behavior and train a CNN that gives greater weight to outliers and extrema, in order to avoid underfitting?
A Random Forest, although much better at predicting the output for outliers, still exhibits this to some extent: the error at the extrema is higher, while the error around the mean is very small. The "low" points are predicted with too high a value, and the "high" points with too low a value (each time closer to the mean). So any idea for that case would be great too!
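To illustrate the weighting idea I am asking about, here is a minimal sketch (hypothetical toy data, pure NumPy) of per-sample weights that grow with a target's distance from the bulk of the distribution. For a constant predictor, minimizing the unweighted MSE gives the plain mean, while minimizing the weighted MSE gives the weighted mean, which sits closer to the outliers:

```python
import numpy as np

# Hypothetical toy data: most targets near 1.0, a few important outliers near 3.0.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(1.0, 0.05, 95), rng.normal(3.0, 0.1, 5)])

# Weight each sample by how far its target is from the "normal" value
# (assumption: the median represents the normal case).
w = 1.0 + np.abs(y - np.median(y))
w /= w.sum()

# The constant predictor minimizing unweighted MSE is the plain mean;
# minimizing the weighted MSE yields the weighted mean instead.
pred_unweighted = y.mean()
pred_weighted = np.sum(w * y)

print(pred_unweighted, pred_weighted)
```

In a deep-learning framework the same weight vector would typically be passed as `sample_weight` to `model.fit` in Keras, or multiplied into the per-sample loss before averaging in PyTorch; `RandomForestRegressor.fit` in scikit-learn also accepts a `sample_weight` argument.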
Thanks a lot
Topic cnn regression random-forest
Category Data Science