Model Guardrails
Suppose I am building a machine learning model for an application where I do not need to make a prediction for every new sample. Given a new sample, it is better to make no prediction at all when there is reason to believe the prediction is unlikely to be good (for example, if the new sample appears to be very different from the training samples). I'm calling this idea of restricting which new samples to predict on "model guardrails" because I don't know of an official term.
My question is: are there any standard methods for putting such guardrails in place? Is there any research on this topic that you can direct me to? A few basic ideas I have are:
1. Use a distance metric to compare the new sample to the training data, and only make a prediction if there are sufficiently many training samples sufficiently close to the new sample.
2. Try to compute some sort of p-value for how consistent the new sample is with the training data, and only make a prediction when this p-value is not too low (i.e., when the new sample is not too atypical of the training data).
3. To extend idea 2, the exact method would probably depend on the training distribution, but in a simple case one could compute a p-value representing the probability that sampling from the training data would yield a sample at least as far from the average as the new sample (e.g., if we are regressing $y$ on $x$ and the training data appear to follow a standard normal distribution, only make predictions when $x$ is within $[-2, 2]$, i.e., within two standard deviations). A rough sketch of ideas 1 and 2 is below.
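To make ideas 1 and 2 concrete, here is a rough sketch of the kind of check I'm imagining (the radius, neighbor count, and alpha thresholds are made-up placeholders, and the p-value part leans on the strong assumption that the training features are roughly multivariate normal so that a Mahalanobis-distance p-value makes sense):

```python
import numpy as np
from scipy.stats import chi2
from sklearn.neighbors import NearestNeighbors

# Stand-in training features; in practice X_train is whatever the model was fit on.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 5))

# Idea 1: distance-based guardrail.
# Predict only if enough training samples lie within a chosen radius of the new sample.
nn = NearestNeighbors().fit(X_train)

def passes_distance_guardrail(x_new, radius=1.0, min_neighbors=10):
    # Count training points within `radius` of x_new (radius and min_neighbors are arbitrary here).
    idx = nn.radius_neighbors(x_new.reshape(1, -1), radius=radius, return_distance=False)[0]
    return len(idx) >= min_neighbors

# Idea 2: p-value-style guardrail.
# If the training features were multivariate normal, the squared Mahalanobis distance of a
# typical sample would be roughly chi-squared with d degrees of freedom, so
# 1 - chi2.cdf(d2, d) acts as a p-value for "at least this far from the training mean".
mean = X_train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

def mahalanobis_p_value(x_new):
    diff = x_new - mean
    d2 = diff @ cov_inv @ diff
    return 1.0 - chi2.cdf(d2, df=X_train.shape[1])

def passes_pvalue_guardrail(x_new, alpha=0.05):
    # Abstain when the sample is too atypical (p-value below alpha).
    return mahalanobis_p_value(x_new) >= alpha

x_new = rng.standard_normal(5)
if passes_distance_guardrail(x_new) and passes_pvalue_guardrail(x_new):
    print("make a prediction")  # e.g., model.predict(x_new.reshape(1, -1))
else:
    print("abstain: sample looks too far from the training data")
```

For the one-dimensional regression example in idea 3, the same p-value rule reduces to predicting only when $x$ falls in roughly $[-2, 2]$, which corresponds to a two-sided p-value of about 0.05 under a standard normal.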
I would appreciate references to the literature, a description of any standard techniques, or even just the right terminology to use.
Topic predictive-modeling
Category Data Science