Comparing the performance of different models using hypothesis testing

A common workflow in applied ML is to train several models on some data and evaluate them on a particular test set. I have often seen people simply select an ML metric based on their requirements and choose a model on that basis.

But is the above process right? Shouldn't we ideally be doing hypothesis testing and establishing statistical and practical significance before claiming that model A is better than model B, rather than relying only on an ML metric calculated on a common test set?
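For two classifiers evaluated on the same test set, one common option along these lines is McNemar's test on the counts of examples where the models disagree. Below is a minimal sketch using scikit-learn and statsmodels; the dataset, models, and split are placeholders for illustration, not anything from the question.

```python
# Minimal sketch: compare two classifiers on one test set with McNemar's test.
# The data, models, and split below are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from statsmodels.stats.contingency_tables import mcnemar

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model_a = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_b = RandomForestClassifier(random_state=0).fit(X_train, y_train)

correct_a = model_a.predict(X_test) == y_test
correct_b = model_b.predict(X_test) == y_test

# 2x2 table: rows = model A correct/wrong, columns = model B correct/wrong
table = np.array([
    [np.sum(correct_a & correct_b), np.sum(correct_a & ~correct_b)],
    [np.sum(~correct_a & correct_b), np.sum(~correct_a & ~correct_b)],
])

# Exact binomial test on the discordant (off-diagonal) counts
result = mcnemar(table, exact=True)
print(f"McNemar statistic = {result.statistic}, p-value = {result.pvalue:.4f}")
```

For metrics other than accuracy, or when repeated training is affordable, a paired test over cross-validation folds (or the 5x2cv paired t-test) is a common alternative; practical significance still has to be judged separately from the p-value by looking at the size of the difference.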

Topic: statistics, machine-learning

Category: Data Science
