How to detect covariate shift in NLP models?

I have an NLP model, for example, Sentiment Analysis.

This model is serving in production.

I want to detect data drift for this model, specifically covariate shift.

I saw that cosine similarity may help here, but I have two concerns:

  • The ability to calculate it - cosine similarity can only be calculated between vectors that lie in the same vector space. Do all of the embeddings that a model produces live in the same space?

  • The time complexity - if I have 1M training data points and 1M prediction data points and I want to estimate the average cosine similarity, I'd have to compare every prediction against every training data point. I can sample instead, but what sampling algorithm should I use? (A rough sketch of what I have in mind is after this list.)
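To make the sampling idea concrete, here is a minimal sketch of what I have in mind, assuming both sets of embeddings come from the same encoder (so they share a vector space) and that plain uniform random pairing of points is acceptable; the `encoder` in the usage comment is just a placeholder:

```python
import numpy as np

rng = np.random.default_rng(42)

def sampled_mean_cosine(train_emb: np.ndarray,
                        prod_emb: np.ndarray,
                        n_pairs: int = 100_000) -> float:
    """Estimate the average cosine similarity between training and production
    embeddings by sampling random (train, production) pairs instead of
    computing the full 1M x 1M comparison grid.

    Assumes both arrays come from the same embedding model, i.e. share a
    vector space, with shape (n_samples, embedding_dim).
    """
    # Draw random index pairs: one training point and one production point each
    i = rng.integers(0, len(train_emb), size=n_pairs)
    j = rng.integers(0, len(prod_emb), size=n_pairs)

    a = train_emb[i]
    b = prod_emb[j]

    # Row-wise cosine similarity (small epsilon guards against zero vectors)
    sims = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    )
    return float(sims.mean())

# Placeholder usage: embeddings produced by the same sentence encoder
# train_emb = encoder.encode(train_texts)   # shape (1_000_000, d)
# prod_emb  = encoder.encode(prod_texts)    # shape (1_000_000, d)
# drift_score = sampled_mean_cosine(train_emb, prod_emb)
```

With `n_pairs` fixed, the cost stays proportional to `n_pairs * embedding_dim` no matter how many training or production points there are, which is what I'm hoping to achieve - but I don't know whether uniform random pairing is the right sampling strategy.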

I'd love to hear your opinion about:

  • Other ways to measure covariate shift for NLP models

  • My concerns about the cosine-similarity approach above

