How to detect covariate shift in NLP models?
I have an NLP model, for example a sentiment analysis model, that is serving in production.
I want to detect data drift, and specifically covariate shift, for this model.
I saw that cosine similarity may help with this, but I have two concerns:
Whether it can be calculated at all - cosine similarity is only meaningful for vectors that live in the same vector space. Do all of the embeddings that a model produces live in the same space?
The time complexity - if I have 1M training data points and 1M prediction data points and want to estimate the average cosine distance, I would have to compare every prediction with every training data point, i.e. about 10^12 pairs. I could sample instead, but which sampling scheme should I use? (A rough sketch of what I have in mind follows below.)
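To make the second concern concrete, here is a minimal sketch of the pair-sampling idea, assuming both sets of texts are embedded with the same encoder (which is what puts them in one shared vector space). The array shapes, the sample sizes, and the helper name sampled_mean_cosine are placeholders for illustration, not an established method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings. In practice both sets would come from the SAME
# encoder (e.g. the model's own embedding layer), so the vectors share one
# space; the shapes here are purely illustrative stand-ins for 1M points each.
train_emb = rng.normal(size=(10_000, 384)).astype(np.float32)
pred_emb = rng.normal(size=(10_000, 384)).astype(np.float32)

def sampled_mean_cosine(a, b, n_pairs=100_000):
    """Estimate the mean cosine similarity over all (a, b) pairs by sampling
    pairs uniformly at random instead of doing the full ~1e12 comparisons."""
    i = rng.integers(0, len(a), size=n_pairs)
    j = rng.integers(0, len(b), size=n_pairs)
    x = a[i] / np.linalg.norm(a[i], axis=1, keepdims=True)
    y = b[j] / np.linalg.norm(b[j], axis=1, keepdims=True)
    return float(np.mean(np.sum(x * y, axis=1)))

# Compare a train-vs-production estimate against a train-vs-train baseline;
# a clear drop in the former relative to the latter would hint at covariate shift.
baseline = sampled_mean_cosine(train_emb, train_emb)
current = sampled_mean_cosine(train_emb, pred_emb)
print(f"train-vs-train: {baseline:.4f}, train-vs-production: {current:.4f}")
```

Uniform sampling of pairs gives an unbiased estimate of the mean pairwise similarity, and the number of sampled pairs controls the variance independently of the 1M x 1M population size, but this is only one possible scheme.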
I'd love to hear your opinion about:
Different approaches for detecting covariate shift in NLP models
My concerns about the cosine similarity approach