Best approach for A/B testing two different recommendation systems

Question

Michael Pulis

April 25, 2022 03:06

I have two recommendation systems for musical preference that make a list of predictions for a particular user based on the songs they have saved in their library. The user then rates how good each recommendation was out of 6. I will be evaluating the performance of the recommendation systems based on the average rating given to songs recommended by system A and system B.

Let us use A to denote a song recommended by system A and B to denote a song recommended by system B. For a particular user, should the recommendations all come from one system (AAAAAA or BBBBBB), or should they alternate (ABABABAB)? For now I have implemented the first option: each respondent is randomly assigned to A or B and only gets recommendations from that system. Is this the right approach, or does showing each respondent only one system bias their ratings, since they never see what the other system could have recommended?

Let us assume that B is far better than A. If each user only gets recommendations from one system, a user who listens only to system A's songs would never have heard system B's, and their ratings of A would probably have been different (lower) if they had listened to the better system too. Is the ABABAB approach the best one? Which method best evaluates the performance of each system while reducing bias?

Thanks.

Topic recommender-system

Category Data Science

Jayaram Iyer · Accepted Answer · May 3, 2021 17:01

This seems like a good use case for a Bayesian Bandit using Thompson Sampling.

This will allow you to start with a 50-50 recommendation - something like:

ABABABAB

but eventually end up with AAAAABAAAAAB (mostly As) or BBBBBABBBBB (mostly Bs) based on a user liking one of them more than the other.

You might also end up with ABABAABABABAB (blended) if the users like both.

The approach assumes that you have access to feedback or ratings from your users in real time and have the ability to act on it.

Read up on it; it is pretty simple to implement.
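To make the idea concrete, here is a minimal sketch of Thompson Sampling over the two systems. It is not a definitive implementation. Several things in it are assumptions rather than anything from the question: ratings are taken to run from 0 to 6, each rating is scaled to [0, 1] and counted as a fractional success in a Beta posterior (a common heuristic for non-binary rewards), and the names `ThompsonSampler` and `simulate`, as well as the simulated "true" mean ratings, are purely illustrative.

```python
import random


class ThompsonSampler:
    """Beta-posterior Thompson Sampling over two systems (illustrative helper)."""

    def __init__(self, arms=("A", "B"), max_rating=6, seed=None):
        self.max_rating = max_rating  # assumption: ratings lie in [0, max_rating]
        self.rng = random.Random(seed)
        # Start each system with a uniform Beta(1, 1) prior: [alpha, beta].
        self.params = {arm: [1.0, 1.0] for arm in arms}

    def choose(self):
        # Draw one sample from each posterior and recommend from the largest draw.
        draws = {
            arm: self.rng.betavariate(alpha, beta)
            for arm, (alpha, beta) in self.params.items()
        }
        return max(draws, key=draws.get)

    def update(self, arm, rating):
        # Assumption: scale the rating to [0, 1] and add it as a fractional success.
        reward = rating / self.max_rating
        self.params[arm][0] += reward
        self.params[arm][1] += 1.0 - reward

    def posterior_mean(self, arm):
        alpha, beta = self.params[arm]
        return alpha / (alpha + beta)


def simulate(true_mean_rating, rounds=2000, seed=0):
    """Run the sampler against simulated users; returns (sampler, counts per system)."""
    sampler = ThompsonSampler(seed=seed)
    rng = random.Random(seed + 1)
    counts = {"A": 0, "B": 0}
    for _ in range(rounds):
        arm = sampler.choose()
        # Simulated rating: noisy integer around the assumed mean, clipped to 0..6.
        rating = min(6, max(0, round(rng.gauss(true_mean_rating[arm], 1.0))))
        sampler.update(arm, rating)
        counts[arm] += 1
    return sampler, counts


if __name__ == "__main__":
    sampler, counts = simulate({"A": 2.0, "B": 5.0})
    print(counts, sampler.posterior_mean("A"), sampler.posterior_mean("B"))
```

In a live system you would call `choose()` before each recommendation and `update()` as soon as the rating arrives. Note that the sampler deliberately shifts traffic toward the better system, so the two systems end up with unequal sample sizes.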

nsm10 · Accepted Answer · May 3, 2021 16:05

If the user rates how good each recommendation is, then the ABABAB approach is the best one. There could be a case where system A becomes better than B over time, and then having two systems does not make sense.
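One way to read the ABABAB suggestion is as a within-user comparison: every user rates songs from both systems, so each user serves as their own baseline. The sketch below is only an illustration under assumptions. The helpers `interleave` and `paired_difference` are hypothetical names, the input format is made up, and the paired t statistic is one possible summary rather than the method either poster used.

```python
from math import sqrt
from statistics import mean, stdev


def interleave(recs_a, recs_b):
    """Alternate songs from the two systems: A, B, A, B, ..."""
    playlist = []
    for song_a, song_b in zip(recs_a, recs_b):
        playlist.append(("A", song_a))
        playlist.append(("B", song_b))
    return playlist


def paired_difference(ratings_by_user):
    """ratings_by_user maps user id -> list of (system, rating) pairs.

    Returns the mean per-user difference (B minus A) and a paired t statistic.
    Needs at least two users whose differences are not all identical.
    """
    diffs = []
    for ratings in ratings_by_user.values():
        a = [r for system, r in ratings if system == "A"]
        b = [r for system, r in ratings if system == "B"]
        if a and b:  # only users who rated both systems contribute
            diffs.append(mean(b) - mean(a))
    d_bar = mean(diffs)
    t_stat = d_bar / (stdev(diffs) / sqrt(len(diffs)))
    return d_bar, t_stat
```

Because each difference is computed within one user, a user who rates everything harshly or generously affects both systems equally. In practice you may also want to randomize which system goes first for each user, so position in the list does not favor one system.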
