Classifier with a single 1 value per year

Let's say I want to find the probabilities of winning the best movie category in the Oscars. I know to following rules:

  1. There is only 1 winner per year.
  2. Logically, the sum of the predicted probabilities for each year should add up to be 1.

I can have a year feature in my classifier but that does not mean that 1 and 2 are enforced. If I use a simple classifier like logistic regression I also don't see how having the year as a feature is going to help.

My questions are:

Is there a way to enforce those rules in a model? Does it matter? Should I just use a model without those rules and normalize the probabilities to 1 when I try to predict for a given year?

Topic classification

Category Data Science


As @Emre noted in the comments, you will want to use the softmax function. After acquiring a set of scores per film, the function will squash the scores into the range [0,1] and the scores will add up to 1.


Here's a possible procedure:

  1. Generate a feature set per image
  2. Train a model on your feature set to output a score for a movie. The score is a number in an arbitrary domain.
  3. Repeat step 2 for all of your candidate movies to create a score vector.
  4. Feed the score vector into the softmax function to squash the scores into the domain [0,1] and make them add to 1

For reference, the softmax function is defined:

$ \sigma(z_i) = \frac{e^{z_i}}{\displaystyle\sum^N_{n=1}e^{z_n}} $


And an example:

Our feature set will include the following: critic/audience ratings, revenue, cost, total tickets sold, etc.

Let's say you have historical data of the Oscars that includes the feature sets and a score that we will use for the ground truth metric. You define the score. For instance, if a movie was not even nominated for the Oscar award, it might have a score of zero, whereas an Oscar winning movie might have a score of 1. A movie that was nominated but did not receive many votes might have a score of 0.50.

You train a model on your historical data so that, given a movie's feature set, it will output a score, similar to your training set.

Now you are considering three movies for prediction: Avengers Infinity War, Deadpool 2, and Venom.

You acquire their feature sets (the same categories that you used for training: critic/audience ratings, revenue, etc.)

Then you pass each feature set through your model and acquire a vector of scores:

\begin{array}{|l|c|} \hline Movie & Score\\ \hline Avengers\ Infinity\ War & 0.98\\ Deadpool\ 2 & 0.82\\ Venom & 0.24\\ \hline \end{array}

We can interpret the score results as a probability by using the softmax function:

The denominator of the function is given:

$ \displaystyle\sum^N_{n=1} e^{z_n} = e^{0.98} + e^{0.82} + e^{0.24} \ \ \ where: z=\{0.98,0.82,0.24\} $

We calculate the softmax of a given score $ x $ as such:

$ \Large\frac{e^x}{e^{0.98} + e^{0.82} + e^{0.24}} $

Therefore the softmax scores are:

\begin{array}{|l|c|c|} \hline Movie & Score & Softmax\\ \hline Avengers\ Infinity\ War & 0.98 & 0.42932\\ Deadpool\ 2 & 0.82 & 0.36584\\ Venom & 0.24 & 0.20484\\ \hline \end{array}

and we can see that:

$ 0.42932 + 0.36584 + 0.20484 = 1 $

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.