How to determine irrelevant data in binary classification?
Suppose I were looking at social media to find users' intentions to trade stocks. I might have a binary classification model that predicts "buy" or "sell". However, most social media posts that mention a company are clearly not about buying or selling its stock. Even if I were to look specifically at places on the Internet where the main topic of discussion is buying and selling stocks, there would still be plenty of posts that are, in a sense, "off-topic" (e.g. "I applied to Microsoft today." or "What does everyone here think about Alphabet?").
My question is: how would one go about recognizing when a social media post does not suggest that the user intends to buy or sell the stock? I had three quick ideas:
1. Create rules that differentiate relevant posts from irrelevant ones.
2. Create a second binary classifier that separates relevant from irrelevant posts, and then apply the main classifier only to the relevant posts.
3. Turn the binary classifier into a three-class classifier that detects buy, sell, and off-topic documents.
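To make the three ideas concrete, here is a minimal sketch that puts them side by side. It assumes a small hand-labelled set of posts; the example posts in `DATA`, the keyword set `TRADE_WORDS`, and the helper names are all hypothetical. A tiny multinomial Naive Bayes written from scratch stands in for whatever classifier is actually used, so treat this as an illustration of the structure (keyword rule, relevance filter followed by a buy/sell model, single three-class model), not as a recommended model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; a real system would use a proper tokenizer.
    return re.findall(r"[a-z']+", text.lower())


# Idea 1 (rules): a hypothetical keyword list; cheap, but brittle.
TRADE_WORDS = {"buy", "buying", "sell", "selling", "shares", "long", "short", "position"}


def rule_is_relevant(text):
    return bool(TRADE_WORDS & set(tokenize(text)))


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data; real labels would come from annotating posts.
DATA = [
    ("buying more shares of msft tomorrow", "buy"),
    ("going long on this stock, adding to my position", "buy"),
    ("time to buy the dip", "buy"),
    ("selling all my shares before earnings", "sell"),
    ("dumping this stock, taking profits", "sell"),
    ("time to sell and cut my losses", "sell"),
    ("i applied to microsoft today", "off_topic"),
    ("what does everyone here think about alphabet", "off_topic"),
    ("their new phone looks great", "off_topic"),
]
texts, labels = zip(*DATA)

# Idea 3: one three-class model (buy / sell / off_topic).
three_class = NaiveBayes().fit(texts, labels)

# Idea 2: a relevance filter, then a buy/sell model trained only on relevant posts.
relevance = NaiveBayes().fit(texts, ["off_topic" if y == "off_topic" else "relevant" for y in labels])
on_topic = [(t, y) for t, y in DATA if y != "off_topic"]
intent = NaiveBayes().fit([t for t, _ in on_topic], [y for _, y in on_topic])


def cascade_predict(text):
    if relevance.predict(text) == "off_topic":
        return "off_topic"
    return intent.predict(text)


if __name__ == "__main__":
    for post in ["I plan to buy more shares", "should I sell my shares", "what do you think about Microsoft"]:
        print(f"{post!r}: rule={rule_is_relevant(post)}, "
              f"three-class={three_class.predict(post)}, cascade={cascade_predict(post)}")
```

The structural difference is where the "off-topic" decision lives: the three-class model makes it jointly with buy/sell, while the cascade keeps it in a separate model that can be trained, evaluated, and thresholded on its own, and lets an existing buy/sell classifier be reused unchanged.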
Is there a customary approach to this problem?