Identifying subsets of values significant to the total sum
Imagine a set of products in a store, each with a number of attributes assigned to it. Some of these attributes are hierarchical (e.g. categories) and some are not (e.g. brand), but none of them are continuous (if that is even important here). For each product, we know how much (in monetary terms) we sold last year and how much we sold this year. The sum of the per-product differences in sales is equal to the difference in total sales between the two years.
What we're interested in is finding some nice rules which describe the sets of products that drive the biggest changes in sales. For example: "Smartphones have dropped in sales by 10% (\$123456)", "Apple products increased in sales by 20% (\$31234)", etc. In other words, an end user is interested in learning what's driving our sales up and down, in some easily consumable format. Hierarchical attributes should be taken into account here, as well as combinations of orthogonal attributes.
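As a rough illustration of what "finding rules" could mean here, the minimal sketch below enumerates every single attribute value and every pair of attribute values as a candidate rule, then computes the absolute and relative sales change for each. The toy data, the field names `prev` and `curr`, and the function `candidate_rules` are all hypothetical. It also assumes each level of a hierarchy is stored as its own attribute column, so hierarchical attributes are handled the same way as flat ones.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: one row per product, with last year's ("prev")
# and this year's ("curr") sales in money.
PRODUCTS = [
    {"category": "Smartphones", "brand": "Apple",   "prev": 1000.0, "curr": 1200.0},
    {"category": "Smartphones", "brand": "Samsung", "prev": 2000.0, "curr": 1500.0},
    {"category": "Laptops",     "brand": "Apple",   "prev": 1500.0, "curr": 1600.0},
    {"category": "Laptops",     "brand": "Dell",    "prev": 500.0,  "curr": 450.0},
]
ATTRIBUTES = ["category", "brand"]


def candidate_rules(products, attributes, max_len=2):
    """Aggregate sales for every combination of up to max_len attribute values."""
    totals = defaultdict(lambda: [0.0, 0.0])  # rule -> [prev total, curr total]
    for p in products:
        for k in range(1, max_len + 1):
            for attrs in combinations(attributes, k):
                rule = tuple((a, p[a]) for a in attrs)
                totals[rule][0] += p["prev"]
                totals[rule][1] += p["curr"]

    rules = []
    for rule, (prev, curr) in totals.items():
        abs_change = curr - prev
        rel_change = abs_change / prev if prev else float("nan")
        rules.append({"rule": rule, "abs": abs_change, "rel": rel_change})
    # Rank by magnitude of absolute change (one possible choice among many).
    return sorted(rules, key=lambda r: abs(r["abs"]), reverse=True)


rules = candidate_rules(PRODUCTS, ATTRIBUTES)
total_change = sum(p["curr"] - p["prev"] for p in PRODUCTS)

for r in rules:
    label = " & ".join(f"{a}={v}" for a, v in r["rule"])
    print(f"{label}: {r['abs']:+.0f} ({r['rel']:+.1%})")
```

Because the values of a single attribute partition the products, the changes of all `category` rules (or all `brand` rules) add up to `total_change`. Rules built from different attributes overlap, so their changes cannot simply be summed.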
My question is not about how to phrase those sentences, but more generally about how to find the structured rules behind them. Additionally, how can composite rules best be identified, such as "Smartphones are dropping in sales, apart from Apple smartphones, which are growing"?
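One simple way to surface composite rules of this "X, apart from Y" form is to compare each parent segment with its refinements and flag the refinements whose change has the opposite sign. The sketch below does this under stated assumptions: the toy data and the helper names `change_by` and `exceptions` are hypothetical, and a real version would probably also require the exception to be large enough to matter.

```python
from collections import defaultdict

# Hypothetical toy data: last year's ("prev") and this year's ("curr") sales.
PRODUCTS = [
    {"category": "Smartphones", "brand": "Apple",   "prev": 1000.0, "curr": 1200.0},
    {"category": "Smartphones", "brand": "Samsung", "prev": 2000.0, "curr": 1500.0},
    {"category": "Laptops",     "brand": "Apple",   "prev": 1500.0, "curr": 1600.0},
    {"category": "Laptops",     "brand": "Dell",    "prev": 500.0,  "curr": 450.0},
]


def change_by(products, keys):
    """Total absolute sales change for each combination of the given attributes."""
    totals = defaultdict(float)
    for p in products:
        totals[tuple(p[k] for k in keys)] += p["curr"] - p["prev"]
    return dict(totals)


def exceptions(products, parent_attr, child_attr):
    """Find child segments that move against the direction of their parent."""
    parent = change_by(products, [parent_attr])
    child = change_by(products, [parent_attr, child_attr])
    found = []
    for (parent_value, child_value), delta in child.items():
        parent_delta = parent[(parent_value,)]
        if delta * parent_delta < 0:  # opposite signs
            found.append((parent_value, parent_delta, child_value, delta))
    return found


composite = exceptions(PRODUCTS, "category", "brand")
for parent_value, parent_delta, child_value, delta in composite:
    print(f"{parent_value}: {parent_delta:+.0f}, apart from {child_value}: {delta:+.0f}")
```

On this toy data the sketch reports Smartphones falling overall while Apple smartphones grow, and Laptops growing overall while Dell laptops fall.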
A very important additional question is how to balance the relative and absolute change when ranking these rules. Are there any best-practice approaches for dealing with this tradeoff?
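To make the tradeoff concrete, the sketch below blends the two measures with a weighted geometric mean controlled by a parameter `alpha`. This is only an illustrative heuristic, not an established best practice: the scoring formula, the `alpha` and `min_prev` parameters, and the segment figures are all assumptions made for the example.

```python
# Hypothetical segments: name -> (sales last year, sales this year).
SEGMENTS = {
    "Smartphones": (3000.0, 2700.0),
    "Laptops": (2000.0, 2050.0),
    "Niche accessory": (10.0, 30.0),
}


def blended_score(prev, curr, alpha=0.5, min_prev=1.0):
    """Score a segment by |absolute change|^alpha * |relative change|^(1 - alpha).

    alpha=1 ranks purely by absolute change, alpha=0 purely by relative change.
    Segments with a base below min_prev score 0 to avoid unstable ratios.
    """
    if prev < min_prev:
        return 0.0
    abs_change = curr - prev
    rel_change = abs_change / prev
    return abs(abs_change) ** alpha * abs(rel_change) ** (1 - alpha)


def rank(segments, alpha):
    """Segment names ordered from highest to lowest blended score."""
    return sorted(
        segments,
        key=lambda name: blended_score(*segments[name], alpha=alpha),
        reverse=True,
    )


for alpha in (1.0, 0.5, 0.0):
    print(alpha, rank(SEGMENTS, alpha))
```

With `alpha=1.0` the large Smartphones segment ranks first. With `alpha=0.0` the tiny accessory segment that tripled takes over, which is exactly the tension the question is about.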
A somewhat naive approach would be to build a decision tree that predicts the sales change (either relative or absolute) of each product, and then use that tree as a foundation for the rules, or maybe just plot the tree itself and present it to the end user.
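The decision-tree idea can be sketched without any library, as below. This is a hand-rolled, greedy regression tree on the absolute per-product change, using one-vs-rest splits on categorical values and squared error as the split criterion. All of these are choices made for this example, and the toy data and function names are hypothetical. In practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would be the more usual route, with the categorical attributes encoded first.

```python
# Hypothetical toy data: last year's ("prev") and this year's ("curr") sales.
PRODUCTS = [
    {"category": "Smartphones", "brand": "Apple",   "prev": 1000.0, "curr": 1200.0},
    {"category": "Smartphones", "brand": "Samsung", "prev": 2000.0, "curr": 1500.0},
    {"category": "Laptops",     "brand": "Apple",   "prev": 1500.0, "curr": 1600.0},
    {"category": "Laptops",     "brand": "Dell",    "prev": 500.0,  "curr": 450.0},
]


def sse(rows):
    """Sum of squared deviations of per-product change from the group mean."""
    if not rows:
        return 0.0
    deltas = [r["curr"] - r["prev"] for r in rows]
    mean = sum(deltas) / len(deltas)
    return sum((d - mean) ** 2 for d in deltas)


def best_split(rows, attributes):
    """Best one-vs-rest split (attr == value vs. attr != value) by total SSE."""
    best = None
    for attr in attributes:
        for value in sorted({r[attr] for r in rows}):
            left = [r for r in rows if r[attr] == value]
            right = [r for r in rows if r[attr] != value]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, attr, value, left, right)
    return best


def grow(rows, attributes, max_depth, path=()):
    """Return the leaves as (conditions, number of products, total change)."""
    split = best_split(rows, attributes) if max_depth > 0 else None
    if split is None or split[0] >= sse(rows):
        total = sum(r["curr"] - r["prev"] for r in rows)
        return [(path, len(rows), total)]
    _, attr, value, left, right = split
    return (
        grow(left, attributes, max_depth - 1, path + (f"{attr} == {value}",))
        + grow(right, attributes, max_depth - 1, path + (f"{attr} != {value}",))
    )


leaves = grow(PRODUCTS, ["category", "brand"], max_depth=2)
for conditions, n, total in leaves:
    print(" and ".join(conditions), f"-> {n} product(s), change {total:+.0f}")
```

Each leaf reads as a rule, and the leaf totals add up to the overall change. Note that this sketch weights every product equally, so it optimises for predicting the change of a typical product rather than for explaining the largest share of the total. That may be part of why the approach feels naive.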
All ideas, even ones very remote to the main question, are very welcome!
Topic decision-trees
Category Data Science