Decision Tree: Efficient splitting of nodes, minimize number of gini evaluations
I have a dataset-specific problem where I need to use a splitting function other than gini_index. This requires me to rewrite a decision tree from scratch. I have a working model, but it is highly inefficient.
To make a split, I currently iterate through each feature and then through each unique value of that feature, at every node (a total of nodes x features x unique levels gini evaluations). Because of this, my DT on a 300k x 145 dataset has been running for 2 days.
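The exhaustive search described above can be sketched roughly as follows. This is only an illustration in plain Python, not the actual code from the project: the names `best_split_bruteforce` and `criterion` are hypothetical, and `gini` stands in for whatever custom splitting function is used. Each candidate threshold rescans all rows of the node, which is where the cost comes from.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (stand-in for a custom criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_bruteforce(X, y, criterion=gini):
    """Try every unique value of every feature as a threshold.

    X is a list of rows, y a list of labels. Returns
    (score, feature index, threshold, number of split evaluations).
    """
    n = len(y)
    best = (float("inf"), None, None)
    evaluations = 0
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Every threshold triggers a full pass over the node's rows.
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * criterion(left) + len(right) * criterion(right)) / n
            evaluations += 1
            if score < best[0]:
                best = (score, j, t)
    return best + (evaluations,)
```

With n rows, f features and u unique values per feature, one node costs on the order of n x f x u operations under this scheme.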
How can I cut down on the number of splitting evaluations, or otherwise speed up the program? I came across the Fisher-Yates algorithm in scikit-learn's code, but I don't understand the logic. Any help would be appreciated.
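For context, a Fisher-Yates shuffle is a way to draw items without replacement, and in scikit-learn's splitter it appears to be used for choosing which features to try at a node (up to `max_features`), not for choosing thresholds. Below is a minimal sketch of a partial Fisher-Yates draw in plain Python. It is not scikit-learn's actual code, and `sample_features` is a hypothetical name.

```python
import random


def sample_features(n_features, k, rng=random):
    """Draw k distinct feature indices with a partial Fisher-Yates shuffle.

    Assumes 0 <= k <= n_features.
    """
    features = list(range(n_features))
    for i in range(k):
        # Pick a position from the not-yet-chosen tail and swap it into slot i.
        j = rng.randrange(i, n_features)
        features[i], features[j] = features[j], features[i]
    return features[:k]


# Example: try only 12 of the 145 features at a node.
candidate_features = sample_features(145, 12)
```

Restricting each node to a random subset of features reduces the "features" factor in the evaluation count, at the price of not always finding the globally best split for that node.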
Topic decision-trees scikit-learn
Category Data Science