Forcing a decision tree to use specific features first

My goal is to force a specific feature to be used for the first split of the tree. In the code below, the fitted tree splits on feature_3 first. Is there a way to force it to use, for instance, feature_2 first instead of feature_3?

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

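iris = datasets.load_iris()
X = iris.data
y = iris.target
# Cap the tree at 3 leaves; random_state fixes the random feature order,
# which decides between equally good candidate splits.
fit = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X, y)

# export_text names the columns feature_0 ... feature_3 by default.
text_representation = tree.export_text(fit)
print('Graph')
print(text_representation)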
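The root split can also be read programmatically instead of from the printed text. In a fitted scikit-learn tree, node 0 is the root, `tree_.feature[0]` is the index of the feature it tests (so `3` corresponds to `feature_3` in the `export_text` output), and `tree_.threshold[0]` is its threshold. A minimal sketch on the same data:

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target
fit = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X, y)

# tree_.feature holds the feature index tested at each node; node 0 is the root.
root_feature = int(fit.tree_.feature[0])
root_threshold = float(fit.tree_.threshold[0])
print(f"root split: feature_{root_feature} <= {root_threshold:.2f}")
```

On iris, both petal length (feature_2) and petal width (feature_3) separate setosa perfectly, so the two root splits score the same and `random_state` is what decides between them.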



If you want to force your own split (your own segmentation of the data), split the data yourself and build a separate tree for each segment. This lets each tree choose its own splits, depth, regularization, and so on for its segment.
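A minimal sketch of this idea on the iris data from the question, assuming we want `feature_2` (petal length) to define the first split. Here the threshold is picked by fitting a depth-1 stump on that single column, which is only one convenient option; a threshold from domain knowledge works just as well. The helper `fit_segment_trees` is hypothetical, not part of scikit-learn.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

FORCED_FEATURE = 2  # column index of feature_2 (petal length) in iris.data


def fit_segment_trees(X, y, feature, random_state=0):
    """Hypothetical helper: force the first split on `feature`, then fit one tree per segment."""
    # Let a depth-1 stump that only sees the forced column pick its best threshold.
    stump = DecisionTreeClassifier(max_depth=1, random_state=random_state)
    stump.fit(X[:, [feature]], y)
    threshold = float(stump.tree_.threshold[0])

    # Segment the data ourselves on that threshold.
    left = X[:, feature] <= threshold
    trees = {}
    for name, mask in (("left", left), ("right", ~left)):
        # Each segment gets its own tree, free to choose its own splits and depth.
        trees[name] = DecisionTreeClassifier(
            max_leaf_nodes=3, random_state=random_state
        ).fit(X[mask], y[mask])
    return threshold, trees


iris = datasets.load_iris()
X, y = iris.data, iris.target
threshold, trees = fit_segment_trees(X, y, FORCED_FEATURE)
print(f"forced root split: feature_{FORCED_FEATURE} <= {threshold:.2f}")
```

Each entry of `trees` can then be printed with `tree.export_text` as in the question to see how that segment was split further.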

At scoring time, your routine first checks which segment a record falls into, then applies that segment's tree.
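A sketch of such a scoring routine, again on iris and assuming the segmentation rule is `feature_2 <= 2.45` (a hypothetical threshold hard-coded for illustration). Each row is routed to its segment's tree, and the per-segment predictions are written back in the original row order. The function names `segment_of` and `predict_segmented` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

FEATURE, THRESHOLD = 2, 2.45  # hypothetical segmentation rule: feature_2 <= 2.45


def segment_of(X):
    """Return a boolean mask: True for the 'left' segment, False for 'right'."""
    return X[:, FEATURE] <= THRESHOLD


def predict_segmented(trees, X):
    """Route each row to its segment's tree and reassemble predictions in input order."""
    mask = segment_of(X)
    preds = np.empty(len(X), dtype=int)
    # predict() rejects empty inputs, so skip a segment with no rows.
    if mask.any():
        preds[mask] = trees["left"].predict(X[mask])
    if (~mask).any():
        preds[~mask] = trees["right"].predict(X[~mask])
    return preds


iris = datasets.load_iris()
X, y = iris.data, iris.target
mask = segment_of(X)
trees = {
    "left": DecisionTreeClassifier(random_state=0).fit(X[mask], y[mask]),
    "right": DecisionTreeClassifier(random_state=0).fit(X[~mask], y[~mask]),
}
preds = predict_segmented(trees, X)
print("training accuracy:", (preds == y).mean())
```

Keeping the routing rule in one function (`segment_of`) means training and scoring cannot drift apart on how a record is assigned to a segment.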

I use this technique when I believe (based on research by SMEs and on the data) that the segments are different enough, often even having different data available for each, to make the extra effort worthwhile. I also build unsegmented models and compare them to the segmented ones to check which gives me the performance I need.
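To make that comparison concrete, here is a sketch that scores an unsegmented tree against a segmented pair on the same held-out split. The segmentation rule (`feature_2 <= 2.45`), the 70/30 stratified split, and `max_leaf_nodes=3` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

FEATURE, THRESHOLD = 2, 2.45  # hypothetical segmentation rule: feature_2 <= 2.45

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

# Baseline: one tree on all the data, free to choose any root split.
baseline = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X_train, y_train)
baseline_acc = float((baseline.predict(X_test) == y_test).mean())

# Segmented: one tree per side of the forced split.
train_left = X_train[:, FEATURE] <= THRESHOLD
test_left = X_test[:, FEATURE] <= THRESHOLD
seg_preds = np.empty(len(X_test), dtype=y_test.dtype)
for train_mask, test_mask in ((train_left, test_left), (~train_left, ~test_left)):
    seg_tree = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
    seg_tree.fit(X_train[train_mask], y_train[train_mask])
    if test_mask.any():  # predict() rejects empty inputs
        seg_preds[test_mask] = seg_tree.predict(X_test[test_mask])
segmented_acc = float((seg_preds == y_test).mean())

print(f"unsegmented: {baseline_acc:.3f}  segmented: {segmented_acc:.3f}")
```

Note that the segmented pair can use up to six leaves in total against three for the baseline, so for a fair comparison you may want to tune each model's size (for example with cross-validation) rather than fix it.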

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.