Feature importance for particular classes

Suppose I have a dataset labeled with two classes, such as healthy and unhealthy, and I have applied feature selection (feature importance) to it.

How can I know if the features are important to a particular class (to healthy or unhealthy)?



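Assuming we are talking about the feature importance of tree-based models (decision trees, random forests), you cannot really tell. That score is a single number per feature summarizing how much the feature helps separate the classes overall (in scikit-learn, the total impurity decrease from splits on that feature, weighted by the number of samples reaching those splits). It says nothing about which class the feature points toward.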
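A minimal sketch of why the score is class-agnostic, using synthetic data from scikit-learn's make_classification as a stand-in for a healthy/unhealthy dataset (the data and settings are assumptions for illustration): feature_importances_ holds exactly one value per feature, with no per-class axis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary problem standing in for healthy (0) / unhealthy (1)
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# One impurity-based importance per feature; the same vector describes both classes
importances = clf.feature_importances_
print(importances.shape)  # (5,)
print(importances.sum())  # forest importances are normalized to sum to 1
```

The vector tells you which features the trees rely on, but a high value is equally consistent with the feature indicating healthy or unhealthy samples.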

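If you would like more insight into how your model makes decisions, you could look into SHAP and LIME. Both explain individual predictions: LIME fits a simple, interpretable surrogate model around the prediction being explained, while SHAP splits each prediction into per-feature contributions based on Shapley values. Because each contribution has a sign, you can see whether a feature pushes a prediction toward healthy or unhealthy. Both are available as Python libraries (the shap and lime packages).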
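The shap package itself is not used here. Instead, this is a minimal sketch of the underlying idea for the one case with a simple closed form: a logistic regression with features assumed independent, where the SHAP value of feature i for a sample x is w_i * (x_i - E[x_i]) on the log-odds scale. A positive value pushes that sample toward class 1 (assumed here to mean unhealthy), a negative one toward class 0 (healthy). The synthetic data and the helper linear_shap are hypothetical, written only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: class 0 = healthy, class 1 = unhealthy (assumed labels)
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

w = model.coef_[0]                # one weight per feature (binary case)
background_mean = X.mean(axis=0)  # E[x] estimated from the training data


def linear_shap(x):
    """Hypothetical helper: per-feature contributions to the log-odds of class 1.

    These are the exact SHAP values of a linear model when features are
    treated as independent.
    """
    return w * (x - background_mean)


x0 = X[0]
phi = linear_shap(x0)
base_value = model.intercept_[0] + w @ background_mean

# The contributions add up to the model's log-odds output for this sample
print(np.isclose(base_value + phi.sum(), model.decision_function([x0])[0]))  # True

for i, contrib in enumerate(phi):
    direction = "unhealthy (1)" if contrib > 0 else "healthy (0)"
    print(f"feature {i}: {contrib:+.3f} -> pushes toward {direction}")
```

For non-linear models such as random forests there is no formula this simple, which is what the shap library's explainers are for; the interpretation of the signs stays the same.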


Something like this should get you going. It fits a random forest and plots its overall (class-agnostic) feature importances:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

# Credit data used in the original example
df = pd.read_csv("https://rodeo-tutorials.s3.amazonaws.com/data/credit-data-trainingset.csv")
print(df.head())

features = np.array(['revolving_utilization_of_unsecured_lines',
                     'age', 'number_of_time30-59_days_past_due_not_worse',
                     'debt_ratio', 'monthly_income', 'number_of_open_credit_lines_and_loans',
                     'number_of_times90_days_late', 'number_real_estate_loans_or_lines',
                     'number_of_time60-89_days_past_due_not_worse', 'number_of_dependents'])
target = 'serious_dlqin2yrs'

# Drop rows with missing values; older scikit-learn versions reject NaN inputs
df = df.dropna(subset=list(features) + [target])

clf = RandomForestClassifier(random_state=0)
clf.fit(df[features], df[target])

# Sort the importances in ascending order; barh draws the first bar at the
# bottom, so the most important feature ends up at the top of the plot
importances = clf.feature_importances_
sorted_idx = np.argsort(importances)

padding = np.arange(len(features)) + 0.5
plt.barh(padding, importances[sorted_idx], align='center')
plt.yticks(padding, features[sorted_idx])
plt.xlabel("Relative Importance")
plt.title("Variable Importance")
plt.tight_layout()
plt.show()
```

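To stay with the random forest and still get a per-class view, one option (a different technique from SHAP/LIME, swapped in here for illustration) is permutation importance scored separately for each class: shuffle one feature at a time and measure how much the recall of each class drops. A feature whose shuffling hurts unhealthy recall much more than healthy recall matters mainly for recognizing unhealthy cases. This sketch uses synthetic data instead of the credit CSV, and the 0 = healthy / 1 = unhealthy encoding is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 0 = healthy, 1 = unhealthy (assumed encoding)
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

per_class = {}
for label, name in [(0, "healthy"), (1, "unhealthy")]:
    # Recall of this class: fraction of its samples the model classifies correctly
    scorer = make_scorer(recall_score, pos_label=label)
    result = permutation_importance(clf, X_test, y_test, scoring=scorer,
                                    n_repeats=10, random_state=0)
    # Mean drop in this class's recall when each feature is shuffled
    per_class[name] = result.importances_mean

for i in range(X.shape[1]):
    print(f"feature {i}: healthy {per_class['healthy'][i]:+.3f}, "
          f"unhealthy {per_class['unhealthy'][i]:+.3f}")
```

Evaluating on a held-out split keeps the scores from reflecting memorized training samples; drops near zero (or slightly negative) mean the class's recall does not depend on that feature.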

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.