How to impute using SimpleImputer with a custom function in a Pipeline
I am imputing my data using SimpleImputer from sklearn. I want to test several different ways of applying transformations to the data. For example, for logistic regression I would like to:
- replace NaNs with the mode
- replace +inf with the column max and -inf with the column min
- apply StandardScaler.
 
Then, for XGBoost, I would like to:
- simply replace +inf/-inf with very large positive or negative numbers.
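
As a rough sketch of the XGBoost case, assuming plain NumPy and a placeholder magnitude of 1e10 (the helper name `replace_inf_with_large` and the constant are hypothetical, not from any library), the replacement could look like this:

```python
import numpy as np

# Hypothetical magnitude, chosen only for illustration.
LARGE = 1e10


def replace_inf_with_large(X, large=LARGE):
    """Replace +inf with `large` and -inf with `-large`, leaving NaN untouched."""
    X = np.asarray(X, dtype=float)
    # nan=np.nan keeps missing values as NaN instead of the default 0.0.
    return np.nan_to_num(X, nan=np.nan, posinf=large, neginf=-large)


X = np.array([[1.0, np.inf], [-np.inf, np.nan]])
X_clean = replace_inf_with_large(X)
```

Because this step learns nothing from the training data, it could probably be wrapped in sklearn's FunctionTransformer and used directly as a pipeline step.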
 
I have been experimenting with sklearn's Pipeline, and I would like to know how I can pass custom imputers through the pipeline. For example:
logistic_pipeline = Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')),
                                    ('std_scaler', StandardScaler()),
                                    ('model', LogisticRegression())])
But how do I incorporate the following function into it? It replaces the infs in the training dataset (df) with the max (or min) of each column, then uses those same values to fill the infs in the test set. How can I do this using a Pipeline?
import numpy as np

def replace_pos_inf(df, dftest, numeric_features):
    # Learn the finite max/min of each column from the training data (df),
    # then use those values to replace +inf/-inf in both df and dftest.
    for col in df[numeric_features].columns:
        finite = df.loc[np.isfinite(df[col]), col]
        m, mini = finite.max(), finite.min()
        # Assign back instead of using inplace=True on a column selection,
        # which may silently operate on a copy.
        df[col] = df[col].replace({np.inf: m, -np.inf: mini})
        dftest[col] = dftest[col].replace({np.inf: m, -np.inf: mini})
    return df, dftest
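
One possible direction, sketched under assumptions, is to move the same logic into a custom transformer so that the max/min are learned in `fit` (training data) and reused in `transform` (test data). The class name `InfClipper` and the toy data below are hypothetical; `BaseEstimator` and `TransformerMixin` are the standard sklearn base classes. The clipper is placed first because SimpleImputer accepts NaN but rejects infinite values.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class InfClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: replace +/-inf with the per-column
    finite max/min seen during fit. NaN values are left untouched."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Mask inf, -inf and NaN so they do not influence the extremes.
        finite = np.where(np.isfinite(X), X, np.nan)
        self.max_ = np.nanmax(finite, axis=0)
        self.min_ = np.nanmin(finite, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # np.where returns a new array, so the input is not modified.
        X = np.where(np.isposinf(X), self.max_, X)
        X = np.where(np.isneginf(X), self.min_, X)
        return X


logistic_pipeline = Pipeline(steps=[
    ('inf_clipper', InfClipper()),
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('std_scaler', StandardScaler()),
    ('model', LogisticRegression()),
])

# Toy data, made up purely for illustration.
X_train = pd.DataFrame({'a': [1.0, np.inf, 3.0, -np.inf],
                        'b': [np.nan, 2.0, 2.0, 5.0]})
y_train = [0, 1, 0, 1]
X_test = pd.DataFrame({'a': [np.inf, 0.0], 'b': [-np.inf, 1.0]})

logistic_pipeline.fit(X_train, y_train)   # max/min learned from X_train only
pred = logistic_pipeline.predict(X_test)  # same max/min applied to X_test
clipped_test = logistic_pipeline.named_steps['inf_clipper'].transform(X_test)
```

With this layout, the train/test bookkeeping that `replace_pos_inf` does by hand is handled by the pipeline's `fit` and `predict` calls, so the test set never influences the learned values.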
Topic training data-imputation scikit-learn python machine-learning
Category Data Science