Posts

Showing posts from September, 2021

Best Practices

Feature Engineering

Normalize parameter to get the percentage

The following example uses the normalize parameter of value_counts to show each class as a percentage of the total instead of a raw count:

    print(y_test.value_counts(normalize=True) * 100)

    Exited
    0    79.25
    1    20.75

Split to Train, Test and Validation

The following example splits the data in two steps: first into temporary and test sets with a ratio of 80:20, then the temporary set into train and validation sets with a ratio of 75:25. Overall this gives a 60:20:20 split between train, validation and test. Passing stratify keeps the class proportions the same in every subset.

    from sklearn.model_selection import train_test_split

    # first we split data into 2 parts, say temporary and test
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=0.2, random_state=1, stratify=y
    )

    # then we split the temporary set into train and validation
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.25, random_state=1, stratify=y_temp
    )

    print(X_train.shape, X_val.shape, X_test.shape)

Data Encoding

    from sklearn.preprocessing import LabelEncoder
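As a minimal sketch of how LabelEncoder is commonly used together with a train/test split (this is an illustration, not necessarily the code this post goes on to use): the encoder is fitted on the training values only and the same mapping is then reused for the held-out values. The country names below are hypothetical sample values, not data from the post.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical values, invented for illustration only.
train_labels = ["France", "Spain", "Germany", "France"]
test_labels = ["Germany", "France"]

# Fit on the training values only, so the mapping is learned from train data.
encoder = LabelEncoder()
encoder.fit(train_labels)

# Reuse the fitted mapping for both subsets.
train_encoded = encoder.transform(train_labels)
test_encoded = encoder.transform(test_labels)

# classes_ holds the learned categories in sorted order;
# each value is encoded as its index in this array.
print(encoder.classes_)
print(train_encoded, test_encoded)

# inverse_transform maps the integer codes back to the original labels.
print(encoder.inverse_transform(test_encoded))
```

Note that transform raises a ValueError for a value that was not seen during fit, so this pattern assumes every category in the validation and test sets also appears in the training set.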