Best Practices

Feature Engineering

Normalize parameter to get the percentage

The following example shows how to use the normalize parameter of value_counts to get the share of each class as a percentage.

    print(y_test.value_counts(normalize=True) * 100)

    Exited
    0    79.25
    1    20.75

Split to Train, Test and Validation

The following example shows how to:

- divide the data into temporary and test sets with a ratio of 80:20
- divide the temporary set into train and validation sets with a ratio of 75:25

The result is a 60:20:20 split of the original data into train, validation and test sets. Passing stratify keeps the class proportions the same in every subset.

    from sklearn.model_selection import train_test_split

    # first we split data into 2 parts, say temporary and test
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=0.2, random_state=1, stratify=y
    )

    # then we split the temporary set into train and validation
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.25, random_state=1, stratify=y_temp
    )

    print(X_train.shape, X_val.shape, X_test.shape)

Data Encoding

    from sklearn.preprocessing import LabelEncoder
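As a minimal sketch of how the LabelEncoder imported above might be used, the snippet below encodes a small categorical list as integers. The geography values are hypothetical and only for illustration; they are not taken from the post's dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical values, for illustration only
geography = ["France", "Spain", "Germany", "France", "Germany"]

encoder = LabelEncoder()

# fit_transform learns the sorted set of distinct labels,
# then maps each value to the index of its label
encoded = encoder.fit_transform(geography)

print(encoder.classes_)  # labels in sorted order: France, Germany, Spain
print(encoded)           # integer codes: 0, 2, 1, 0, 1

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Note that scikit-learn documents LabelEncoder as intended for target labels (y). For input features, OrdinalEncoder or OneHotEncoder is usually the more appropriate choice.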