This project classifies and predicts whether a bank customer will leave, using Random Forest as a baseline model and comparing it with several neural network models with different parameters.
The dataframe has 10,000 instances and 14 attributes. Each instance holds the personal data of one bank customer.
Data summary:
1. Reading and understanding data.
2. Removal of unwanted attributes: row number, customer ID, surname.
3. There are no missing values.
4. Some attributes contain outliers; the numeric features are scaled using Standard Scaler.
5. Converting categorical variables such as geography and gender into numeric variables using One Hot Encoding.
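The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny DataFrame and the column names (`RowNumber`, `CustomerId`, `Surname`, `CreditScore`, `Geography`, `Gender`, `Balance`, `Exited`) are assumptions chosen to mirror the attributes named in the list.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical stand-in for the bank dataset; column names are assumed.
df = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4],
    "CustomerId": [101, 102, 103, 104],
    "Surname": ["A", "B", "C", "D"],
    "CreditScore": [600.0, 720.0, 580.0, 810.0],
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Balance": [0.0, 50000.0, 125000.0, 250000.0],
    "Exited": [0, 1, 0, 1],
})

# Step 2: drop identifier-like attributes that carry no predictive signal.
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])

# Step 3: count missing values across the whole frame.
n_missing = int(df.isna().sum().sum())

# Step 5: one-hot encode the categorical attributes.
df = pd.get_dummies(df, columns=["Geography", "Gender"], dtype=int)

# Step 4: standardize numeric attributes to zero mean and unit variance.
numeric_cols = ["CreditScore", "Balance"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

print(n_missing)
print(df.columns.tolist())
```

In a real pipeline the scaler would normally be fit on the training split only and then applied to the test split, so that test statistics do not leak into training.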
Model 1 (random forest) performed best overall. It is fast and has the highest accuracy and ROC AUC score.
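For reference, a Random Forest baseline and the two metrics mentioned here (accuracy and area under the ROC curve) could be computed along these lines. This is a sketch under stated assumptions: the data is synthetic, and the hyperparameters (`n_estimators=100`, an 80/20 split) are illustrative choices, not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 500 customers, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# Hypothetical churn label driven by the first two features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out 20% of the rows for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the baseline model.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy uses hard class predictions; ROC AUC uses the predicted
# probability of the positive class.
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"accuracy={accuracy:.3f}  roc_auc={auc:.3f}")
```

Evaluating each neural network on the same held-out split with the same two metrics keeps the comparison against the baseline like for like.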
As a next step, it is recommended to build neural network models with other parameters to increase accuracy, or to try another deep learning model.