Deep Learning: Customer Churn Classification

Mohamad Misbah

Bojongsoang, Bandung Regency, West Java, Indonesia

Objective

Classify and predict whether a bank customer will leave, using a random forest as the baseline model and comparing it with several neural network models with different parameters.

About data

The dataframe has 10,000 instances and 14 attributes. Each instance holds the personal data of one bank customer.

Data summary:

Actions taken for data cleaning and feature engineering

1.   Reading and understanding data.

2.   Removal of unwanted attributes: row number, customer ID, surname.

3.   There are no missing values.

4.   Some features contain outliers; the features are scaled using Standard Scaler.

5.   Converting categorical variables such as geography and gender into numeric variables using One Hot Encoding.
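The cleaning steps above can be sketched without any third-party libraries. This is only an illustration of what the transformations do, not the project's actual code: the column names and the three sample rows are hypothetical, and in practice the same steps would usually be done with pandas and scikit-learn.

```python
from statistics import mean, pstdev

# Hypothetical sample rows; column names are assumed, not taken from the project.
rows = [
    {"RowNumber": 1, "CustomerId": 101, "Surname": "A", "Geography": "France",
     "Gender": "Female", "Age": 42, "Balance": 0.0, "Exited": 1},
    {"RowNumber": 2, "CustomerId": 102, "Surname": "B", "Geography": "Spain",
     "Gender": "Female", "Age": 41, "Balance": 83807.86, "Exited": 0},
    {"RowNumber": 3, "CustomerId": 103, "Surname": "C", "Geography": "Germany",
     "Gender": "Male", "Age": 39, "Balance": 125510.82, "Exited": 0},
]

DROP = {"RowNumber", "CustomerId", "Surname"}  # identifier-like attributes
CATEGORICAL = ["Geography", "Gender"]          # columns to one-hot encode
NUMERIC = ["Age", "Balance"]                   # columns to standardize


def preprocess(rows):
    # Drop the unwanted attributes (copies each row, so the input is untouched).
    cleaned = [{k: v for k, v in r.items() if k not in DROP} for r in rows]

    # One-hot encoding: one 0/1 column per category value.
    for col in CATEGORICAL:
        categories = sorted({r[col] for r in cleaned})
        for r in cleaned:
            value = r.pop(col)
            for cat in categories:
                r[f"{col}_{cat}"] = 1 if value == cat else 0

    # Standard scaling: subtract the mean, divide by the population std.
    for col in NUMERIC:
        values = [r[col] for r in cleaned]
        mu, sigma = mean(values), pstdev(values)
        for r in cleaned:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0
    return cleaned


processed = preprocess(rows)
print(sorted(processed[0].keys()))
```

Note that standard scaling rescales every value, including outliers; it does not remove or cap them.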

Exploratory data analysis

Almost all features are approximately normally distributed.

Most features are not correlated with each other. The highest correlations are between Germany (geography) and balance, and between exited and age.
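For readers unfamiliar with how such pairwise correlations are obtained, here is a minimal sketch of the Pearson correlation coefficient, assuming that is the measure used (the page does not say). The `age` and `exited` values are made-up toy data, not figures from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy columns: customer age and the churn label (1 = exited).
age = [25, 32, 47, 51, 62]
exited = [0, 0, 1, 0, 1]

r = pearson(age, exited)
print(f"correlation(age, exited) = {r:.3f}")
```

Applying the same function to every pair of columns gives the correlation matrix that a heatmap would display.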

Classification models

Model 1 (random forest)

Model 2 (neural networks - single hidden layer)

Model 3 (neural networks - two hidden layers)

Model comparison

Model 1 (random forest) performed best overall. It is fast and has the highest accuracy and AUC score.
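The two comparison metrics can be computed from scratch as follows. This is a sketch on invented labels and scores, not the project's results: accuracy is the share of correct predictions at a 0.5 threshold (the threshold is an assumption), and AUC is computed as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = exited) and predicted churn probabilities.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why both are useful when comparing models.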

ROC curve and confusion matrix of the best model (Model 1)
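As a rough illustration of how a confusion matrix and the points of an ROC curve are derived from predictions, the sketch below uses invented labels and scores rather than the outputs of Model 1. The matrix layout (rows are true classes, columns are predicted classes) is an assumed convention.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def roc_points(y_true, y_score):
    """(false positive rate, true positive rate) at each score threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical labels (1 = exited) and predicted churn probabilities.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
curve = roc_points(y_true, scores)
print(cm)
print(curve)
```

Plotting the returned points with the false positive rate on the x-axis and the true positive rate on the y-axis gives the ROC curve.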

Key findings and insights

Model 1 (random forest) performed best overall. It is fast and has the highest accuracy and AUC score.

Next steps

As a next step, it is recommended to build neural network models with other parameters to increase accuracy, or to try another deep learning model.

Deep learning: a classification project to predict customer churn using Python, Jupyter Notebook, and Visual Studio Code. Before the machine learning process, data cleaning, data wrangling, and exploratory data analysis were carried out. The best-fitting model is the random forest, which is more accurate and faster to compute than the neural network models.

Published: Nov 17th 2022

Tools

Python