Data Visualization with R

International Marketing and Sales Intern
New Taipei City, Taiwan

Data Preparation


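## to prepare data
library(tidyverse)  # tibble, dplyr (%>%, filter, mutate) and ggplot2
library(scales)     # comma() and comma_format() for the plot labels

# Fraud is assumed to be a data frame that has already been loaded
df <- as_tibble(Fraud)
df$type <- factor(df$type)
df$isFraud <- factor(df$isFraud)
df$isFlaggedFraud <- factor(df$isFlaggedFraud)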


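## to create transaction types ratio
x_pie <- table(df$type)
# take the labels from the table itself so they stay in the same order as the counts
# (unique() returns values in order of appearance, which can mislabel the slices)
y_pie <- names(x_pie)
per_pie <- round(x_pie / sum(x_pie) * 100, 2)
pie(x_pie,
  labels = paste0(y_pie, " ", per_pie, "%"),
  main = "Transaction Type")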

After exploring the transactions, I wanted to find out which transaction types customers use most in the bank's dataset. I coded a pie chart in RStudio and found that transfer and debit transactions together account for 68.98% of all transactions, in similar proportions. Payment services come third, at 21.99% of all transactions. In my opinion, the spread of contactless payment and technological improvement has reduced traditional withdrawal transactions, so customers use digital transfers and payments instead.
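The percentage labels behind the pie chart can be sketched on a toy example. This is only an illustration: the `type` vector and its counts below are made up and do not come from the bank's dataset. The sketch shows one way to keep each label attached to the right count, by taking the names from the table instead of from a separate vector.

```r
# Toy stand-in for df$type; the counts are invented for illustration only.
type <- factor(c(rep("CASH_OUT", 5), rep("PAYMENT", 3), rep("TRANSFER", 2)))

counts <- table(type)                        # counts in factor-level order
share <- round(100 * prop.table(counts), 2)  # percentage of all rows per type

# names(counts) is in the same order as the counts, so labels cannot be shuffled
labels <- paste0(names(counts), " ", share, "%")
print(labels)

# Rank the types from most to least common
ranked <- sort(share, decreasing = TRUE)
print(ranked)
```

The `labels` vector can be passed straight to `pie(counts, labels = labels)`, and `ranked` gives the order used when describing the first, second and third most common types.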


## to create processed amount per transaction histogram plot

# to clean data and calculate mean and standard deviation
amount <- df %>%
  filter(oldbalanceOrg != 0, newbalanceOrig != 0,
         oldbalanceDest != 0, newbalanceDest != 0) %>%
  mutate(amounts = abs(oldbalanceOrg - newbalanceOrig)) %>%
  filter(amounts >= 1000 & !is.na(amounts))

m_amount <- round(mean(amount$amounts), 2)
sd_amount <- round(sd(amount$amounts), 2)

# to generate random sample
set.seed(14)
random_amount <- tibble(V1 = as.numeric(rnorm(
  n = 1338561,
  mean = m_amount, sd = sd_amount))) %>%
  filter(V1 > 1000)

# to calculate quantile ranges for the random sample
hist_list <- list()
for (name in colnames(random_amount)) {
  print(paste("Calculating range for:", name))
  hist_list[[name]] <- quantile(random_amount[[name]],
                                probs = c(0, 0.2, 0.4, 0.6, 0.8, 1))
}


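# to calculate mode value (the x position where the kernel density estimate peaks)
dens_amount <- density(random_amount$V1)  # estimate the density once and reuse it
mode_amount <- round(dens_amount$x[which.max(dens_amount$y)], 2)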


# to plot histogram
ggplot(random_amount, aes(x = V1)) +
  # include.lowest = TRUE keeps the minimum value in the first bin instead of NA
  geom_histogram(aes(fill = cut(V1, breaks = hist_list$V1, include.lowest = TRUE))) +
  ggtitle("Distribution of Processed Amounts per Transaction") +
  xlab("Amount") +
  ylab("Count") +
  # legend labels are typed by hand from the quantile values in hist_list$V1
  scale_fill_manual(
    values = c("#F8766D", "#F9A41B", "#D2D611", "#8FD944", "#00BFC4"),
    labels = c(
      "1k - 81.7k", "81.7k - 140k", "140k - 195k",
      "195k - 263k", "263k - 726k"
    )
  ) +
  scale_x_continuous(labels = comma_format()) +
  theme(axis.text.x = element_text(size = 8)) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_vline(xintercept = mode_amount, color = "blue", linetype = "dashed", linewidth = 1) +
  labs(fill = "Amount") +
  annotate("text", x = mode_amount, y = 0, label = paste0("Mode: ", comma(mode_amount)))

Once I knew which types of transactions customers mostly made through the bank's services, I wanted to know how much money was processed and which amounts were most common. The dataset has six million transactions, which is too much for my laptop's capacity, so I kept only transactions with non-zero balances on both accounts and an amount of at least $1,000. That left about 1.3 million transactions. I then simulated a random sample from a normal distribution with the same mean and standard deviation as the filtered amounts, and kept the simulated values above $1,000. Finally, I split the simulated amounts into quantile ranges of 20% each to colour the histogram. The second and third ranges, from $81,700 to $195,000, together cover 40% of the amounts, and the most common amount (the mode) is about $154,956.
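The quantile ranges and the mode described above can be sketched on a smaller simulated sample. This is a minimal sketch under stated assumptions: the mean, standard deviation and sample size below are made-up placeholders, not the values computed from the dataset, so the break points will differ from the ones in the chart. It shows that each quantile range holds about 20% of the values, which is why two adjacent ranges cover about 40%.

```r
set.seed(14)

# Hypothetical mean and sd, chosen only for illustration
x <- rnorm(100000, mean = 170000, sd = 100000)
x <- x[x > 1000]  # drop simulated values at or below 1,000, as in the post

# Break points at 0%, 20%, 40%, 60%, 80% and 100%
breaks <- quantile(x, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1))

# include.lowest = TRUE keeps the minimum value inside the first bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
bin_share <- round(100 * prop.table(table(bins)), 1)
print(bin_share)  # percentage of values in each range

# Mode estimate: the x position where the kernel density estimate peaks
d <- density(x)
mode_x <- d$x[which.max(d$y)]
print(round(mode_x, 2))
```

Because the ranges are built from quantiles, their shares are fixed by construction. The mode from `density()` is an estimate that depends on the bandwidth, so it is best read as an approximate peak rather than an exact amount.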

Instead of building the visualizations in an external program, I created them entirely in R.

Published: Jul 7th 2023