陳慶全

Senior Data Engineer
* Data engineer and data scientist with over six years of experience.
* Proven success in processing large volumes of data (6 TB per day) with Spark in Scala and MPI in R and Python.
* Proven success in developing a machine learning model for IoT device recognition with Spark in Scala on 30 billion records.
Microsoft
National Cheng Kung University
New Taipei City, Taiwan

Skills

R
Python
C++
Matlab
Shell Script
Machine Learning
Deep Learning
Data Analysis
Data Mining
Data Science
Data Cleaning
Apache Hive
Apache Spark
Hadoop Ecosystem
Oracle
MySQL
SQL
PowerPoint
Statistics
AWS
Docker
Bash
Scala
Azure

Languages

Chinese
Native or Bilingual
English
Fluent
Japanese
Fluent

Work Experience

Senior Data Engineer

Microsoft
Full-time

Jan 2021 ~ Present
New Taipei City, Taiwan
** Reliability Data System – Data Engineer
• Process 1 billion records per day from data centers to provide data views for reliability engineers.
• Led two interns in completing data pipelines that visualize data for reliability engineers.
** Quality Management System – Data Engineer
• Increased the correctness rate of server components by 120% by leading data collection projects that aligned the collected data with internal databases.
• Reduced the runtime of data pipelines by 80% by replacing Hive with Spark, and at the same time cut costs by 60% by moving from a Hadoop cluster to a serverless Spark cluster.
• Led nine contractors in India to complete a service migration that meets Microsoft compliance requirements.

Senior Data Scientist

Trend Micro Inc.
Full-time

Jan 2019 ~ Jan 2021
2 yrs 1 mo
Taipei City, Taiwan
** Home Network Security – Data Engineer
• Reduced report generation time by 90% for reports built from 1 billion security events per day. These reports help marketing and sales teams in Japan, Singapore, and Australia find business opportunities.
• Visualized the relationships between security events for threat experts with word2vec and t-SNE.
** Network Behavior Analysis Project – Data Scientist
• Developed a machine learning model to recognize IoT devices from 30 billion NetFlow records using Spark in Scala and Python.
• Achieved 90% accuracy in identifying periodic network behaviors of IoT devices with a statistical model.

Senior Data Engineer and Data Scientist

TSMC
Full-time

Jul 2016 ~ Jan 2019
2 yrs 7 mos
Taichung City, Taiwan
** Yield Improvement Project – Data Engineer and Data Scientist
• Processed large volumes of data (6 TB per day) to maintain a data warehouse for machine learning projects.
• Reduced the out-of-control rate by 30% with a statistical model.
• Reduced the scrap rate by 80% with in-house anomaly detection algorithms.
• Reduced the time needed to identify key factors affecting yield by 80% using data visualization and statistics.
** Big Data Solutions – Data Engineer
• Ingested 6 TB of data per day by building an on-premises big data solution with Scala, Spark, and Hive.
• Reduced the implementation time of machine learning algorithms by 95% using R, MPI, Hive, and Spark.
** Weekly Productivity Improvement Program – Leader
• Developed R packages to avoid reinventing the wheel and increase productivity.
• Taught data scientists and data engineers to write clean, performant code.
• Organized study groups to share knowledge of machine learning and statistics with colleagues.

Full-time Research Assistant

Academia Sinica
Full-time

Sep 2015 ~ Jun 2016
10 mos
Taipei City, Taiwan
** Main Role
• Reduced data processing time by 80% using R and MongoDB to process millions of records per day.
• Achieved a 40% lower RMSE in imputing missing values with an in-house machine learning method than with other methods.

Education

National Cheng Kung University

Master’s Degree
Statistics

2012 - 2014
4/4 GPA
Description
== Achievements ==
• Completed a master’s thesis entitled “A Classification Approach Based on Density Ratio Estimation with Subspace Projection.” Advisor: Ray-Bing Chen.
• Earned a grade of 95% in statistical methods, generalized linear models, and statistical data mining, and 92% in linear models. I am therefore confident in building models and drawing inferences from them.
• Completed an advanced probability theory class designed for Ph.D. students.

National Cheng Kung University

Bachelor’s Degree
Economics and Statistics

2008 - 2012
3.5/4 GPA
Description
Through careful planning and hard work, I earned 175 credits across two majors within four years.