陳慶全
Senior Data Engineer

* Data engineer and data scientist with over six years of experience.
* Proven success in processing large volumes of data (6 TB per day) with Spark in Scala and with MPI in R and Python.
* Proven success in developing a machine learning model with Spark in Scala on 30 billion records for IoT device recognition.
Microsoft
National Cheng Kung University
New Taipei City, Taiwan

Professional Background

  • Current status
    Employed
  • Profession
    Data Engineer
    Data Scientist
    Big Data Engineer
  • Fields
    Information Services
    Big Data
    Artificial Intelligence / Machine Learning
  • Work experience
    4-6 years (4-6 years relevant)
  • Management
    Experience managing 1-5 people
  • Skills
    R
    Python
    C++
    MATLAB
    Shell Script
    Machine Learning
    Deep Learning
    Data Analysis
    Data Mining
    Data Science
    Data Cleaning
    Apache Hive
    Apache Spark
    Hadoop ecosystem
    Oracle
    MySQL
    SQL
    PowerPoint
    Statistics
    AWS
    Docker
    Bash
    Scala
    Azure
  • Languages
    Chinese
    Native or Bilingual
    English
    Fluent
    Japanese
    Fluent
  • Highest level of education
    Master

Job search preferences

  • Desired job type
    Full-time
    Interested in working remotely
  • Desired positions
    Data Scientist, Data Engineer, Data Analyst
  • Desired work locations
    Taipei, Taiwan
    Japan
    United States
    Canada
    United Kingdom
    Netherlands
    Germany
    Switzerland
  • Freelance
    Non-freelancer

Work Experience


Senior Data Engineer

Microsoft
Full-time
Jan 2021 - Present
New Taipei City, Taiwan
** Reliability Data System – Data Engineer
• Process 1B records per day from data centers to provide data views for reliability engineers.
• Lead 2 interns in building data pipelines that visualize data for reliability engineers.
** Quality Management System – Data Engineer
• Increased the correctness rate of server components by 120% by leading data collection projects to align with the data in internal databases.
• Reduced the runtime of data pipelines by 80% by replacing Hive with Spark. At the same time, reduced cost by 60% by moving from a Hadoop cluster to a serverless Spark cluster.
• Led 9 Indian contractors to complete a service migration to meet Microsoft compliance requirements.

Senior Data Scientist

Trend Micro Inc.
Full-time
Jan 2019 - Jan 2021
2 yrs 1 mo
Taipei City, Taiwan
** Home Network Security – Data Engineer
• Reduced the time to produce reports from 1B daily security events by 90%. This helps marketing and sales staff in Japan, Singapore, and Australia find opportunities to improve business.
• Visualized the relationships between security events for threat experts with word2vec and t-SNE.
** Network Behavior Analysis Project – Data Scientist
• Developed a machine learning model to recognize IoT devices based on 30 billion netflow records, using Spark in Scala and Python.
• Reached a 90% accuracy rate in identifying periodic network behaviors of IoT devices with a statistical model.

Senior Data Engineer and Data Scientist

TSMC
Full-time
Jul 2016 - Jan 2019
2 yrs 7 mos
Taichung City, Taiwan
** Yield Improvement Project – Data Engineer and Data Scientist
• Processed large volumes of data (6 TB per day) to maintain a data warehouse for machine learning projects.
• Reduced the out-of-control rate by 30% via a statistical model.
• Reduced the scrapping rate by 80% with in-house anomaly detection algorithms.
• Reduced the time to find key factors of yield rates by 80% via data visualization and statistics.
** Big Data Solutions – Data Engineer
• Ingested 6 TB of data per day by building an on-premise big data solution with Scala, Spark, and Hive.
• Reduced the implementation time of machine learning algorithms by 95% via R, MPI, Hive, and Spark.
** Weekly Productivity Improvement Program – Leader
• Developed R packages to avoid reinventing the wheel and increase productivity.
• Taught data scientists and data engineers to write clean and performant code.
• Organized study groups to share knowledge of machine learning and statistics with colleagues.

Full-time Research Assistant

Academia Sinica
Full-time
Sep 2015 - Jun 2016
10 mos
Taipei City, Taiwan
** Main role
• Decreased data processing time by 80% by using R and MongoDB to process millions of records per day.
• Achieved a 40% lower RMSE than other methods when imputing missing values with an in-house machine learning method.

Education

Master’s Degree
Statistics
2012 - 2014
4/4 GPA
Description
== Achievements ==
• Completed a master’s thesis entitled “A Classification Approach Based on Density Ratio Estimation with Subspace Projection.” Advisor: Ray-Bing Chen.
• Earned a grade of 95% in my statistical methods, generalized linear models, and statistical data mining classes, and 92% in my linear models class. I am thus confident in building models and drawing inferences from them.
• Completed an advanced probability theory class designed for Ph.D. students.
Bachelor’s Degree
Economics and Statistics
2008 - 2012
3.5/4 GPA
Description
With advance planning and hard work, I earned 175 credits across 2 majors within 4 years.