陳慶全
Senior Data Engineer
• Data engineer and data scientist with over six years of experience.
• Proven success in processing large data volumes (6TB per day) with Spark in Scala and with MPI in R and Python.
• Proven success in developing a machine learning model with Spark in Scala on 30 billion records for IoT device recognition.
Microsoft
National Cheng Kung University
New Taipei City, Taiwan

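Professional Competency Assessment

Professional Background

  • Current status
    Employed
  • Profession
    Data Engineer
    Data Scientist
    Big Data Developer
  • Industry
    Information Services
    Big Data
    Artificial Intelligence / Machine Learning
  • Work experience
    4 to 6 years (4 to 6 years of relevant work experience)
  • Management experience
    I have experience managing 1 to 5 people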
  • Skills
    R
    Python
    C++
    MATLAB
    Shell Script
    Machine Learning
    Deep Learning
    Data Analysis
    Data Mining
    Data Science
    Data Cleaning
    Apache Hive
    Apache Spark
    Hadoop ecosystem
    Oracle
    MySQL
    SQL
    PowerPoint
    Statistics
    AWS
    Docker
    Bash
    Scala
    Azure
  • Languages
    Chinese
    Native or bilingual
    English
    Advanced
    Japanese
    Advanced
  • Highest education
    Master's degree

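Job Preferences

  • Desired work mode
    Full-time
    Interested in remote work
  • Desired positions
    Data Scientist, Data Engineer, Data Analyst
  • Desired work locations
    Taipei, Taiwan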
    Japan
    USA
    Canada
    UK
    Netherlands
    Germany
    Switzerland
  • Freelance services
    Not available for freelance work

Work Experience


Senior Data Engineer

Microsoft
Full-time
January 2021 - Present
New Taipei City, Taiwan
** Reliability Data System – Data Engineer
• Process 1B records per day from data centers to provide data views for reliability engineers.
• Lead 2 interns in building data pipelines that visualize data for reliability engineers.
** Quality Management System – Data Engineer
• Increased the correctness rate of server component data by 120% by leading data collection projects that aligned it with the data in internal databases.
• Reduced the runtime of data pipelines by 80% by replacing Hive with Spark. At the same time, reduced cost by 60% by moving from a Hadoop cluster to a serverless Spark cluster.
• Lead 9 Indian contractors in completing a service migration to meet Microsoft compliance requirements.

Senior Data Scientist

January 2019 - January 2021
2 years 1 month
Taipei City, Taiwan
** Home Network Security – Data Engineer
• Reduced by 90% the time needed to produce reports from 1B security events per day. This helps marketing and sales staff in Japan, Singapore and Australia find opportunities to improve the business.
• Visualized the relationships between security events for threat experts with word2vec and t-SNE.
** Network Behavior Analysis Project – Data Scientist
• Developed a machine learning model to recognize IoT devices based on 30 billion netflow records, using Spark in Scala and Python.
• Reached a 90% accuracy rate in identifying periodic network behaviors of IoT devices with a statistical model.

Senior Data Engineer and Data Scientist

TSMC
Full-time
July 2016 - January 2019
2 years 7 months
Taichung City, Taiwan
** Yield Improvement Project – Data Engineer and Data Scientist
• Processed large data volumes (6TB per day) to maintain a data warehouse for machine learning projects.
• Reduced the out-of-control rate by 30% with a statistical model.
• Reduced the scrapping rate by 80% with in-house anomaly detection algorithms.
• Reduced the time to find key factors of yield rates by 80% through data visualization and statistics.
** Big Data Solutions – Data Engineer
• Ingested 6TB of data per day by building an on-premise big data solution with Scala, Spark and Hive.
• Reduced the implementation time of machine learning algorithms by 95% using R, MPI, Hive and Spark.
** Weekly Productivity Improvement Program – Leader
• Developed R packages to avoid reinventing the wheel and increase productivity.
• Taught data scientists and data engineers to write clean and performant code.
• Organized study groups to share knowledge of machine learning and statistics with colleagues.

Full-time Research Assistant

September 2015 - June 2016
10 months
Taipei City, Taiwan
** Main role
• Decreased data processing time by 80% using R and MongoDB to process millions of records per day.
• Achieved a 40% lower RMSE than other methods when imputing missing values with an in-house machine learning approach.

Education

Master’s Degree
Statistics
2012 - 2014
4/4 GPA
Description
== Achievements ==
• Completed a master’s thesis entitled “A Classification Approach Based on Density Ratio Estimation with Subspace Projection.” Advisor: Ray-Bing Chen.
• Earned a grade of 95% in my statistical methods, generalized linear models, and statistical data mining classes, and 92% in my linear models class. I am thus confident in building models and drawing inferences from them.
• Completed an advanced probability theory class designed for Ph.D. students.
Bachelor’s Degree
Economics and Statistics
2008 - 2012
3.5/4 GPA
Description
With careful advance planning and hard work, I earned 175 credits across 2 majors within 4 years.