** Yield Improvement Project – Data Engineer and Data Scientist
• Processed a large volume of data (6 TB per day) to maintain a data warehouse for machine learning projects.
• Reduced the out-of-control rate by 30% via a statistical model.
• Reduced the scrap rate by 80% with in-house anomaly detection algorithms.
• Reduced the time to find key factors of yield rates by 80% via data visualization and statistics.
** Big Data Solutions – Data Engineer
• Ingested 6 TB of data per day by building an on-premises big data solution with Scala, Spark, and Hive.
• Reduced the implementation time of machine learning algorithms by 95% using R, MPI, Hive, and Spark.
** Weekly Productivity Improvement Program – Leader
• Developed R packages to avoid reinventing the wheel and increase productivity.
• Taught data scientists and data engineers to write clean, performant code.
• Organized study groups to share knowledge of machine learning and statistics with colleagues.