Jun 2020 - Present
• Apply data science, machine learning, and business intelligence in day-to-day work
• Build requirement-driven data pipelines with Apache Airflow
• Data ETL
- Process over 10 million records (GB scale) of server logs and market data daily
• AWS Services
- S3, Athena, SageMaker, Neptune, Kinesis, Personalize, Rekognition, and others
• Data Visualization/Analysis/Reporting
- Apache Superset, Tableau, Google Analytics
- Ad-hoc requests handled with Jinja templates
• Data Warehouse, OLAP - Apache Kylin
• Machine Learning
- MLOps - MLflow
- State-of-the-art (SOTA) paper survey, implementation, and deployment
- Domains: recommendation systems, graph neural networks, computer vision
• Build 3 data products from scratch
(1) Content retrieval system
- Deep learning model that extracts content information from templates and automatically tags them
- Improved template exploration and user experience by over 30%, verified with a statistical test
(2) Personalized recommendation system
- Launched on Promeo, an app by CyberLink
- MLOps-based recommendation system, built from scratch, that automatically and reliably trains, selects, and deploys models, then serves inference
- Validated in production through an A/B test showing a conversion rate improvement of over 35%
(3) Graph neural network (GNN) based customer data system
- Construct GNN-based personas and conduct customer segmentation
- Model users and their interactions and relationships as a graph
- Explore customers' implicit preferences using GNNs