Yen-Ting Liu

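I have five years of experience in Python data analysis and am familiar with deploying APIs and systems on GCP using Docker with Nginx and Redis. I am experienced with Airflow development and automated reporting and analysis workflows, and have hands-on experience managing Hadoop and Elasticsearch clusters and building ETL jobs with PySpark. I enjoy learning new technologies and strive to make data processing workflows more efficient.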

  Santa Clara, CA, USA          [email protected]

Work Experience

Data Engineer  •  Tesla

• Implemented Airflow ETL pipelines using Docker and integrated various databases including SQL Server, MSSQL, Vertica, and
MongoDB, enhancing pipeline reliability and accessibility and improving data processing efficiency by 10x
• Designed a Python data alert system that monitors hundreds of ETL processes, refreshes the latest data status every minute,
and delivers status alerts to communication services or email, resolving stale-data and ETL-failure issues
• Developed Python tools for encrypting and decrypting sensitive data, and real-time operational reporting systems using Kafka
and MongoDB, enabling stakeholders to access up-to-date information for performance monitoring and reporting

January 2023 - May 2023

Data Engineer  •  Vpon

Designed and developed ETL pipelines
• Implemented and maintained ETL pipelines integrated with Jenkins on cloud services (GCP and AWS), providing analysts
with reliable data and saving 50% of effort (Groovy, Dataproc, Spark, BigQuery, EMR, Hive, Jenkins)
• Designed an ETL pipeline framework, implemented it on a cloud service, and held user training sessions for engineers and analysts. By leveraging Airflow and Kubernetes, the project reduced maintenance effort by 40% and increased deployment efficiency by 60%
(GitLab, Airflow, Python, K8s)
Developed and deployed API to retrieve data in the cloud environment
• Developed an API to retrieve data, including geolocation data, from BigQuery and deployed it on GCP.
The API cut data-fetching time by 80% (Cloud Run, IAM, BigQuery)

October 2019 - July 2021

Data Engineer  •  富盈數據

Maintained distributed system and database
• Constructed and managed the Hadoop ecosystem with Ambari. Built an ETL pipeline to query multi-source databases,
processing more than three terabytes (TB) of data and covering 90% of analysis needs (Hive, HBase, Python, ELK, MySQL)
• Established a data collection and analysis workflow, saving data scientists 30% of the time needed to analyze the
collected data and build machine learning models (Elasticsearch, PySpark, Airflow)
Constructed backend system and API
• Researched webpage user preferences and behavior, and modified the advertising performance evaluation system to enable
precision marketing, increasing advertising targeting accuracy by 300%
• Built an article classification API with embedded machine learning models (linear regression, random forest, XGBoost) to
categorize articles. The tool was released as a product and processed 90% of the articles every day (Flask)
• Upgraded an advertising API and deployed it on GCP. Total monthly revenue increased by 33% after
the new API was implemented (Python, Docker, Nginx, Celery, Redis, load balancing, MySQL, HBase)

February 2019 - September 2019

Research Assistant  •  Academia Sinica 中央研究院

Constructed data pipeline and performed data analysis
• Developed a bioinformatics pipeline that saved non-technical scientists 80% of the effort needed to analyze and visualize
genome sequencing data. Published the research paper and the software in the Frontiers journal as first author (Python, R, Linux)

October 2017 - April 2018

Education

2021 - 2023

University of Texas at Dallas

Information Technology and Management

2014 - 2016

National Taiwan University

Biomaterials

Certifications


AWS Certified Cloud Practitioner

AWS Training and Certification

Expires November 2025