Stephen Yang

Data geek with two years of experience in data science and data engineering.

Work Experience

Retail Business Services LLC (Ahold Delhaize), Data Engineering Co-op                      Jun 2023 ~ Present

  • Improving search engine performance through MDM.
    Validating the pipelines to ensure data completeness in downstream tasks.
  • Monitoring changes in the items recommended by the cross-sell model. Building a data drift detection framework, from data retrieval and modeling to statistical hypothesis testing on inference data.
  • Developing an intake-process form app to integrate ticketing requests from different teams.
Skillset: Databricks/Azure Data Factory/Power Automate/Power BI/ETL pipeline/Spark

Taipei Fubon Bank, Data Scientist (FinTech)                                                            Jan 2021 ~ Sep 2022

AI Early Warning Mechanism
  • Collaborated with the Taipei City Police Department and the Criminal Investigation Bureau to combat the surge in fraud during the pandemic across multiple channels (ATM, online, and mobile banking).
  • Developed an ML model to detect potential money laundering activity and strengthen fraud prevention.
  • Replaced rule-based triggering with an ML-based application in criminal investigation; eliminated tens of thousands of false alarms in a single month, raised precision to as much as 80%, and significantly cut operational costs.
  • Promoted an ‘intelligent anti-fraud ecosystem’ with the Taipei City Government; the work was reported in Fubon Financial news: link
ETL Pipeline / Banking Business Analysis
  • Built an analytics platform managing client and financial data collected from upstream systems and third-party sources (EC, Telecom). Built end-to-end pipelines for reporting.
Skillset: PySpark/Hadoop/Scikit-Learn/Flask API/Beautiful Soup/ODS/Hive/OBIEE/Tableau/GCP (GCS, Dataflow, VMs, Vertex AI)

Apple Daily, Data Analyst                                                                                          Apr 2020 ~ Aug 2020

  • Digital tracking with GTM as the source for building ETL pipelines on GCP.
  • BI reporting for assessing KPIs of news articles. 
  • Subscription analysis, churn prediction, and text mining.
Skillset: BigQuery/Data Studio/GA360/Google Analytics/GTM/NLTK/Scikit-Learn/LDA

Education

Boston University, Master of Science in Software Development (MSSD)                        2023 ~ Present

Master's program developing the essential skill set of a software developer: algorithms, programming languages, and systems.

Arizona State University, AI and Machine Learning MasterTrack™                       Feb 2021 ~ Dec 2021

Online graduate certificate courses: Statistics, Machine Learning, Artificial Intelligence

Institute for Information Industry, Big Data Bootcamp                                          Sep 2019 ~ Feb 2020

Training in big data and development skills, including building end-to-end data projects with Python, MySQL, and Hadoop.

Tamkang University, Bachelor of Business Administration (BBA)                         Sep 2013 ~ Jun 2017

Projects & Competitions

  T-Brain Machine Learning Competition 2022 - Anti Money Laundering

Leveraged XGBoost (GBDT), combined with my ML fraud detection experience, to succeed in this competition.

Hosted by: E.SUN Bank & Trend Micro (Rank: No. 6)

  GoDaddy - Microbusiness Density Forecasting Competition

The project demonstrates improved time series prediction performance through hyperparameter tuning with the Optuna framework.

  Birds 475 Species - Image Classification

Serving an image classification model as a Streamlit web app on GKE. The user enters an image URL and the app displays the model output.

Skillset: Streamlit, GKE, CloudBuild, Python, Git, Docker

  Spotify ETL Pipeline

Building an ETL pipeline for the Spotify API and automating the workflows through Airflow. Analytics tasks are executed with DBT on Snowflake.

Skillset: Airflow, DBT, Snowflake, Python, Git, Astronomer, Docker

Skills

Languages & Frameworks: Python, PySpark, TensorFlow, PyTorch, SQL

Dashboard: Power BI, Tableau

Cloud Service: GCP, Azure, Cloudera

Development Tool: Docker, Jenkins, Git, Airflow, DBT

Certificates

Project Management Professional (PMP)

AI and Machine Learning MasterTrack® Certificate (ASU)

Google Cloud Certified - Professional Data Engineer