linsam

Data Engineer, Backend Engineer

• 0972724528 • Taiwan • [email protected]

5-6 years of experience in data engineering and software engineering (distributed queue systems, databases, web crawling, RESTful APIs, ETL, Docker, CI/CD, GCP, K8S, Airflow, etc.).

1-2 years of experience in data science (data analysis, machine learning, and deep learning).

Work Experience

17 Live - Senior Data Engineer (IC5), May 2021 - Present

• Refactor ETL: create an Airflow project on Cloud Composer, migrating the ETL tooling from Digdag to Airflow and ETL development from shell scripts to Python.

• Maintain more than 100 BigQuery tables.

• Create pipelines from MySQL and MongoDB to BigQuery.

• Build a good development culture, including the introduction of CI/CD, a dev-stage-uat-master workflow, release notes, unit tests, and test coverage.

• Unify scheduled jobs under Airflow, such as Cloud Function schedulers, BQ schedulers, crontab jobs, and ML models written in R or Python.

• Reduce the Data Team's costs by 25%.

• Create the Data Team's first real-time ETL system with GKE, Pub/Sub, and Memorystore to send push notifications to users.

• Create the Data Team's first API on GKE to serve ML models, including graceful shutdown, stress testing with ApacheBench, and auto-scaling with HPA. 95th-percentile latency is under 200 ms and throughput is over 200 RPS.

• Create a Tagging System for tracking groups of users.

• Create a BigQuery Resource Monitor to track users' BQ slot usage and query counts.

• Build a documentation culture with Confluence.

• Finalist for the Break the Norm awards in 2021-Q3 and 2021-Q4.

• Assist in interviewing more than 10 data engineer candidates.

• Mentor junior data engineers to be more effective individual contributors.

• Apply the Data Team's models to the company's app (automatically sending push notifications and in-app messages).

• Automatically update the recommended streamer list in the company's app using the Data Team's models.

SinoPac Holdings - Software Engineer (Python), Nov. 2019 - May 2021

• Develop the Python API (shioaji) for stock/option/futures order placement and account functions.

• Develop the C# API (shioaji) for stock/option/futures order placement and account functions, and set up CI/CD with GitHub Actions.

• Deploy a test system for simulated trading with Docker Swarm.

• Collect distributed system logs with ELK, Grafana, and Prometheus (13 GB of log data per day).

• Monitor the distributed system and send alerts through a chatbot.

• Develop a transaction-by-trade and odd lot trading API.

Open Up Summit Speaker ( FinMind ) - 2019-12-01

Tripresso - Data Engineer, Oct. 2018 - Nov. 2019

• Analyze travel data and build a machine learning model, with an estimated 3% increase in orders (revenue).

• Maintain and develop an ETL distributed queuing system with 20 machines.

• Optimize the ETL system, reducing execution time by more than 50%.

• Develop a new product crawler that increased product volume by 1.5%.

• Build BI analysis charts for other departments.

Mandatory Military Service, Oct. 2017 - Oct. 2018

NDHU - RA, Mar. 2016 - Aug. 2017

Analyze G7 financial data. Perform model validation and parameter estimation with regression models ( SUR, MLE, Bootstrapping ), and compare single-equation estimators and confidence intervals with their system-equation counterparts.

NDHU - TA, Sep. 2015 - Jul. 2017

Calculus, Linear Algebra, Statistics.

Projects

FinMind Open Data API

Open-source financial data with more than 50 datasets, provided through an API.

More than 2,000 registered users.

2,000 stars on GitHub.

Updated automatically every day with Docker Swarm and a distributed queue system built on RabbitMQ and Celery ( 10 cloud machines ).

More than 1 billion records in total, with 10 million streaming records per day.

Architecture diagram.

Bosch Production Line Performance - Kaggle

Post-competition analysis, top 6% rank.

Highly imbalanced data with a 1000 : 1 class ratio and a 10 GB dataset, 50% of which is missing values. More than 4,000 variables, but the models use only 50 features.

Rossmann Store Sales - Kaggle

Post-competition analysis, top 10% rank.

Time series problem. Build models to predict sales 48 days ahead.

Grupo Bimbo Inventory Demand - Kaggle

Post-competition analysis, top 8% rank.

Time series problem with 80 million records. Build models to predict inventory demand 2 weeks ahead.

Instacart Market Basket Analysis - Kaggle

Live competition, top 25% rank.

Predict which products a consumer will purchase again.

Verification code to text

Create a Python package that converts Taiwan Train verification codes to text.

The model is a CNN built with Keras.

Skills

Distributed Queue System

1. RabbitMQ, Celery, and Flower.

2. 8-node ( Cloud ) distributed queue system for web crawling.

3. Deployed with Docker and GKE.

4. Graceful Shutdown.

Database

1. MySQL ( RDBMS ).

2. Redis ( NoSQL ).

3. DolphinDB ( TSDB ).

GCP

1. Pub/Sub.

2. GKE ( K8S ).

3. GCE.

4. BQ.

5. Composer.

6. Memorystore.

CI/CD

1. Create automated tests and automated deployment for the FinMind team.

2. Use GitLab Runner.

3. CD for automatically publishing the Python package.

4. CD for automatically updating and deploying new service versions.

Log Collect & Monitor

1. Distributed system log collection with ELK.

2. Prometheus and Grafana: monitor user usage, request latency, and request count.

3. Alerting through Telegram bot and Slack bot.

4. Monitor VMs and containers with Netdata and cAdvisor.

Data Pipeline

1. Design data pipelines for crawlers, backend, and analysis with Airflow.

2. Design more than 200 ETL jobs with Airflow.

3. Build Airflow on Cloud Composer.

4. Build a real-time pipeline for sending push notifications to users.

Machine Learning

XGBoost, random forest, SVM. Statistics - OLS, lasso.

Web Crawling

1. Python - requests, BeautifulSoup, lxml, selenium.

2. Automatic captcha recognition with a CNN model.

Data Mining

Python - numpy, pandas, sklearn.

R - parallel, dplyr, data.table, mice.

WEB

1. https://finmindtrade.com/

2. Nginx.

3. Frontend - Vue.

4. Backend - Python.

5. Traefik.

API

1. FastAPI.

2. Websocket.

3. Load Balancing.

4. Async.

5. Graceful Shutdown.

Stress Test

1. ApacheBench.

2. The upper bound of the FinMind API is 8,000 requests per minute.

Education

National Dong Hwa University, Master of Science, Sep. 2017.

Major : Mathematics and Statistics.

Tamkang University, Bachelor of Science, Sep. 2015.

Major : Mathematics.

Languages

R, Python. Basic English; proficient in Chinese.