Profile 02 00@2x 71843ef6a0df47d6255a9c0436c409dcd5cd81f6514c51a6b2a93339d82bbff6

linsam

data engineer、data scientist、backend

 • 0972724528 •  台灣  •  [email protected]

3~4 years experience with data engineer. (distributed queue system, database, web crawling, CI/CD, cloud) 

2 years experience with data science. (data analysis, machine learning and deep learning)

Work Experience


SinoPac Holdings -  Software Engineer(Python),Nov. 2019 年 11 月 - now

• Develop python Api (shioaji) for stock/option/future place orde and account. 

• Develop C# Api (shioaji) for stock/option/future place orde and account, and setup CI/CD with GitHub actions.

• Deploy test system for simulate trading by docker swarm.

• Collecting distributed system Log by elk, grafana and prometheus. 1GB log data/daily.

• Monitor distributed system and alert chatbot.

• Develop a transaction-by-trade and odd lot trading API.

Openup Speaker ( FinMind ) - 2019-12-01

Tripresso - Data Engineer,Oct. 2018 年 10 月 - Nov. 2019 年 10月

• Analysis travel data and build a machine learning model. Estimating increase 3% orders (revenue). 

• Maintain and develop an ETL distributed queuing system with 20 machines

• Optimize the ETL system reduced more than 50% execution time. 

• Develop new product crawler let product volume increase 1.5%. 

• Making analysis BI charts provide for other departments.

Mandatory Military Service,Oct. 2017 - Oct. 2018

NDHU - RA,Mar. 2016 - Aug. 2017

Analysing G7 financial data. Model validation and parameter estimation by regression models ( SUR, MLE, Bootstrapping ). And comparing single equation estimators and confidence interval with system equation.

NDHU - TA,Sep. 2015 - Jul. 2017

Calculus, Linear Algebra, Statistics.

Projects


FinMind Open data Api


Open source financial data, more than 50 dataset, provide Api. 

Automatic update daily by docker swarm, distributed queue system rabbitmq and celery ( 8 cloud machines ). 

1300 stars on github.

Total more than 1 billion data, 10 million streaming data per day.

Architecture diagram.



Bosch Production Line Performance - Kaggle Post-competition analysis, top 6% rank.

Highly imbalance data, ratio is 1000 : 1, 10 GB dataset size. And the data is 50% missing value. More than 4000 variables, but I build models by only 50 features.


Rossmann Store Sales - Kaggle 

Post-competition analysis, top 10% rank.

Time series problem. Building models predict sales after 48 days.


Grupo Bimbo Inventory Demand - Kaggle

Post-competition analysis, top 8% rank. 

Time series problem, eighty millions data size. Building models predict inventory demand after 2 weeks.


Instacart Market Basket Analysis - Kaggle

Real competition, top 25% rank. 

Predicting which products will an consumer purchase again.



 Verification code to text

Create python package of Taiwan Train Verification Code to text.

The model is made by keras-CNN.

Skills


Distributed Queue System

1. Rabbitmq & Celery & Flower. 

2. 8 nodes ( Cloud ) distributed queue system for web crawling. 

3. deploy by docker.


Database

1. MySQL ( RDBMS ). 

2. Redis ( NoSQL ). 

3. Dolphindb ( TSDB ).


Web Crawling

1. Python - request, BeautifulSoup, lxml, selenium. 

2. Auto recognition captcha code by CNN model.



CI/CD

1. Create automated tests and automated deploy for the FinMind team. 

2. Using gitlab runner. 

3. CD for auto publish python package. 

4. CD for auto update and deploy new version service.


Log Collect & Monitor

1. Distributed system log collect by elk.  

2. Prometheus and Grafana. Monitor user usage, request latency, request count 

3. Monitor by telegram bot. 



data pipeline

1. Design data pipeline for crawler, backend and analysis by airflow.

Machine Learning

xgboost, random forest, svm. statistics - ols, lasso.


Deep Learning

CNN, LSTM.



Data Mining

Python - numpy, pandas, sklearn. 

R - parallel, dplyr, data.table, mice.


WEB

1. https://finmindtrade.com/ 

2. nginx

3. frontend - vue 

4. backend - python 


Stress Test

1. apache benchmark.
2. Upper bound of FinMind api is 8000/minute request.


Education

National Dong Hwa University, Master of Science,  Sep. 2017.

Major : Mathematics and Statistics.

Tamkang University. Bachelor of Science, Sep. 2015.

Major : Mathematics

Languages


R, Python. Basic in English and proficient in Chinese.

Powered by CakeResumePowered by CakeResume