

Data Engineer, Data Scientist • 0972724528 • Taiwan • [email protected]

Experienced in data mining, machine learning, and web crawling. Hoping to focus more on data science and data engineering in my future career.

Work Experience

Tripresso - Data Engineer, Oct. 2018 - Present

• Analyzed travel data and built a machine learning model, with an estimated 4% increase in orders (revenue).

• Maintain and develop an ETL distributed queuing system with 20 machines. 

• Optimized the ETL system, reducing execution time by more than 50%.

• Developed a new product crawler that increased product volume by 1.5%.

• Produce analysis charts for other departments.

Mandatory Military Service, Oct. 2017 - Oct. 2018

NDHU - RA, Mar. 2016 - Aug. 2017

Analyzed G7 financial data. Performed model validation and parameter estimation with regression methods (SUR, MLE, bootstrapping), and compared single-equation estimators and confidence intervals with those from system-equation estimation.

NDHU - TA, Sep. 2015 - Jul. 2017

Calculus, Linear Algebra, Statistics.


Open Source of PTT data

Automatically crawls PTT data daily and provides it as open data in MySQL (more than six million articles).

Bosch Production Line Performance - Kaggle 

Post-competition analysis, top 6% rank.

Highly imbalanced data (1000:1 class ratio), 10 GB dataset.

50% of the values are missing.

More than 4,000 variables, but I built models using only 50 features.

Rossmann Store Sales - Kaggle 

Post-competition analysis, top 10% rank.

Time series problem. Built models to predict sales 48 days ahead.

Grupo Bimbo Inventory Demand - Kaggle

Post-competition analysis, top 8% rank. 

Time series problem with eighty million records. Built models to predict inventory demand 2 weeks ahead.

Instacart Market Basket Analysis - Kaggle

Real competition, top 25% rank. 

Predicted which products a consumer will purchase again.

FinMind Python package

More than 40 datasets.
Updated automatically every day by a distributed queue system built on RabbitMQ and Celery (8 cloud machines).


Data Mining

Python - numpy, pandas, sklearn, multiprocessing, joblib. 

R - parallel, dplyr, data.table, mice.

Machine Learning

Python - xgboost-gpu. 

R - xgboost, svm, random forest, knn.

Deep Learning

Python - keras-CNN.

Statistical Model


Web Crawling 

1. Python - requests, BeautifulSoup, lxml, selenium.
2. Automatic email alerts to monitor crawler status.

RabbitMQ & Celery

1. Built a distributed queue system for web crawling, with workers running on 8 Linode cloud machines.
2. Worker installation documentation
3. Supervisor
4. Flower

Python Packages Created

1. FinMind 249 stars
2. PTTData 146 stars

Web and Visualization
(Django, Flask, nginx, uWSGI)


1. Load FinMind data through an API.
2. Develop APIs with Python and Flask.



National Dong Hwa University, Master of Science, Sep. 2017.

Major : Mathematics and Statistics.

Tamkang University, Bachelor of Science, Sep. 2015.

Major : Mathematics


R, Python. Basic English; proficient in Chinese.

Powered by CakeResume