
linsam

Data Engineer, Data Scientist • 0972724528 • Taiwan • [email protected]

Experienced in data mining, machine learning, and web crawling. Hopes to focus on data science and data engineering in future roles.

Skills


Data Mining

Python - numpy, pandas, sklearn, multiprocessing, joblib. 

R - parallel, dplyr, data.table, mice.


Machine Learning

Python - xgboost-gpu. 

R - xgboost, svm, random forest, knn.


Deep Learning

Python - Keras (CNN).


Statistical Model

R - GLM, GLMNET, NLS, SUR, MLE.


Web Crawling 

1. Python - requests, BeautifulSoup, lxml, selenium.

2. Automatic email alerts for monitoring crawler status.
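A minimal sketch of what such a crawler status alert could look like, using only Python's standard library (`email.message.EmailMessage` and `smtplib`). The function names, subject format, and SMTP settings are hypothetical placeholders, not the actual crawler's code.

```python
from email.message import EmailMessage
import smtplib


def build_status_email(crawler_name, ok_count, fail_count, sender, recipient):
    """Build a plain-text status report for one crawler run (hypothetical format)."""
    msg = EmailMessage()
    # Flag failures directly in the subject so they stand out in an inbox.
    status = "OK" if fail_count == 0 else f"{fail_count} failed"
    msg["Subject"] = f"[crawler] {crawler_name}: {status}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Crawler: {crawler_name}\nSucceeded: {ok_count}\nFailed: {fail_count}\n"
    )
    return msg


def send_status_email(msg, host, port, user, password):
    """Send the report over SMTP with STARTTLS; host and credentials are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Building the message separately from sending it keeps the report format testable without a mail server.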

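RabbitMQ & Celery

1. Built a distributed task-queue system for web crawling, with workers running on Linode (cloud).
2. Wrote technical documentation for installing workers.
3. Auto-started workers with Supervisor.
4. Assigned Celery tasks to different groups, routing work according to each crawler's performance.
5. Monitoring with Flower.
6. Used parallel-ssh to run commands on multiple workers at once.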
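Routing Celery tasks to separate groups by crawler performance can be expressed with Celery's `task_routes` setting. This is a hypothetical sketch: the module path `crawler.tasks`, the task names, and the queue names `fast` and `slow` are assumptions, not taken from the original project.

```python
# Hypothetical task-to-queue mapping for Celery's `task_routes` setting.
# Task and queue names are illustrative assumptions.
task_routes = {
    # Sites that respond quickly go to a queue served by high-concurrency workers.
    "crawler.tasks.crawl_fast_site": {"queue": "fast"},
    # Slow or rate-limited sites go to a separate queue with fewer workers.
    "crawler.tasks.crawl_slow_site": {"queue": "slow"},
}

# In the Celery app module this would be applied as:
#   app.conf.task_routes = task_routes
# and each worker would consume a single queue, e.g.:
#   celery -A crawler worker -Q fast --concurrency=8
#   celery -A crawler worker -Q slow --concurrency=2
```

Keeping the routing in configuration means a slow site can be moved to another queue without changing the task code.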


Tor & Requests

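1. Some websites block IP addresses. Crawling with high Celery concurrency (multiple threads) got the crawler blocked, so Tor is used to switch IP addresses.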

Create Package 
1. Financial Mining
2. Writing packages with automated testing on travis-ci.org
3. PTT (future)

Data Visualization (future)

1. Dash 


Package Documentation Website

FinMindDoc



Data Mining (future)

Others

Deploying MySQL on Ubuntu. Mapping an IP address to a domain name with No-IP and installing SSL certificates from Let's Encrypt. Managing the development workflow and progress with Trello.


Projects


Open-Source PTT Data

100 stars on GitHub.

Crawls PTT data automatically every day and provides it as open data: more than six million articles stored in MySQL.



Bosch Production Line Performance - Kaggle 

Post-competition analysis, top 6% rank.

Highly imbalanced data (class ratio 1000:1), with a 10 GB dataset.

About 50% of the values are missing.

More than 4,000 variables, but my models use only 50 features.


Rossmann Store Sales - Kaggle 

Post-competition analysis, top 10% rank.

Time-series problem. Built models to predict sales over a 48-day horizon.


Grupo Bimbo Inventory Demand - Kaggle

Post-competition analysis, top 8% rank. 

Time-series problem with about 80 million records. Built models to predict inventory demand two weeks ahead.


Instacart Market Basket Analysis - Kaggle

Live competition entry, top 25% rank.

Predicted which products a consumer will purchase again.



FB-ChatBot (in development)

Automatically orders Taiwan train tickets, using CNN models to recognize Taiwan train verification codes.




Financial Mining (Python package)

Taiwan stock prices, financial statements, stock dividends, and institutional investors' buy and sell data.

G8 data, including oil prices, exchange rates, central bank interest rates, gold prices, and government bonds.

Updated automatically every day.



Work Experience


NDHU - RA

Mar. 2016 - Aug. 2017 

Analyzed G7 financial data. Performed model validation and parameter estimation with regression models (SUR, MLE, bootstrapping).

Compared single-equation estimators and their confidence intervals with those from the system of equations.


NDHU - TA

Sep. 2015 - Jul. 2017

Calculus, Linear Algebra, Statistics

Languages


R, Python. Basic English; proficient in Chinese.

Powered by CakeResume