許立農 | Hsu, Li-Nung

Data Scientist, Data Engineer
Taipei

Education

National Chengchi University, MS, Statistics, 2015 – 2017

GPA: 3.84 / 4.0
Master's Thesis: Entropy-Based Feature Selection (Advisor: Professor Pei-Ting Chou)

Objective: Build a feature similarity matrix based on mutual entropy and use it for hierarchical clustering, then select features from the resulting clusters as the final feature set.
Compared the method with other feature selection methods such as random forest (RF), Lasso, and F-score.

National Cheng Kung University, BS, Mathematics, 2011 – 2015

Skills

Programming

Python
Scala
R
MSSQL

Data-related Tools

TensorFlow (Keras)
PyTorch
Spark
Docker
Scikit-Learn
Pandas

Cloud Platform

Language

English: TOEFL 98 / 120

Work Experience

CTBC Bank, Model Development Department, Data Scientist

2021.12 – present

About the department:

Develops recommendation and risk models for the bank, covering projects such as coupon recommendation, account-opening marketing lists, and fraud detection.

Job responsibilities:

Responsible for the full project lifecycle: feature design, model design and training, end-to-end pipeline development, performance tracking, and methods research.

Fraud Alert Project

Objective:

Predict potentially fraudulent accounts from transaction data so that their transactions can be restricted before harm occurs.

Responsibilities/Achievements:

Developed and deployed credit card and financial features.
Managed the data pipeline end to end: from receiving input variables to generating model predictions, identifying risk factors, and updating alert lists.
Implemented an autoencoder with contrastive learning, improving model performance by 1.81%.

Coupon Recommendation

Objective:

Personalize coupon recommendations for mobile banking users to increase click-through and redemption rates.

Responsibilities/Achievements:

Used multi-task learning to jointly predict clicks and coupon redemptions, increasing the click-through rate by 14% and the redemption rate by 74%.
Built performance tracking reports to monitor the online model and provide insights to business units.

Financial Product Recommendations

Objective:

Tailor financial product recommendations for mobile banking users to raise click-through rates without lowering conversion rates.

Responsibilities/Achievements:

Applied multi-task learning to jointly model click-through and conversion behavior and fine-tuned the model architecture; in online testing, the model outperformed competing models by 90%.

Marketing List for Digital Savings Accounts

Objective:

Optimize the conversion rate of marketing lists for digital savings accounts.

Responsibilities/Achievements:

Raised the conversion rate from 0.23% to 1.16%.


CLICKFORCE, Data Engineer Supervisor, 2020.1 – 2021.11

About the company:

One of Taiwan's leading digital advertising companies, CLICKFORCE partners with over 900 web media outlets and over 400 mobile media outlets to build a large advertising network. Data-driven solutions are at the core of the company, which is dedicated to helping advertisers achieve their business goals.
In 2020, CLICKFORCE won two awards at the Agency & Advertiser of the Year.
Secured exclusive advertising agency status in Taiwan for the Tokyo 2020 Olympics.

Job responsibilities:

Optimized ad performance from every angle, including the ad system and target-audience tags.
Researched new ML models (recommender and NLP models) and architectures suited to our system.
Developed data products and projects.
Analyzed data to improve the system and to assess whether business-side requests were feasible.

Real-time Ad Recommender System

Objective:

Build a real-time ad recommender system to upgrade the ad server and improve ad performance.

Responsibilities:

Determined which recommender system components suited our ad system.
Built a model combining a tower architecture with feature crosses, drawing on well-known recommender system models.
Handled the system engineering, including data preprocessing, embedding generation, in-memory caching, cold-start handling, and the model API.

Interest Tags

Objective:

Build interest tags that help ad optimizers choose their target audiences.

Responsibilities:

Created user features from the articles users read, the websites they visited, and the ads they interacted with.
Processed 20 million rows of data and 120 million inference samples.
Built an ML model to predict each user's response to specific ads.
Used Spark on AWS EMR to speed up tag generation.

Achievements:

Raised CTR to 200–300% of the original tags' CTR, depending on the tag, while delivering more impressions at the improved performance level.
As a result, the company stopped buying interest tags from other companies and turned that former cost into revenue by providing the tag data to others.

First Party Cookie Mapping

Objective:

Address Google's phase-out of third-party cookies by developing a method to map multiple first-party cookies to a single user.

Responsibilities:

Framed the problem as an ML task: designed the data labels and determined which features could be collected or engineered and whether they were useful for the goal.
Applied XGBoost to the task.
Ran a small-scale test to validate the approach.

Achievements:

Achieved 70% precision.
Adopted as one of the company's solutions for the phase-out of third-party cookies.

Invoice Data Application

Objective:

Develop applications built on invoice data.

Responsibility:

Fine-tuned BERT to predict the category of each product.
Produced invoice data reports for brands and business units showing sales volume across channels, products frequently bought together, and comparisons between the target brand and other brands.

Achievements:

Launched an invoice data report product.
Produced invoice-based tags for the ad system.

Other Experience

E.Sun AI 2020 Summer Competition, 2020.7 – 2020.8

Objective:

Extract the names of money laundering suspects from news articles.

Responsibilities:

Crawled articles from various media outlets and parsed them with Selenium, Requests, and Beautiful Soup.
Built a two-step model: first, classify whether an article is related to money laundering; second, extract the suspects' names.
Served the model with TensorFlow Serving.
Built a REST API that preprocesses request data and returns the prediction.

Achievement:

23rd place among 409 teams.

YouTube Data-Driven Marketing System, Institute for Information Industry, 2019.8 – 2019.11

Objectives:

Automatically classify videos by type using their titles and descriptions.
Identify whether a video is sponsored from its title and description.
Provide data-driven suggestions for YouTubers and for companies looking to sponsor videos.

Responsibilities:

Used Google APIs and Python functions to collect structured raw data.
Trained word vectors with Gensim on Wikipedia open data.
Used sentence-level frequency as the criterion for removing uninformative words.
Tuned LSTM, Conv1D, and BERT models for the NLP tasks.
Used exploratory data analysis (EDA) to compare the data across video classes and sponsorship status.

Achievement:

71% accuracy in classifying video type.
89% accuracy in detecting sponsored content.

E.Sun Real Estate Price Prediction Competition, 2019.7 – 2019.8

Objective:

Build a model on the real estate training data that predicts prices to within 10% of the actual value.

Responsibilities:

Trained XGBoost, LightGBM, and other ML models.
Stacked the models: added each model's outputs to the original dataset as new features to improve the final model.

Achievement:

150th place out of 1200 teams.

KKTV Data Game, 2017.5 – 2017.6

Objective:

Predict which video a user will watch in the next time interval.

Responsibilities:

Extracted features from the raw data, such as the most recently watched video, the video with the longest viewing time, and the video with the most views.
Built a video-to-video similarity matrix from user viewing data to serve as additional features.

Achievement:

10th place out of 50 teams.

MRT Open Data Competition, 2017.4 – 2017.5

Objective:

Study how MRT passenger volume varies with surrounding geographic data.

Responsibilities:

Applied the bisection method to draw boundaries between MRT stations.
Combined other geographic data within these boundaries.
Used Lasso feature selection to assess the importance of each feature.
Added noise to the features to verify that the selected features were not chosen by chance.

Achievement:

Certificate of Honorable Mention.