Specialized in applying machine learning to obtain actionable insights for business strategy and product efficiency. Experienced in constructing machine learning pipelines on distributed systems such as Apache Spark. One year of total work experience in data science related fields. Actively seeking a data science internship.
Education
University of Southern California, Master of Science (MS), Data Science, 2018 ~ 2020
Fu Jen Catholic University, Bachelor of Business Administration (BBA), Statistics, 2012 ~ 2016
Work Experience
Fintech995, Data Science Intern, Jan 2018 ~ Aug 2018
Project: Customer behavior modeling & Automatic hedging system
Built machine learning models, including K-Nearest Neighbors (KNN) and Random Forest, to predict potential customers in support of business strategy, resulting in an additional 30% of CTA.
Deployed an automatic hedging system using PCA in Python for futures trading analysts, covering 42 futures contracts, and reduced portfolio construction computation time by 28%.
Designed a website traffic data pipeline, a dashboard connected to the upstream customer database, and A/B tests to enhance the user experience of the company website, improving the sales conversion rate by an additional 52%.
Vincera Capital, Business Analyst Intern, Jul 2015 ~ Aug 2015
Project: Analysis of medical devices industry
Conducted industry analysis of financial reports, including income statements and statements of financial position, to assess revenue growth in the medical device industry.
Improved accuracy in identifying target customers for the company’s products using data preprocessing and Linear Regression, which generated an additional 21% increase in revenue compared to the previous year.
WISPRO Consulting Firm, Data Analyst Intern, Jul 2014 ~ Aug 2014
Projects: Presented patent data and financial statement data to potential consultees in biotechnology
Used R to conduct an industry analysis of 127 biotechnology companies in Taiwan, aligning business strategy with the search for potential consultees.
Extracted key features for exploring promotional campaigns and reduced research time by 15% compared to the financial statement analysis.
Built a data ETL pipeline for data preprocessing and exploratory data analysis using Spark RDD, Spark SQL, and Spark DataFrame.
Applied Alternating Least Squares (ALS) for collaborative filtering to generate customized movie recommendations.
Tuned hyperparameters with a self-defined function and reduced computation cost by 80% by monitoring visualized learning curves.
Built a data ETL and preprocessing pipeline with Spark DataFrame and created labels from YouTube comments using regular expressions.
Performed exploratory data analysis in Spark RDD to map pet video creators and pet owners for targeted advertising, and implemented tokenization, stop-word removal, and word vectorization with Word2Vec.
Trained and evaluated Logistic Regression and Gradient-Boosted Trees using k-fold cross-validation and AUC score.
Used a Logistic Regression model with 88% accuracy and 0.89 recall to predict pet owners from their comments.
Built data processing with Spark DataFrame and data visualization in Tableau, and used Spark SQL for OLAP on crime incidents.
Reduced potential crime rate by 11% through time series visualization and identified the monthly crime frequency for the top 3 most frequent crime categories.
Implemented K-means clustering and selected the number of clusters with the elbow method on the sum of squared errors to identify high-risk crime areas in San Francisco.
Mobile App Download Prediction and Fraudulent Click Traffic Detection, Nov 2018
Built data preprocessing and Tableau data visualizations to detect click fraud using IP address, timestamp, and click frequency.
Trained Logistic Regression, XGBoost, and a Recurrent Neural Network (RNN) in Keras to predict app downloads from advertisements, reaching 92% accuracy and 0.7 recall with the XGBoost model.
Optimized hyperparameters and fine-tuned model performance via grid search and ROC curves, and analyzed feature importance to identify key factors.