YEN-CHEN CHOU

Education

University of Southern California, Master of Science (MS), Data Science, 2018 ~ 2020

Fu Jen Catholic University, Bachelor of Business Administration (BBA), Statistics, 2012 ~ 2016

Work Experience

Fintech995, Data Science Intern, Jan 2018 ~ Aug 2018

Project: Customer behavior modeling & Automatic hedging system 

  • Built machine learning models, including K-Nearest Neighbors (KNN) and Random Forest, to predict potential customers and support business strategy, resulting in an additional 30% of CTA.
  • Deployed an automatic hedging system with PCA in Python for futures trading analysts covering 42 futures contracts, reducing portfolio construction computation time by 28%.
  • Designed a website traffic data pipeline, a dashboard connected to the upstream customer database, and A/B tests to enhance the user experience of the company website, improving the sales conversion rate by an additional 52%.

Vincera Capital, Business Analyst Intern, Jul 2015 ~ Aug 2015

Project: Analysis of medical devices industry

  • Conducted industry analysis of financial reports, including income statements and statements of financial position, to assess revenue growth in the medical device industry.
  • Improved accuracy in identifying target customers for the company’s products using data preprocessing and Linear Regression, generating an additional 21% increase in revenue compared to the previous year.

WISPRO Consulting Firm, Data Analyst Intern, Jul 2014 ~ Aug 2014

Project: Presented patent data and financial statement data to potential consultees in biotechnology

  • Used R to conduct an industry analysis of 127 biotechnology companies in Taiwan to align business strategies for finding potential consultees.
  • Extracted key features to explore promotional campaigns, reducing research time by 15% compared to financial statement analysis.

Projects

Apache Spark Movie Recommendation System, Jan 2019

  • Built a data ETL pipeline for data preprocessing and exploratory data analysis using Spark RDD, Spark SQL, and Spark DataFrame.
  • Applied Alternating Least Squares (ALS) collaborative filtering to generate customized movie recommendations.
  • Tuned hyperparameters with a self-defined function and reduced computation cost by 80% by monitoring visualized learning curves.

Apache Spark Pet Owners Prediction on YouTube Comments, Dec 2018

  • Built a data ETL and preprocessing pipeline with Spark DataFrame and created labels from YouTube comments using regular expressions.
  • Performed exploratory data analysis in Spark RDD to map pet video creators and pet owners for targeted advertising, and implemented tokenization, stop-word removal, and word vectorization with Word2Vec.
  • Trained and evaluated Logistic Regression and Gradient-Boosted Trees using k-fold cross-validation and AUC score.
  • Used a Logistic Regression model, with 88% accuracy and 0.89 recall, to predict pet owners from their comments.

San Francisco Crime Clustering and Spatial Analysis in Apache Spark, Dec 2018

  • Set up data processing with Spark DataFrame and data visualization in Tableau, and used Spark SQL for OLAP on crime incidents.
  • Reduced the potential crime rate by 11% through time series visualization and identified the frequency of the top 3 most frequent crime categories each month.
  • Implemented K-means clustering and chose the number of clusters with the elbow method on the sum of squared errors to identify high-risk crime areas in San Francisco.

Mobile App Download Prediction and Fraudulent Click Traffic Detection, Nov 2018

  • Built data preprocessing and Tableau data visualizations to detect click fraud from IP address, timestamp, and click frequency.
  • Trained Logistic Regression, XGBoost, and a Recurrent Neural Network (RNN) in Keras to predict app downloads from advertisements, reaching 92% accuracy and 0.7 recall with the XGBoost model.
  • Optimized hyperparameters and fine-tuned model performance using grid search and ROC curves, and analyzed feature importance to identify key factors.

Natural Language Processing and Topic Modeling on NBA Players’ Twitter, Oct 2018

  • Preprocessed 15,000 tweets with tokenization, stemming, emoji processing, and stop-word removal, and extracted features with TF-IDF.
  • Built unsupervised learning models, K-means clustering and Latent Dirichlet Allocation (LDA), for player segmentation.
  • Identified correlation between players’ sentiment and game performance using sentiment analysis and web scraping.