YEN-CHEN CHOU

Education

Project: Customer behavior modeling & Automatic hedging system

Built machine learning models, including K-Nearest Neighbors (KNN) and Random Forest, to predict potential customers in support of business strategy, resulting in an additional 30% in CTA.
Deployed an automatic hedging system using PCA in Python for futures trading analysts, covering 42 futures contracts, and reduced portfolio construction computation time by 28%.
Designed a website traffic data pipeline, a dashboard connected to the upstream customer database, and A/B tests to enhance the user experience of the company website, improving the sales conversion rate by an additional 52%.

Project: Analysis of medical devices industry

Conducted industry analysis of financial reports, including income statements and statements of financial position, to assess revenue growth in the medical device industry.
Improved accuracy in identifying target customers for the company’s products using data preprocessing and Linear Regression, which generated an additional 21% increase in revenue compared to the previous year.

Projects: Presented patent data and financial statement data to potential consultees in biotechnology

Used R to conduct an industry analysis of 127 biotechnology companies in Taiwan, aligning business strategy with the search for potential consultees.
Extracted key features for exploring promotional campaigns and reduced research time by 15% compared to financial statement analysis.

Built a data ETL pipeline for data preprocessing and exploratory data analysis using Spark RDD, Spark SQL, and Spark DataFrame.
Applied Alternating Least Squares (ALS) collaborative filtering to generate customized movie recommendations.
Tuned hyperparameters with a self-defined function and cut computation cost by 80% by monitoring visualized learning curves.

Built a data ETL and preprocessing pipeline with Spark DataFrame and created labels from YouTube comments using regular expressions.
Performed exploratory data analysis in Spark RDD to map pet video creators and pet owners for targeted advertising, and implemented tokenization, stop-word removal, and word vectorization with Word2Vec.
Trained and evaluated Logistic Regression and Gradient-Boosted Trees using k-fold cross-validation and AUC score.
Used a Logistic Regression model, achieving 88% accuracy and 0.89 recall, to predict pet owners from their comments.

Set up data processing with Spark DataFrame and data visualization in Tableau, and used Spark SQL for OLAP on crime incidents.
Reduced the potential crime rate by 11% through time series visualization and identified the monthly crime frequency for the top 3 most frequent crime categories.
Implemented K-means clustering and selected the number of clusters with the elbow method on the sum of squared errors to identify high-risk crime areas in San Francisco.

Built data preprocessing and Tableau visualizations to detect click fraud from IP address, timestamp, and click frequency.
Trained Logistic Regression, XGBoost, and a Recurrent Neural Network (RNN) in Keras to predict app downloads from advertisements, reaching 92% accuracy and 0.7 recall with the XGBoost model.
Optimized hyperparameters and fine-tuned model performance using grid search and ROC curves, and analyzed feature importance to identify key factors.

Preprocessed 15,000 tweets through tokenization, stemming, emoji processing, and stop-word removal, and extracted features with TF-IDF.
Built unsupervised learning models (K-means clustering and Latent Dirichlet Allocation) for player segmentation.
Identified a correlation between players’ sentiment and game performance using sentiment analysis and web scraping.