Collected data from the Allrecipes website using Scrapy and stored it in Hadoop.
Built the search-engine index in two ways: the first used PyLucene with assigned fields, and
the second used a Hadoop pipeline to calculate TF-IDF.
Big-Data Management, UCR
Twitter Data Analysis on Car Brand Topics Sep. 2019 – Dec. 2019
Crawled text content related to car brands with Scrapy and stored it in MongoDB.
Created multiple processes to generate keyword and brand-popularity analytics.
Visualized the results as interactive graphs, such as word clouds and geographic hotspot maps, and built
a web page with Python Dash to support all search queries.
Research Assistant, Machine Intelligence Group (MIG), NCCU
Automated word segmentation in Tang poetry and epitaph project Jun. 2019 – Jul. 2019
Implemented an analysis tool to verify the quality of data labeled by different sources
Trained a word embedding model with the gensim word2vec package
Combined the word embedding model with an LSTM model (Keras) and added special features (e.g.,
rhyme) to build an automatic word segmentation procedure for poetry and epitaphs
Improved model precision to nearly 90% and presented the results in a short paper accepted
to the Digital Humanities 2020 Conference
Research Assistant, Machine Intelligence Group (MIG), NCCU
Chinese text content simplification and compression project Nov. 2018 – Dec. 2018
Used the Python Beautiful Soup package and regular expressions to extract content collected
from Biographies in Local Gazetteers
Designed a syntactic analysis application that generates simplified and compressed Chinese
text by combining the Python NLTK package with the Stanford NLP tools
Presented the application's content simplification and compression results in a short paper accepted
to the Digital Humanities 2019 Conference
Undergraduate special topic in computer science, NCCU
HiCmapTools: A tool to analyze Hi-C contact data Jul. 2017 – Feb. 2018
Designed a set of C++ computational tools to help biologists analyze data for different queries
Implemented an auxiliary statistical tool in R to visualize the results and assess their quality
Received a research grant from Academia Sinica (project number: 106-2813-C-004-036-E)
Won third place in the department exhibition
EDUCATION
University of California, Riverside, Riverside, CA (UCR) Dec. 2020
Master of Science in Computer Science
Relevant coursework: Big-Data Management, Artificial Intelligence, Information Retrieval and Web Search,
Probability Models for Artificial Intelligence, Data Mining Techniques, Statistics, Database Systems, Information Visualization, Business Analytics with SAS/R
National Chengchi University, Taipei, Taiwan (NCCU) June 2018
Bachelor of Science in Computer Science
Completed the Big Data Analytics Program for Undergraduates