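1. Specialties: data analysis and data warehousing
2. Data visualization: proficient in visualization packages such as ggplot2, highcharter, and matplotlib
3. R: proficient in data-processing packages such as dplyr, sparklyr, and purrr
4. Python: common modeling packages such as LightGBM, PyTorch, and fastai
5. MySQL: database management and writing queries
6. ClickHouse: data warehousing tool; managing complex data pipelines
7. English proficiency: TOEIC 850, TOEFL 83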
My GitHub: https://github.com/KobyashiMaru
Graduate Institute of Statistics, National Chengchi University
Data Analyst
New Taipei City, Taiwan
Email: [email protected]
Mobile: 0930761363
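From a statistical-analysis standpoint, the outcome of a game is decided by certain features. This analysis identifies the features associated with winning or losing and then uses them to predict, while a game is in progress, whether the team will win.
We used logistic regression to predict and analyze how to win a team fight in the first-person shooter Overwatch.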
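A minimal sketch of the modelling step, assuming each team fight is summarized by a few numeric features. The feature names (`ult_advantage`, the difference in ultimates ready; `first_pick`, whether the team got the first elimination) and the toy data are hypothetical, not the project's actual variables. Logistic regression is fitted here by batch gradient descent in plain Python so the coefficients can be read directly.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Minimize the mean log-loss with batch gradient descent."""
    n = len(y)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical fights: [ult_advantage, first_pick] -> 1 if the team won the fight.
X = [[2, 1], [1, 1], [0, 1], [1, 0], [-1, 0], [-2, 0], [0, 0], [-1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w = fit_logistic(X, y)
print("intercept and coefficients:", [round(v, 2) for v in w])
print("P(win | +1 ult, first pick):", round(predict_proba(w, [1, 1]), 2))
```

A positive coefficient means the feature raises the odds of winning the fight; exp(coefficient) is the odds ratio per unit increase, holding the other features fixed. In practice scikit-learn's `LogisticRegression` or R's `glm(..., family = binomial)` would replace the hand-written optimizer.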
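Developed a system that observes the genres of movies a user has watched and the ratings they gave, and then recommends other movies to that user.
Partial SVD, cosine similarity, and Jaccard similarity are used to compute the similarity between movies, which serves as a weight; a weighted average of the user's ratings is then computed for each movie, and the movies are ranked and recommended to the user.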
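The weighting step can be sketched as item-item collaborative filtering. This is a minimal plain-Python sketch under stated assumptions: the partial SVD step is omitted (movies are compared on their raw rating vectors rather than on SVD factors), and multiplying cosine and Jaccard similarity into one weight is an assumption, since the description does not say how the two are combined. User names, movie IDs, and ratings are made up.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: key -> rating


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; keys missing from one vector count as zero.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: Vector, b: Vector) -> float:
    # Overlap of the sets of users who rated each movie.
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def recommend(ratings: Dict[str, Vector], user: str, top_n: int = 3) -> List[str]:
    """Rank unseen movies by a similarity-weighted average of the user's own ratings."""
    by_movie: Dict[str, Vector] = {}  # movie -> {user: rating}
    for u, movies in ratings.items():
        for m, r in movies.items():
            by_movie.setdefault(m, {})[u] = r

    seen = ratings[user]
    scores: Dict[str, float] = {}
    for movie, vec in by_movie.items():
        if movie in seen:
            continue
        # Assumed combination: cosine x Jaccard as the weight of each rated movie.
        weights = {s: cosine(vec, by_movie[s]) * jaccard(vec, by_movie[s]) for s in seen}
        total = sum(weights.values())
        if total > 0:
            scores[movie] = sum(weights[s] * seen[s] for s in seen) / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 5, "C": 5},
    "carol": {"A": 1, "D": 5},
}
print(recommend(ratings, "alice"))
```

With millions of ratings the pairwise loop above becomes far too slow; a low-rank (partial) SVD helps by comparing movies in a small latent space instead of across every user.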
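Image recognition is a very popular technique in artificial intelligence. This project uses a neural network architecture to distinguish three similar-looking aircraft.
A web scraper collected images of the Boeing 747, Airbus A340, and Airbus A380, and a CNN based on the ResNet-34 architecture was then trained to identify the three aircraft.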
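For reference, the idea behind the ResNet-34 backbone is the residual connection introduced by He et al.: each block learns a residual function $\mathcal{F}$ and adds its input back through a shortcut, with $\sigma$ the ReLU and biases omitted. ResNet-34 stacks 16 basic blocks (two 3×3 convolutions each) in four stages of 3, 4, 6 and 3 blocks; with the first convolution and the final fully connected layer this gives 34 weighted layers. How the network was fine-tuned in this project is not stated.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\mathcal{F}(\mathbf{x}, \{W_1, W_2\}) = W_2\,\sigma(W_1 \mathbf{x})
```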
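Increasing image resolution, known as the super-resolution problem, has made considerable progress in recent years as AI has become popular. Here it is implemented with a generative adversarial network (GAN).
A Super Resolution GAN (SRGAN) is used to convert low-resolution photos into high-resolution ones, followed by an attempt to convert low-resolution video into high-resolution video.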
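For reference, the original SRGAN formulation (Ledig et al.) trains the generator $G_{\theta_G}$ on a perceptual loss that adds a content loss $l^{SR}_X$, computed on VGG feature maps $\phi_{i,j}$ of size $W_{i,j} \times H_{i,j}$, to a lightly weighted adversarial term from the discriminator $D_{\theta_D}$. Whether this project kept these exact terms and the $10^{-3}$ weight is not stated.

```latex
\begin{aligned}
l^{SR} &= l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen}, \qquad l^{SR}_{X} = l^{SR}_{VGG/i,j},\\
l^{SR}_{VGG/i,j} &= \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
  \left( \phi_{i,j}\!\left(I^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right)_{x,y} \right)^2,\\
l^{SR}_{Gen} &= \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I^{LR}_n\right)\right)
\end{aligned}
```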
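This project uses a mathematical algorithm to search for the shortest path a cleaning robot needs to finish cleaning a room.
With all obstacles in the room known in advance, a depth-first search algorithm finds the shortest path and visualizes it.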
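A minimal sketch of a depth-first traversal of the room, assuming the room is a grid where `#` marks an obstacle and `.` a free cell (a hypothetical encoding). It returns a walk that reaches every free cell reachable from the start, including the backtracking moves the robot has to make. A plain DFS like this gives a valid covering route but does not by itself guarantee the shortest one; finding the minimum requires comparing alternatives, for example exhaustive DFS with backtracking on small rooms.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def dfs_coverage(grid: List[str], start: Cell) -> List[Cell]:
    """Walk that visits every free cell reachable from `start` via depth-first search."""
    rows, cols = len(grid), len(grid[0])
    visited = {start}
    walk = [start]
    last_new = 0  # index in `walk` where the last new cell was reached

    def visit(cell: Cell) -> None:
        nonlocal last_new
        r, c = cell
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):  # right, down, left, up
            nxt = (r + dr, c + dc)
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and nxt not in visited:
                visited.add(nxt)
                walk.append(nxt)
                last_new = len(walk) - 1
                visit(nxt)
                walk.append(cell)  # backtrack to the current cell

    visit(start)
    # Moves after the last new cell only walk back toward the start; drop them.
    return walk[: last_new + 1]


room = [
    "...#",
    ".#..",
    "....",
]
route = dfs_coverage(room, (0, 0))
print(len(route) - 1, "moves:", route)
```

The recursion keeps the sketch short; for large rooms, Python's default recursion limit (about 1000 frames) makes an explicit stack the safer choice.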
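The MovieLens dataset is a well-known dataset for building recommender systems. The version I used contains over 20 million ratings of 27,278 movies, and with this data a collaborative-filtering recommender system is easy to build. Although the recommender system is not the main research direction of this thesis, I used partial SVD and cosine similarity to compute the similarity between movies as weights, then ranked each movie based on the user's ratings and recommended movies to the user.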
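My main research direction is using a LightGBM model to predict movie profits. Because the MovieLens dataset does not provide this information, we had to write a scraper to collect the required data from IMDb before making predictions. The model performed well: with the movies split into three classes by quantile, the average 10-fold CV accuracy reached about 70%, which is 15% higher than the accuracy reported in a paper published in an IEEE journal three years earlier.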
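The evaluation setup can be sketched in plain Python: profits are cut into three classes at the 1/3 and 2/3 quantiles, and accuracy is averaged over 10 shuffled folds. The `fit_predict` argument is a hypothetical hook where the LightGBM model would go; the example plugs in a trivial majority-class baseline and synthetic data so the block runs without extra packages. In this sketch the quantile cut points are computed once on the whole dataset.

```python
import random
from collections import Counter
from statistics import quantiles
from typing import Callable, List, Sequence

FitPredict = Callable[[List[list], List[int], List[list]], List[int]]


def tercile_labels(profits: Sequence[float]) -> List[int]:
    """0 = low, 1 = mid, 2 = high profit, split at the 1/3 and 2/3 quantiles."""
    q1, q2 = quantiles(profits, n=3)
    return [0 if p <= q1 else 1 if p <= q2 else 2 for p in profits]


def kfold_accuracy(X: Sequence[list], y: Sequence[int],
                   fit_predict: FitPredict, k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over k shuffled folds (assumes len(y) >= k)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        scores.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(scores) / k


def majority_baseline(X_train, y_train, X_test):
    # Placeholder model: always predict the most common training class.
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


rng = random.Random(42)
profits = [rng.gauss(0, 1) for _ in range(300)]  # synthetic, not IMDb data
X = [[p + rng.gauss(0, 0.5)] for p in profits]    # one noisy synthetic feature
y = tercile_labels(profits)
print("baseline 10-fold accuracy:", round(kfold_accuracy(X, y, majority_baseline), 3))
```

To mirror the project, `fit_predict` could wrap `lightgbm.LGBMClassifier().fit(X_train, y_train).predict(X_test)`; scikit-learn's `StratifiedKFold` is the usual replacement for the hand-rolled folds. A majority baseline scores at most about 1/3 on balanced terciles, which gives a reference point for the reported 70%.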
This is my GitHub: https://github.com/KobyashiMaru
National Chengchi University, M.S. in Statistics
Data Analyst
New Taipei City, Taiwan
From a statistical point of view, winning a game is determined by a set of factors. This project identifies the factors behind winning or losing a fight, then tries to predict the result while the game is in progress.
Using logistic regression to predict and analyze how to win a team fight in the game "Overwatch".
Developed a system that recommends movies to users by observing which kinds of movies they watched and how they rated them.
Partial SVD and similarity algorithms are used to measure the similarity between movies, which is then used to rank and recommend movies to the user.
Image recognition is a very popular technique in artificial intelligence. The goal of this project is to build a neural network that can classify three types of similar-looking planes.
Starting from scraped images of three airplane types (Boeing 747, Airbus A340, Airbus A380), a ResNet-34 CNN is used to build a model that can distinguish the three types.
Increasing image resolution, known as the super-resolution problem, has made good progress as AI has become popular in recent years. This project uses a generative adversarial network (GAN) to do it.
Using a Super Resolution GAN (SRGAN) model to turn low-resolution images into high-resolution images. We also try to turn low-resolution video into high-resolution video.
The goal of this project is to find the shortest path for a cleaning robot to clean a room, using a graph-theory algorithm.
With all obstacles in the room known in advance, a depth-first search algorithm is used to find the shortest path.
The MovieLens dataset is a well-known dataset for building recommender systems, and I built one with it: with 20 million ratings of 27,278 movies, a collaborative-filtering recommender system is easy to set up. Since the recommender system is not my main focus, I only used partial SVD and cosine similarity to find the similarity between movies, and used those similarities as weights on the user's ratings to compute the recommendation ranking. The results came out well.
My main focus is predicting a movie's profit using LightGBM. MovieLens has no box-office gross, budget, or other detailed movie data, but the variables we need can be scraped from IMDb. The models turned out well: I split movies into three categories by profit quantile, and the 10-fold CV accuracy is about 70%.