mingchien Sung

MR.中鑒者: Multi-Blog Search and Sponsored-Post Detection Website

By mingchien Sung on February 11, 2019

I taught myself Python and used requests and BeautifulSoup to build crawlers for the major blog sites and the YouTube search results page. When a user submits a keyword, the crawl runs in real time. Blog posts then go through a simple, preliminary NLP pass with jieba: the segmented text is compared against a custom list of sponsored-content terms to judge whether the post reads like a sponsored post (業配文).

Personal assessment: we only built a prototype of the site's features and did not account for many situations that would come up in a real production launch. Looking back, the amount of work was quite shallow. Team members lacked project experience and were not familiar enough with programming, which led to an uneven division of work. One improvement, offhand: the project would be better with a decoupled front-end/back-end web architecture. Beyond the crawler, the back end should also expose an API for the front end to call via AJAX, which would improve the user experience and give the repository a cleaner folder structure.

This was a 4-person project; one teammate and I were responsible for the crawler work. GitHub: https://github.com/j551234/blog
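The fetch-segment-match pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the term list `SPONSORED_TERMS`, the 5% threshold, and the function names are all assumptions made up for this example, and the real scoring rule in the repository may differ. The third-party libraries (requests, BeautifulSoup, jieba) are imported inside the functions that need them so the scoring logic can be read and run on its own.

```python
from typing import Iterable, List, Set

# Hypothetical vocabulary; the project's real sponsored-term list is not shown in the source.
SPONSORED_TERMS: Set[str] = {"邀約", "體驗", "合作", "贊助", "廠商"}


def fetch_post_text(url: str) -> str:
    """Download a blog post and return its visible text."""
    import requests                  # third-party: pip install requests
    from bs4 import BeautifulSoup    # third-party: pip install beautifulsoup4

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)


def tokenize(text: str) -> List[str]:
    """Segment Chinese text into words with jieba."""
    import jieba                     # third-party: pip install jieba

    return jieba.lcut(text)


def sponsored_score(tokens: Iterable[str], terms: Set[str] = SPONSORED_TERMS) -> float:
    """Return the fraction of non-blank tokens that appear in the term list."""
    words = [t for t in tokens if t.strip()]
    if not words:
        return 0.0
    hits = sum(1 for t in words if t in terms)
    return hits / len(words)


def looks_sponsored(tokens: Iterable[str], threshold: float = 0.05,
                    terms: Set[str] = SPONSORED_TERMS) -> bool:
    """Flag a post when the share of sponsored terms reaches the (assumed) threshold."""
    return sponsored_score(tokens, terms) >= threshold
```

A typical call would be `looks_sponsored(tokenize(fetch_post_text(url)))`. Keeping the scoring separate from fetching and segmentation also fits the improvement noted above: a back-end API endpoint could call the same function and return the result as JSON for the front end.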
