Connor Hsu (leafwind)
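Highly curious about data and information; enjoys researching topics such as information asymmetry, data manipulation, and data analysis. Currently a software engineer at a startup, wrestling every day between artificial intelligence and real-world data. In spare time, runs the blog "all about data".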
Software Engineer, Appier, Jun 2014 - Present
Experienced in coordinating cross-team product features and burning down company-level technical debt as an agile coach, architecture team member, and data governance committee member.
Build API services as well as the underlying ETL pipelines.
Build real-time bidding (RTB) algorithms in a fast-growing business environment.
Conduct experiments to make daily improvements and achieve business goals.
Infer root causes in an uncontrolled, time-sensitive environment to solve critical issues.
Enable AI/ML product pipelines with petabyte-scale data.
Research Assistant, CSIE, NTU, Oct 2012 - May 2014
Build an online video retrieval system by integrating retrieval algorithms with open-source software.
Co-develop retrieval algorithms (feature selection, ranking and indexing).
Collect video segments from digital TV signals and build a terabyte-scale database as the experimental dataset.
National Taiwan University, Taiwan, Sep 2009 - Jun 2011
M.S., Department of Computer Science and Information Engineering
Me-link: Link me to the media - fusing audio and visual cues for robust and efficient mobile media interaction (WWW 2014)
Comp2Watch: enhancing the mobile video browsing experience (IMMPD 2011)
Snap2Read: Automatic Magazine Capturing and Analysis for Adaptive Mobile Reading (MMM 2011)
National Chiao Tung University, Taiwan, Sep 2005 - Jun 2009
B.S., Department of Computer Science
Deep Funnel Pipeline Reconstruction ('17 Q2-Q4)
Take over the pipeline from scientists and improve it by adding unit tests, migrating the DB, refactoring code, and migrating to Jenkins. ('17 Q2)
Work with team members to migrate the pipeline to Airflow. ('17 Q3)
Continuously Improve ETL Processes
Reach 300% update frequency with 33% of the cores by reusing data flows and rescheduling jobs according to their data recency. (2015 Q3)
Further achieve 200% update frequency with fewer resources through pipeline refactoring.
Data Cleansing & Data Governance
Survey over 250 major (undocumented) Spark table fields; deprecate or annotate 80+ and correct 20+ of them within one week. (2016 Q3)
Build a monitoring dashboard for major columns that previously had issues.
Trace complex, undocumented data pipelines (RDB, Spark, static CSV/JSON/Parquet, in-memory, API services) produced by different owners on a daily basis.
Improve ML Model Performance
Improve high-quality inventory prediction by adopting a suitable ML model (2016 Q2)
Precision: 5.3% -> 79%
Volume increased to 1280%
Extend the CPA model to different scenarios
CPA reduced to 68%
Volume increased to 240%
MeLink: An online multimedia retrieval system, NTU, 2014
Scenario Demo: http://vimeo.com/leafwind/melink-scenario
Use aural-visual signatures captured by a mobile device to retrieve multimedia objects (image/video/audio) from a million-scale dataset within a few seconds.
Design/maintain index structures for different features.
C, C++, Python, iOS app, MPlayer, Echoprint, OpenCV, Solr, TokyoTyrant
My Machine Learning Engineering at Appier
I've been at Appier for more than 2.5 years. During the first 2 years, my job was to do anything that helped AI work with the growing business, as well as with the data platform/system, which was getting more and more complicated (looped and twisted dependencies, errors propagating without monitoring, and no documentation). Most of it was engineering work, but not always...
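The illusion that this "can be skipped" may come from thinking of software and data as hard technology, and moreover as "rote" technology: something you can follow by the book, with fixed procedures to rely on. In fact they are not; they are a soft culture. Although AI can be combined with hardware, in essence it cannot escape the realm of software services. "Software services" are made up of invisible "software" and intangible "services", and these are exactly the culture Taiwan has long neglected: we care about the steamed buns we can see and the prices we can see, but rarely take the time to seriously evaluate far-reaching impact and intrinsic value...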
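This article series is based on "Rules of Machine Learning" by Martin Zinkevich (Research Scientist @ Google). After reading part of it, I found it resonated deeply with me (especially the point that "most of the problems you will face are engineering problems"). As an engineer who has taken part in some machine learning development, I wish every team could learn these useful "truisms" before any product is built, and so avoid many needless detours in their systems. So I decided to read it closely, take notes, and bring it to Traditional Chinese readers. I have tried to translate and format it while preserving the original meaning as much as possible, and to add more detailed explanations (inferred from personal experience) to some of the more terse passages in the original...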
Data Pipeline / ETL
Spark, Spark SQL, MySQL
Others (Sqlite3, LMDB, InfluxDB)
AWS S3, Linux
Nginx + uWSGI + Flask
gspread, boto, scrapy, cachetools
word2vec, janome, jieba, pandas
Pyflakes, Pylint, Nose, Vulture, Coverage
Others (Travis CI, Coveralls)
Slack Pokemon RPG / Tarot bot
Line weather bot
Twitch engagement bot
Web & Visualization
d3.js, bokeh (toy)
Scrum, Kanban and Scrumban
Although I'm not a professional photographer, I like to take pictures with my phone.
These photos were taken with an LG G4.
sky @ Ming Chuan University, Taipei
sunset @ Guam