CakeResume Talent Search


Software Engineer @Wistron NeWeb Corporation 啟碁科技股份有限公司

・

2023 ~ 2023

Software Engineer

Within two months

Word

PowerPoint

Excel

Full-time / Interested in remote work

4 to 6 years

National Chung Cheng University (國立中正大學)

・

Computer Science and Information Engineering

陳昱希

Computer Vision Engineer @Academia Sinica

・

2015 ~ Present

Computer Vision Engineer

Within one month

陳昱希 Computer Vision Engineer Yu-Hsi Chen has rich experience in developing computer vision and machine learning algorithms. In his recent work at Academia Sinica, he has focused on using machine learning to solve traditional computer vision and image / video processing problems. His developed NeighborTrack is a state-of-the-art single object tracking system in the field. During his school days, he used verilog on FPGA to implement the 3A system of the camera. website: Yu-hsi Chen (franktpmvu.github.io) Taipei City, Taiwan Yu-hsi Chen (franktpmvu.github

Provides Feedback

Communication

Precision

Full-time / Interested in remote work

6 to 10 years

LUNGHWA university

・

Master of Science

Blockchain Engineer & AI Lead @Portal Network

Blockchain engineer & Blockchain consulting

Within one month

Solidity

blockchain development

Docker

Full-time / Interested in remote work

4 to 6 years

DboyLiao

Principal Engineer @Coretronic Corporation, 中強光電

・

2020 ~ 2022

Machine Learning Engineer

Within one month

Senior Software Developer, Wuduker Inc., JanuaryOctober 2020 iSchedule, a preference aware scheduling system. Responsible for designing RESTful API, database schema and core scheduling solver Anomaly detection system with Deep Metric Learning Develop deep learning model with PyTorch and PyTorch-Lightning Consulting service. Including Spark pipeline optimization, deep learning model development and general Python/C++ development Machine Learning Engineer, Pinkoi Inc., DecemberDecember 2019 Recommendation System, including item-based/store-based recommendation, keyword suggestion, making significant improvement on recommendation quality and coverage. Machine Learning Algorithm Design Data Pipeline, including on-site advertising and

Python

Linux

C++

Full-time / Interested in remote work

6 to 10 years

National Taiwan University

・

Economics

YEN-TING CHEN

Research Assistant of National Taiwan University @National Taiwan University

・

2023 ~ Present

Graduate research assistant

Within six months

YEN-TING CHEN (陳彥廷) I am a graduate student in the Department of Psychology at National Taiwan University ( NTU). For me, diving into psychometrics and exploring data with reasonable statistical method is to clarify a new world of understanding people around us. Whether it's a quirky little issue or a big, serious one, I've got curiosity and grabbed my attention to figure out problems using the tools or theories of psychometrics and data analysis . Right now, I am turning curiosity into discoveries in the wild world of Psychology and All kinds of Data

EDA

Python Programming

R Programming

Part-time / Interested in remote work

4 to 6 years

National Taiwan University

・

Psychometrics (Division of Psychology), Methodology (Division of Psychology)

Benjamin Deporte

AI, Machine Learning and Data Manager @IRT Saint Exupery

・

2021 ~ Present

Data Analyst, Data Scientist, AI Engineer, Project Manager

Within one month

Benjamin Deporte [email protected] AI, Machine Learning and Data Officer Innovative AI/ML seasoned leader with strong mathematical background and hands-on knowledge of machine learning algorithms and best practices. Specialized in Cybersecurity, Healthcare and Aerospace. Demonstrated driving business value through 10+ years of experience within different businesses, in direct management or thought leadership roles. Leadership, networking, communication and language skills. Skills Expertise in Artificial Intelligence and Machine Learning Specialization in Cybersecurity, Healthcare and Aerospace. Leadership abilities, networking and communication skills Business acumen in multicultural, global organizations Work

Proficiency in Artificial Intelligence and Machine Learning

Knowledgeable in cybersecurity

Project and Account management

Full-time / Interested in remote work

6 to 10 years

Télécom Paris

・

Cybersecurity

NengChien Wang

Formerly

Senior Software Engineer @DOINT

・

2021 ~ 2023

Software engineer, Image Processing engineer, Algorithm engineer

Within three months

Docker Doxygen Swig (API for python from C++) OpenCV OpenCL Linux Skill multi-threads distributed computing serialization/deserialization CICD data version control (DVC) MLOps Image Processing PCA Interactive Segmentation Connected Component Image Stitching Direct Linear Transformation Kalman Filter (Tracking) SIFT Hough HoG Image Deblurring Depth Estimation ISP Machine Learning Framework Transformation Data Augmentation Transfer Learning Model Pruning Model Quantization Performance Evaluation Parameter Fine-tuning Model : SVM, LeNet, AlexNet, VGG, GoogLeNet, SSD, YOLO, MobileNet, ShuffleNet, FaceNet, Xception, MatrixNet, CenterNet, CSPNet, M2Det, EfficientNet/Det Projects HPC Keyword Spotting Automatic Speech Recognition Distributed Inference System Social Distancing Estimation (Lidar

Python

Machine Learning

C++

Full-time / Interested in remote work

National Taiwan University

・

Communication Engineering

Eddy Chen

Machine Learning Engineer @日新軟體股份有限公司

・

2021 ~ Present

AI Engineer, Machine Learning Engineer, Deep Learning Engineer, Data Scientist

Within six months

focus mainly on object detection, sensor fusion and sensor calibration. 3+ years of experience in machine learning. Used to conduct research in medical AI projects and possess experience in implement real-world end-to-end ML project . Skills Programming Languages Python C++ SQL Machine Learning TensorFlow Pytorch Scikit-learn Matplotlib Others FastAPI Docker Redis Linux Certification Taiwan AI Academy - AI Technical Professionals Program NVIDIA DLI Certificate – Applications of AI for Anomaly Detection Work Experience Machine Learning Engineer NEUTEC • MayPresent Develop, optimize and maintain the machine learning algorithm for internal process automation. Deploy machine

AI & Machine Learning

Image Processing

python

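Full-time / Interested in remote work

4 to 6 years

National Taipei University of Technology (國立臺北科技大學)

・

Graduate Institute of Mechatronic Engineering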

Kishan Gondaliya

AI & Embedded Systems Consultant @Self Employed

・

2021 ~ Present

Deep Learning Engineer

More than one year

Kishan Gondaliya Experienced embedded software engineer working on Embedded Systems and Deep Learning to enable vision and voice-based machine learning algorithms on low-power FPGA and edge embedded devices. ~8 years of experience consists in writing, debugging, and optimizing software/firmware for embedded [email protected] Ahmedabad, Gujarat, India Skillset Languages: Frameworks: Dev Tools: HW Platform: Cloud (GCP): Cloud (AWS): Other: C, Python, C++ Tensorflow (TFlite, TFmicro), Keras, Caffe, Darknet Anaconda, Git, Gerrit, Perforce, Pycharm, CVS, Jira, Confluence Google Coral TPU, Lattice ECP5, U+, Crosslin-NX FPGA, Raspberry Pi, Intel Movidius, NVIDIA

Deep Learning

machine learning

aws

Full-time / Interested in remote work

4 to 6 years

Charotar University of Science & Technology

・

Electronics & Communication

Data Scientist | Associate Researcher @China Engineering Consultants, Inc.

・

2020 ~ Present

Data Analyst

More than one year

Python

SQL/MySQL

SQL Server

Full-time / Interested in remote work

6 to 10 years

National Dong Hwa University (國立東華大學)

・

Graduate Institute of Applied Mathematics

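More than one year

呂學寬

HIPR Pacsoft Technologies

・

2020 ~ 2021

Taipei, Taiwan

Professional Background

Current Status

Employed

Job Search Stage

Professional

Data Scientist

Industry

Work Experience

Less than 1 year

Management Experience

None

Skills

Python

C++

Java

tensorflow

Languages

English

・

Professional

Chinese

・

Native or bilingual

Job Preferences

Desired Position

Machine Learning Engineer

Expected Employment Type

Full-time

Desired Work Location

Taipei, Taiwan; Kaohsiung City, Taiwan

Remote Work Preference

Interested in remote work

Freelance Services

No

Education

School

Tsinghua University

Major

Computer Science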

Hsuehkuan Lu

Machine Learning Engineer

Highly focused and cooperative, with a strong ability to learn. Passionate about machine learning, natural language processing/understanding, and data science. Practical experience in ML projects with Python and in GraphQL API design. Proficient in implementing algorithms and conducting research.

Taipei, Taiwan

[email protected]

Education

2016 - 2019

Tsinghua University

Master of Science, Computer Science

Knowledge Engineering Group (KEG) Lab

2012 - 2016

National Central University

Bachelor of Science, Computer Science and Information Engineering

Work Experience

Algorithm Engineer • HIPR PacSoft Technologies

August 2020 - Present

Model 30+ MySQL tables and 10+ Elasticsearch indices.
Design GraphQL APIs with Flask and write unit tests.
Design an NLP data processing pipeline (500M+ records) covering Google Scholar, world university rankings, journal rankings, and institution rankings.
Design distributed crawling and computing systems with Dask, and asynchronous data processing methods (3-4x faster).
Introduce an object-relational mapping layer for Elasticsearch, improving the organization of search engine index mappings and simplifying queries.
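
The asynchronous data processing mentioned in the bullets above can be illustrated with a minimal sketch. It uses only Python's standard `asyncio` rather than Dask, and the function names, the record ids, and the simulated delay are all hypothetical stand-ins, not the production pipeline.

```python
import asyncio
import time


async def fetch_record(record_id: int, delay: float = 0.05) -> dict:
    """Hypothetical I/O-bound task; asyncio.sleep stands in for a network call."""
    await asyncio.sleep(delay)
    return {"id": record_id, "status": "ok"}


async def fetch_all(record_ids, max_concurrency: int = 8) -> list:
    """Fetch records concurrently, capping in-flight tasks with a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(record_id: int) -> dict:
        async with semaphore:
            return await fetch_record(record_id)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(bounded(r) for r in record_ids))


def run_demo(n: int = 16):
    """Run the concurrent fetch and report the wall-clock time it took."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(range(n)))
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Calling `run_demo(16)` returns the sixteen records in input order together with the elapsed time; because the waits overlap, the total is well below sixteen sequential delays.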

Projects

Impactio

Develop social network applications for academic researchers.
Design a large-scale data pipeline for institutions (750K+), journals (35K+), authors (250M+), and publications (230M+).
Design distributed crawling and on-the-fly merging systems for Google Scholar profiles with Dask (avg. 3-5 min per user).
Design a cursor-based pagination method with GraphQL that improves on the built-in offset-based pagination by 10%+ in efficiency.
Main developer of the project's backend APIs (70%+).
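
As an illustration of the cursor-based pagination idea named above, here is a minimal sketch over an in-memory sorted list of ids. The helper names, the base64 cursor encoding, and the returned field names are assumptions made for the example; the project's actual GraphQL schema and its measured 10%+ gain are not reproduced here.

```python
import base64
from bisect import bisect_right


def encode_cursor(item_id: int) -> str:
    """Wrap an id in an opaque, URL-safe cursor string."""
    return base64.urlsafe_b64encode(str(item_id).encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the id stored in a cursor string."""
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def paginate(sorted_ids, first, after=None):
    """Return up to `first` ids that sort strictly after the cursor."""
    if after is None:
        start = 0
    else:
        # Seek by value, not by offset, so earlier inserts/deletes do not shift the page.
        start = bisect_right(sorted_ids, decode_cursor(after))
    page = sorted_ids[start:start + first]
    return {
        "nodes": page,
        "end_cursor": encode_cursor(page[-1]) if page else None,
        "has_next_page": start + first < len(sorted_ids),
    }
```

Because the cursor stores the last seen id rather than a row offset, the next page stays correct even if earlier rows are deleted between requests, which is the usual motivation for preferring cursors over offsets.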

News Miner

Developed cross-media news and trend/topic analysis applications.
Designed English news data processing pipeline (5K+ per day), and topical news clustering with semantic representations.
Adopted Word2Vec distributed representations with k-means clustering to produce text features, and designed a single-pass clustering method to merge related news.
Handled 100K+ news articles in 10 minutes, improving processing performance by 250%-300% compared to k-means clustering.
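
The résumé does not spell out its single-pass clustering method, so the following is only a generic sketch of the technique: each vector is compared once against the running cluster centroids and either joins the most similar cluster or starts a new one. The cosine threshold and the toy 2-D vectors are assumptions; real inputs would be Word2Vec-based text features.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(vectors, threshold=0.8):
    """Assign each vector to the closest centroid above `threshold`, else open a new cluster."""
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    labels = []
    for vec in vectors:
        best, best_sim = -1, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best == -1:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = sizes[best]
            # Update the running mean with the new member.
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
    return labels
```

Unlike k-means, this makes one pass over the data and needs no preset number of clusters, at the cost of results that depend on input order.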

POS Tagging and Dependency Parsing

Combined POS tagging with dependency parsing to alleviate error propagation and enrich sequence tagging information across tasks.
On the Universal Dependencies 2.0 dataset, achieved an average LAS (Labeled Attachment Score) of 81.14, against a baseline of 72.14 (11% improvement).
Ablation tests indicated that POS tagging information improved dependency parsing performance by 10%.
The tagging system was implemented in TensorFlow with Python and could annotate 1K+ sentences per minute.
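
For readers unfamiliar with the metric quoted above, the sketch below shows how LAS is conventionally computed alongside UAS: a token counts toward UAS if its predicted head is correct, and toward LAS only if both head and relation label are correct. The function name and the toy sentence are illustrative assumptions, not the evaluation script used in the project.

```python
def attachment_scores(gold, predicted):
    """Compute UAS and LAS (as percentages) from per-token (head, label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("at least one token is required")
    total = len(gold)
    # UAS: head index matches; LAS: head index and relation label both match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return {"UAS": 100.0 * uas_hits / total, "LAS": 100.0 * las_hits / total}


# Toy four-token sentence; head 0 denotes the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
scores = attachment_scores(gold, pred)
```

In this toy case three of four heads are right and two of four head-label pairs are right, so `scores` holds a UAS of 75.0 and an LAS of 50.0.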

Wikipedia Data Processing and Joint Representations of Entities and Texts

Tackled the reliance on parallel corpora in cross-lingual problems and enriched textual information with external knowledge.
Designed a weakly supervised algorithm to produce an aligned cross-lingual corpus (avg. 300K+ paragraphs per language pair).
Jointly modeled knowledge entities and texts with a model derived from the Skip-Gram with Negative Sampling algorithm, and made the word vectors (300d) public.
Achieved 44.99 Pearson-r on the SemEval-2017 Track 4a task (En-Es), while the end-to-end LASER model achieved 40.87 Pearson-r (10% improvement).
Proposed a cross-lingual information retrieval task on which the proposed method achieved 80% Top-10 accuracy, while the strongest baseline reached 61% (30% improvement).
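
The two evaluation measures cited above, Pearson's r and Top-k accuracy, can be sketched in plain Python as follows. The function names and the sample inputs are assumptions for illustration; they are not the SemEval scorer or the project's own evaluation code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_k_accuracy(ranked_lists, gold_items, k=10):
    """Fraction of queries whose gold item appears among the first k ranked results."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_lists, gold_items))
    return hits / len(gold_items)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline
```

With the figures quoted in the bullets, `relative_improvement(44.99, 40.87)` is roughly 0.10 and `relative_improvement(80, 61)` is roughly 0.31, consistent with the stated 10% and 30% gains.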

Skills

Programming Languages

Python
C++
Java

Framework and Data Analysis

TensorFlow
PyTorch
Pandas
JupyterLab
Dask
GraphQL

Database and Search Engine

MySQL
Elasticsearch
MongoDB

VCS and CI/CD Tools

Git
Bitbucket
Alembic
Jenkins

Publications

Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS Volume 1058, pages 425-440.
Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.

呂學寬

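Machine Learning Engineer

Highly focused, cooperative, and a fast learner. Passionate about machine learning, natural language processing/understanding, and data science. Skilled at data analysis with Python, implementing machine learning projects, and designing GraphQL APIs. Proficient in implementing algorithmic models and conducting research.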

Taipei, Taiwan

Email: [email protected]

Phone: (+886) 905193233

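Work Experience

Algorithm Engineer • HIPR PacSoft Technologies

August 2020 - Present

Built schema models for 30+ MySQL tables and 10+ Elasticsearch indices.
Designed backend APIs with GraphQL + Flask and wrote unit tests.
Designed an NLP data processing pipeline (about 500 million records in total) covering Google Scholar, world university rankings, journal rankings, and academic institution rankings.
Designed distributed computing and crawling with Dask (16+ workers) and an asynchronous data processing pipeline with Asyncio (roughly a 3-4x performance improvement).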

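Projects

Impactio

A social network designed for academic researchers.
Designed large-scale data processing pipelines: academic institutions (750K+), journals (35K+), authors (250M+), and academic papers (230M+).
Designed a distributed crawling method for Google Scholar user data with Dask, along with a real-time merge processing system (avg. 3-5 min per user).
Designed a GraphQL cursor-based pagination method to optimize the built-in offset-based pagination, improving efficiency by about 10%+.
Main developer of the project's backend API services (70%+).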

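News Miner

Developed cross-media news and trend/topic analysis applications.
Designed a data processing pipeline for English news (5K+ per day), together with news topic clustering methods and semantic text representation learning.
Used Word2Vec word vectors with k-means clustering to generate text features, and designed a single-pass clustering method to merge similar news.
The system processes 100K+ news items in 10 minutes on average, whereas k-means needs more than 10x the time to converge.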

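POS Tagging and Dependency Parsing

Integrated POS tagging into the dependency parsing task to reduce error propagation between the tasks and strengthen cross-task structural information.
On the Universal Dependencies 2.0 dataset, the proposed method reached 81.14 LAS (multilingual average) against a baseline of 72.14 LAS, an improvement of about 11%.
Ablation experiments show that POS information affects dependency parsing predictions by about 10%+.
The tagging system was implemented with TensorFlow + Python and processes about 1K+ per minute.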

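Wikipedia Data Processing and Joint Knowledge-Text Representation Learning

Addressed the reliance of cross-lingual methods on parallel corpora, and incorporated knowledge entities into text to enrich textual information.
Designed a weakly supervised algorithm to generate a cross-lingual corpus from Wikipedia (avg. 300K+ paragraphs per language pair).
The algorithm jointly learns knowledge entity and text representations based on Skip-Gram with negative sampling; the trained word vectors (300d) were made public.
Reached 44.99 Pearson-r on the SemEval-2017 Track 4a cross-lingual (En-Es) semantic textual similarity task, nearly 10% higher than the end-to-end LASER model's 40.87 Pearson-r.
Proposed a cross-lingual information retrieval task on which our method reached 80% Top-10 accuracy, a large improvement (30%) over the baseline's 61%.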

Skills

Programming Languages

Python
C++
Java

Frameworks and Data Analysis Tools

TensorFlow
PyTorch
Pandas
JupyterLab
Dask
GraphQL

Databases and Search Engines

MySQL
Elasticsearch
MongoDB

Version Control and CI/CD Tools

Git
Bitbucket
Alembic
Jenkins

Publications

Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS Volume 1058, pages 425-440.
Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.

End
