


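呂學寬 (Hsuehkuan Lu)
HIPR Pacsoft Technologies ・ 2020 ~ 2021
Taipei, Taiwan

Professional Background

Current status: Employed
Job search stage: (not specified)
Profession: Data Scientist
Industry: (not specified)
Work experience: Less than 1 year
Management experience: None
Skills: Python, C++, Java, TensorFlow
Languages: English (professional), Chinese (native or bilingual)

Job Preferences

Desired position: Machine Learning Engineer
Employment type: Full-time
Preferred locations: Taipei, Taiwan; Kaohsiung City, Taiwan
Remote work: Interested in remote work
Freelance services: No

Education

School: Tsinghua University
Major: Computer Science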

Hsuehkuan Lu

Machine Learning Engineer

Highly focused, cooperative, and a fast learner. Passionate about machine learning, natural language processing/understanding, and data science. Practical experience in ML projects with Python and in GraphQL API design. Proficient in implementing algorithms and conducting research.

Taipei, Taiwan

[email protected]

Education

2016 - 2019

Tsinghua University

Master of Science, Computer Science

Knowledge Engineering Group (KEG) Lab

2012 - 2016

National Central University

Bachelor of Science, Computer Science and Information Engineering

Work Experience

Algorithm Engineer • HIPR PacSoft Technologies

August 2020 - Present

Model 30+ MySQL tables and 10+ Elasticsearch indices.
Design GraphQL APIs with Flask and write unit tests.
Design an NLP data processing pipeline (500M+ records) covering Google Scholar, world university rankings, journal rankings, and institution rankings.
Design distributed crawling and computing systems with Dask (16+ workers) and asynchronous data processing with asyncio (3-4x faster).
Introduce object-relational mapping for Elasticsearch, improving the organization of search engine index mappings and simplifying queries.

Projects

Impactio

Develop a social network application for academic researchers.
Design large-scale data pipelines for institutions (750K+), journals (35K+), authors (250M+), and publications (230M+).
Design distributed crawling and on-the-fly merging systems for Google Scholar profiles with Dask (avg. 3-5 minutes per user).
Design a cursor-based pagination method with GraphQL that improves on the built-in offset-based pagination by 10%+ in efficiency.
Main developer of the project's backend APIs (70%+).
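
The cursor-based pagination mentioned above can be illustrated with a small sketch. This is not the project's code: the function names (`encode_cursor`, `decode_cursor`, `paginate`), the in-memory list of rows, and the use of an integer `id` as the sort key are all assumptions made for illustration.

```python
import base64
from bisect import bisect_right


def encode_cursor(last_id):
    """Wrap the last seen id in an opaque, URL-safe cursor string."""
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()


def decode_cursor(cursor):
    """Recover the id stored in a cursor produced by encode_cursor."""
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def paginate(rows, first, after=None):
    """Return up to `first` rows that come after the cursor `after`.

    `rows` is assumed to be sorted by a unique, ascending integer "id".
    """
    ids = [row["id"] for row in rows]
    # Seek directly to the position after the cursor instead of counting rows.
    start = bisect_right(ids, decode_cursor(after)) if after is not None else 0
    page = rows[start:start + first]
    return {
        "nodes": page,
        "end_cursor": encode_cursor(page[-1]["id"]) if page else None,
        "has_next_page": start + first < len(rows),
    }


rows = [{"id": i} for i in (1, 3, 5, 7, 9)]
page1 = paginate(rows, first=2)
page2 = paginate(rows, first=2, after=page1["end_cursor"])
print([r["id"] for r in page2["nodes"]])
```

In a relational database the same idea is usually expressed as `WHERE id > :last_id ORDER BY id LIMIT :first`, which lets the engine seek through an index, whereas `OFFSET` has to walk past every skipped row.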

News Miner

Developed cross-media news and trend/topic analysis applications.
Designed an English news data processing pipeline (5K+ articles per day) and topical news clustering with semantic representations.
Adopted Word2Vec distributed representations with k-means clustering to produce text features, and designed a single-pass clustering method to merge related news.
Handled 100K+ news articles in 10 minutes, improving processing performance by 250%-300% compared with k-means clustering.
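
A single-pass clustering method of the kind described above might look like the following minimal sketch. It is an illustration only, not the News Miner implementation: the cosine-similarity threshold of 0.8, the running-mean centroid update, and the name `single_pass_cluster` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(vectors, threshold=0.8):
    """Assign each vector to a cluster in one pass over the data.

    A vector joins the most similar existing cluster if the similarity to
    that cluster's centroid reaches `threshold`; otherwise it starts a new one.
    """
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        best, best_sim = -1, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best == -1:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = sizes[best]
            # Update the centroid as the running mean of its members.
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
    return labels


print(single_pass_cluster([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]))
```

Each item is visited once and the number of clusters does not have to be fixed in advance, whereas k-means iterates over the whole dataset repeatedly until it converges.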

POS Tagging and Dependency Parsing

Combined POS tagging with dependency parsing to alleviate error propagation and enrich sequence tagging information across tasks.
On the Universal Dependencies 2.0 dataset, the method achieved an average LAS (Labeled Attachment Score) of 81.14, against a baseline of 72.14 (11% improvement).
Ablation tests indicated that POS tagging information improved dependency parsing performance by about 10%.
The tagging system was implemented in TensorFlow with Python and could annotate 1K+ sentences per minute.
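
For readers unfamiliar with the metric, LAS counts a token as correct only when both its predicted head and its dependency label match the gold annotation. The sketch below shows that computation under assumed inputs: each token is a hypothetical `(head_index, label)` pair, and the function names are illustrative rather than taken from the project.

```python
def las(gold, pred):
    """Labeled Attachment Score: percent of tokens with correct head and label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


def uas(gold, pred):
    """Unlabeled Attachment Score: percent of tokens with the correct head only."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    return 100.0 * correct / len(gold)


# Each token is (head index, dependency label); index 0 marks the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]
print(las(gold, pred), uas(gold, pred))
```

In this toy sentence the third token has the right head but the wrong label, so it counts toward UAS but not LAS.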

Wikipedia Data Processing and Joint Representations of Entities and Texts

Reduced the reliance of cross-lingual methods on parallel corpora and enriched textual information with external knowledge.
Designed a weakly supervised algorithm to produce an aligned cross-lingual corpus (avg. 300K+ paragraphs per language pair).
Jointly modeled knowledge entities and texts with a model derived from Skip-Gram with Negative Sampling, and released the trained word vectors (300d) publicly.
Achieved 44.99 Pearson's r on the SemEval-2017 Track 4a task (En-Es), against 40.87 for the end-to-end LASER model (10% improvement).
Proposed a cross-lingual information retrieval task on which the proposed method achieved 80% Top-10 accuracy, against 61% for the strongest baseline (about 30% relative improvement).
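
The two evaluation figures above rest on simple arithmetic, sketched here for clarity. The function names and the toy retrieval data are assumptions for illustration; only the numbers passed to `relative_improvement` come from the resume.

```python
def top_k_accuracy(ranked_results, gold_ids, k=10):
    """Fraction of queries whose gold item appears in the top k results."""
    if len(ranked_results) != len(gold_ids):
        raise ValueError("one gold id is required per query")
    if not gold_ids:
        raise ValueError("at least one query is required")
    hits = sum(1 for ranked, gold in zip(ranked_results, gold_ids)
               if gold in ranked[:k])
    return hits / len(gold_ids)


def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a percentage of the baseline."""
    return 100.0 * (new - baseline) / baseline


# Toy example: the gold item is in the top 2 for the first query only.
print(top_k_accuracy([["a", "b", "c"], ["x", "y", "z"]], ["b", "z"], k=2))

# Figures quoted in the bullets above.
print(round(relative_improvement(44.99, 40.87), 1))  # Pearson's r vs. LASER
print(round(relative_improvement(80, 61), 1))        # Top-10 accuracy vs. baseline
```

Measured against the baseline, the quoted scores correspond to gains of roughly 10% and 31%, consistent with the rounded percentages in the bullets.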

Skills

Programming Languages

Python
C++
Java

Framework and Data Analysis

TensorFlow
PyTorch
Pandas
JupyterLab
Dask
GraphQL

Database and Search Engine

MySQL
Elasticsearch
MongoDB

VCS and CI/CD Tools

Git
Bitbucket
Alembic
Jenkins

Publications

Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS Volume 1058, pages 425-440.
Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.

