呂學寬

HIPR PacSoft Technologies ・ 2020 ~ 2021

Taipei, Taiwan

Professional Background

Current status: Employed
Professions: Data Scientist
Work experience: Less than 1 year
Management: None
Skills: Python, C++, Java, TensorFlow
Languages: English (Professional); Chinese (Native or Bilingual)

Job search preferences

Positions: Machine Learning Engineer
Job types: Full-time
Locations: Taipei, Taiwan; Kaohsiung City, Taiwan
Remote: Interested in working remotely

Education

School: Tsinghua University
Major: Computer Science

Hsuehkuan Lu

Machine Learning Engineer

Highly focused and cooperative, with a strong ability to learn. Passionate about machine learning, natural language processing/understanding, and data science. Practical experience with ML projects in Python and with GraphQL API design. Proficient in implementing algorithms and in conducting research.

Taipei, Taiwan

[email protected]

Education

2016 - 2019

Tsinghua University

Master of Science, Computer Science

Knowledge Engineering Group (KEG) Lab

2012 - 2016

National Central University

Bachelor of Science, Computer Science and Information Engineering

Work Experience

Algorithm Engineer • HIPR PacSoft Technologies

August 2020 - Present

Model 30+ MySQL tables and 10+ Elasticsearch indices.
Design GraphQL APIs with Flask and write unit tests.
Design an NLP data processing pipeline (500M+ records) covering sources that range from Google Scholar and world university rankings to journal and institution rankings.
Design distributed crawling and computing systems with Dask, as well as asynchronous data processing methods (3-4x faster).
Introduce an object-relational mapping for Elasticsearch, which improves the organization of search index mappings and simplifies queries.
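
The resume does not say how the asynchronous processing was implemented. As a minimal sketch, assuming I/O-bound work and Python's standard asyncio module, the pattern below caps the number of in-flight tasks with a semaphore. process_all, fake_fetch, and the default limit of 16 are hypothetical choices for illustration, not the author's code.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_all(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 16,
) -> List[R]:
    """Run `worker` on every item with at most `max_concurrency` calls in flight.

    Results come back in the same order as `items`.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item: T) -> R:
        # Wait for a free slot before starting, so at most max_concurrency run at once.
        async with semaphore:
            return await worker(item)

    # gather() schedules all coroutines and returns their results in input order.
    return list(await asyncio.gather(*(bounded(item) for item in items)))


async def fake_fetch(record_id: int) -> dict:
    # Hypothetical stand-in for an I/O-bound step such as an HTTP request or DB query.
    await asyncio.sleep(0.01)
    return {"id": record_id, "normalized": True}


if __name__ == "__main__":
    results = asyncio.run(process_all(range(100), fake_fetch))
    print(len(results), results[0])
```

Because asyncio.gather returns results in input order, downstream steps can zip them back to their source records. asyncio only overlaps waiting time, so CPU-bound stages would be better served by a process pool or a Dask cluster.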

Projects

Impactio

Develop social network applications for academic researchers.
Design a large-scale data pipeline for institutions (750K+), journals (35K+), authors (250M+), and publications (230M+).
Design distributed crawling and on-the-fly merging systems for Google Scholar profiles with Dask (avg. 3-5 min per user).
Design a cursor-based pagination method in GraphQL that is 10%+ more efficient than the built-in offset-based pagination.
Main backend API developer on the project (70%+).
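
Cursor-based (keyset) pagination is the technique named in the Impactio bullets. The sketch below is a generic illustration, assuming rows sorted by a unique integer id and an opaque base64 cursor in the style commonly used by GraphQL connections. encode_cursor, decode_cursor, and page_after are hypothetical helpers, and the code does not reproduce the 10%+ figure, which is the author's own measurement.

```python
import base64
from typing import List, Optional, Sequence, Tuple


def encode_cursor(last_id: int) -> str:
    # Opaque cursor: base64 of the last-seen key, so clients do not depend on its format.
    return base64.urlsafe_b64encode(f"id:{last_id}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    raw = base64.urlsafe_b64decode(cursor.encode()).decode()
    prefix, _, value = raw.partition(":")
    if prefix != "id" or not value:
        raise ValueError("malformed cursor")
    return int(value)


def page_after(
    rows: Sequence[dict], first: int, after: Optional[str] = None
) -> Tuple[List[dict], Optional[str]]:
    """Keyset pagination over rows sorted by ascending unique 'id'.

    Returns the page and a cursor for the next page (None when there is no next page).
    In SQL this corresponds to: WHERE id > :last_id ORDER BY id LIMIT :first
    """
    if first < 1:
        raise ValueError("first must be positive")
    last_id = decode_cursor(after) if after is not None else None
    # A linear filter stands in here for the indexed range query a database would run.
    remaining = [r for r in rows if last_id is None or r["id"] > last_id]
    page = remaining[:first]
    has_next = len(remaining) > first
    next_cursor = encode_cursor(page[-1]["id"]) if has_next else None
    return page, next_cursor
```

The design advantage is that a database can answer "WHERE id > last_id ORDER BY id LIMIT n" with an index seek, whereas "OFFSET n" still has to walk past the skipped rows, so offset-based pages tend to slow down the deeper a client pages.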

News Miner

Developed applications for cross-outlet news analysis and trend/topic analysis.
Designed an English news data processing pipeline (5K+ articles per day) and topical news clustering based on semantic representations.
Adopted Word2Vec distributed representations with k-means clustering to produce text features, and designed a single-pass clustering method to merge related news articles.
Processed 100K+ news articles in 10 minutes, a 250%-300% improvement in processing performance over k-means clustering.
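
The resume names a single-pass clustering method for merging related news but does not give its details. The following is a generic single-pass (leader-style) clustering sketch, assuming each article already has a dense vector (for example, averaged Word2Vec embeddings) and using cosine similarity with running-mean centroids; the threshold value and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_pass_cluster(vectors: Sequence[Vector], threshold: float = 0.8) -> List[int]:
    """Assign each vector to a cluster in one pass over the data.

    Each item joins the most similar existing centroid if similarity >= threshold;
    otherwise it starts a new cluster. Returns one cluster label per input vector.
    """
    centroids: List[List[float]] = []
    sizes: List[int] = []
    labels: List[int] = []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            # Update the matched centroid as the running mean of its members.
            n = sizes[best] + 1
            centroids[best] = [c + (v - c) / n for c, v in zip(centroids[best], vec)]
            sizes[best] = n
            labels.append(best)
        else:
            # No sufficiently similar cluster: this item seeds a new one.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

Unlike k-means, this makes a single pass with no iterative convergence step, which is one plausible reason it can run much faster on a daily news stream; the trade-off is that results depend on input order and on the chosen threshold.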

POS Tagging and Dependency Parsing

Combined POS tagging with dependency parsing to alleviate error propagation and to enrich sequence tagging information across tasks.
On the Universal Dependencies 2.0 dataset, achieved an average LAS (Labeled Attachment Score) of 81.14, versus 72.14 for the baseline (11% improvement).
Ablation tests indicated that POS tagging information improved dependency parsing performance by about 10%.
The tagging system was implemented in TensorFlow with Python and could annotate 1K+ sentences per minute.
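
LAS, the metric quoted for the joint tagging and parsing model, is the percentage of tokens whose predicted head and dependency label both match the gold annotation; UAS checks the head only. A minimal sketch, assuming aligned token sequences of (head, label) pairs; attachment_scores is a hypothetical name, and official scorers such as those used in Universal Dependencies shared tasks may apply additional conventions (for example, in how label subtypes are compared).

```python
from typing import Sequence, Tuple

# Each token's analysis: (head index, dependency label); head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for one sentence or a flattened corpus.

    UAS counts tokens whose predicted head is correct; LAS additionally requires
    the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must align token by token")
    if not gold:
        raise ValueError("no tokens to score")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * labeled_ok / n
```

For a multilingual result like the one above, a score of this kind would typically be computed per treebank and then averaged across languages.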

Wikipedia Data Processing and Joint Representations of Entities and Texts

Addressed the reliance of cross-lingual methods on parallel corpora, and enriched textual information with external knowledge.
Designed a weakly supervised algorithm to produce an aligned cross-lingual corpus (avg. 300K+ paragraphs per language pair).
Jointly modeled knowledge entities and texts with a model derived from the Skip-Gram with Negative Sampling algorithm, and released the resulting word vectors (300d) publicly.
Achieved 44.99 Pearson-r on the SemEval-2017 Track 4a task (En-Es), compared with 40.87 Pearson-r for the end-to-end LASER model (10% improvement).
Proposed a cross-lingual information retrieval task, on which the proposed method achieved 80% Top-10 Accuracy versus 61% for the strongest baseline (a significant 30% improvement).
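
The Wikipedia project reports Pearson-r for semantic textual similarity and Top-10 Accuracy for retrieval. The functions below sketch both metrics plus the relative-improvement arithmetic behind the quoted percentages; pearson_r, top_k_accuracy, and relative_improvement are hypothetical illustrative helpers, not the evaluation code used in the work. Note that pearson_r returns a value in [-1, 1], whereas the resume reports it on a 0-100 scale.

```python
import math
from typing import Hashable, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between system scores and gold similarity scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def top_k_accuracy(
    ranked: Sequence[Sequence[Hashable]], gold: Sequence[Hashable], k: int = 10
) -> float:
    """Fraction of queries whose correct item appears among the first k retrieved items."""
    if len(ranked) != len(gold) or not gold:
        raise ValueError("need one ranked list per gold item")
    hits = sum(g in r[:k] for r, g in zip(ranked, gold))
    return hits / len(gold)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a percentage."""
    return 100.0 * (new - baseline) / baseline
```

For example, relative_improvement(80, 61) gives about 31.1 and relative_improvement(44.99, 40.87) about 10.1, consistent with the 30% and 10% figures in the bullets above.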

Skills

Programming Languages

Python
C++
Java

Framework and Data Analysis

TensorFlow
PyTorch
Pandas
JupyterLab
Dask
GraphQL

Database and Search Engine

MySQL
Elasticsearch
MongoDB

VCS and CI/CD Tools

Git
Bitbucket
Alembic
Jenkins

Publications

Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS, Volume 1058, pages 425-440.
Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.

呂學寬

Machine Learning Engineer

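Highly focused, collaborative, and a fast learner. Passionate about machine learning, natural language processing/understanding, and data science. Skilled at data analysis in Python, implementing machine learning projects, and designing GraphQL APIs. Proficient in implementing algorithmic models, with strong research skills.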

Taipei, Taiwan

Email: [email protected]

Phone: (+886) 905193233

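Work Experience

Algorithm Engineer • HIPR PacSoft Technologies

August 2020 - Present

Built data models for 30+ MySQL tables and 10+ Elasticsearch indices.
Designed backend APIs with GraphQL + Flask and wrote unit tests.
Designed an NLP data processing pipeline (about 500 million records in total) covering Google Scholar, world university rankings, journal rankings, and academic institution rankings.
Designed Dask distributed computing and crawling (16+ workers), as well as asyncio-based asynchronous data processing (about 3-4x faster).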

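Projects

Impactio

A social network designed for academic researchers.
Designed large-scale data processing pipelines: academic institutions (750K+), journals (35K+), authors (250M+), and academic papers (230M+).
Used Dask to build a distributed crawler for Google Scholar user profiles, plus a real-time merging system (avg. 3-5 min per user).
Designed a GraphQL cursor-based pagination method that is about 10%+ more efficient than the built-in offset-based pagination.
Main developer of the project's backend API services (70%+).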

News Miner

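Developed applications for cross-outlet news analysis and trend/topic analysis.
Designed an English news data processing pipeline (5K+ per day), along with news topic clustering and text semantic representation learning.
Used Word2Vec word vectors + k-means clustering to generate text features, and designed a single-pass clustering method to merge similar news.
The system processes 100K+ news articles in about 10 minutes on average, whereas the k-means approach needs 10x or more time to converge.

POS Tagging and Dependency Parsing

Integrated POS tagging into the dependency parsing task to reduce error propagation between the two tasks and strengthen cross-task structural information.
On the Universal Dependencies 2.0 dataset, our proposed method achieved 81.14 LAS (multilingual average) versus 72.14 LAS for the baseline, an improvement of about 11%.
Ablation experiments showed that POS information affects dependency parsing predictions by about 10%+.
The tagging system was implemented in TensorFlow + Python and processes about 1K+ per minute.

Wikipedia Data Processing and Joint Knowledge-Text Representation Learning

Addressed the dependence of cross-lingual methods on parallel corpora, and incorporated knowledge entities into text to enrich textual information.
Designed a weakly supervised algorithm to generate a cross-lingual Wikipedia corpus (avg. 300K+ paragraphs per language pair).
The algorithm jointly learns knowledge-entity and text representations based on Skip-Gram with negative sampling, and the trained word vectors (300d) were released publicly.
On the SemEval-2017 Track 4a cross-lingual (En-Es) semantic textual similarity task, achieved 44.99 Pearson-r, nearly a 10% improvement over the end-to-end LASER model (40.87 Pearson-r).
Proposed a cross-lingual information retrieval task; our method achieved 80% Top-10 Accuracy, a large improvement (30%) over the baseline's 61%.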

Skills

Programming Languages

Python
C++
Java

Frameworks and Data Analysis Tools

TensorFlow
PyTorch
Pandas
JupyterLab
Dask
GraphQL

Databases and Search Engines

MySQL
Elasticsearch
MongoDB

版控及CI/CD工具

Git
Bitbucket
Alembic
Jenkins

Publications

Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS, Volume 1058, pages 425-440.
Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.
