CakeResume Talent Search

Advanced filters
4-6 years
6-10 years
10-15 years
More than 15 years
Software Engineer @Wistron NeWeb Corporation 啟碁科技股份有限公司
2023 ~ 2023
Software Engineer
Within two months
Word
PowerPoint
Excel
Studying
Open to opportunities
Full-time / Interested in remote work
4-6 years
National Chung Cheng University (國立中正大學)
Computer Science and Information Engineering
Computer Vision Engineer @Academia Sinica
2015 ~ Present
Computer Vision Engineer
Within one month
陳昱希 Computer Vision Engineer. Yu-Hsi Chen has extensive experience developing computer vision and machine learning algorithms. In his recent work at Academia Sinica, he has focused on using machine learning to solve traditional computer vision and image/video processing problems. His NeighborTrack is a state-of-the-art single-object tracking system. During his school days, he used Verilog on an FPGA to implement a camera's 3A (auto-exposure, auto-focus, auto-white-balance) system. Website: Yu-hsi Chen (franktpmvu.github.io). Taipei City, Taiwan. Yu-hsi Chen (franktpmvu.github
Provides Feedback
Communication
Precision
Open to opportunities
Full-time / Interested in remote work
6-10 years
Lunghwa University
Master of Science
Blockchain Engineer & AI Lead @Portal Network
Blockchain engineer & Blockchain consulting
Within one month
Solidity
blockchain development
Docker
Full-time / Interested in remote work
4-6 years
Principal Engineer @Coretronic Corporation, 中強光電
2020 ~ 2022
Machine Learning Engineer
Within one month
Senior Software Developer, Wuduker Inc., January - October 2020. iSchedule, a preference-aware scheduling system: responsible for designing the RESTful API, database schema, and core scheduling solver. Anomaly detection system with deep metric learning: developed deep learning models with PyTorch and PyTorch Lightning. Consulting services, including Spark pipeline optimization, deep learning model development, and general Python/C++ development. Machine Learning Engineer, Pinkoi Inc., December - December 2019. Recommendation system, including item-based/store-based recommendation and keyword suggestion, significantly improving recommendation quality and coverage. Machine learning algorithm design. Data pipeline, including on-site advertising and
Python
Linux
C++
Employed
Full-time / Interested in remote work
6-10 years
National Taiwan University (國立台灣大學)
Economics
Research Assistant of National Taiwan University @National Taiwan University
2023 ~ Present
Graduate research assistant
Within six months
YEN-TING CHEN (陳彥廷). I am a graduate student in the Department of Psychology at National Taiwan University (NTU). For me, diving into psychometrics and exploring data with sound statistical methods opens up a new way of understanding the people around us. Whether an issue is small and quirky or big and serious, if it grabs my attention, my curiosity drives me to work it out with the tools and theories of psychometrics and data analysis. Right now, I am turning curiosity into discoveries in the wild world of psychology and all kinds of data.
EDA
Python Programming
R Programming
Studying
Part-time / Interested in remote work
4-6 years
National Taiwan University
Psychometrics (Division of Psychology), Methodology (Division of Psychology)
AI, Machine Learning and Data Manager @IRT Saint Exupery
2021 ~ Present
Data Analyst, Data Scientist, AI Engineer, Project Manager
Within one month
Benjamin Deporte [email protected] AI, Machine Learning and Data Officer. Innovative, seasoned AI/ML leader with a strong mathematical background and hands-on knowledge of machine learning algorithms and best practices. Specialized in cybersecurity, healthcare, and aerospace. Demonstrated ability to drive business value over 10+ years of experience in different businesses, in direct management or thought-leadership roles. Leadership, networking, communication, and language skills. Skills: expertise in artificial intelligence and machine learning; specialization in cybersecurity, healthcare, and aerospace; leadership, networking, and communication skills; business acumen in multicultural, global organizations. Work
Proficiency in Artificial Intelligence and Machine Learning
Knowledgeable in cybersecurity
Project and Account management
Employed
Full-time / Interested in remote work
6-10 years
Télécom Paris
Cybersecurity
Past
Senior Software Engineer @DOINT
2021 ~ 2023
Software engineer, Image Processing engineer, Algorithm engineer
Within three months
Docker, Doxygen, SWIG (Python API from C++), OpenCV, OpenCL, Linux. Skills: multithreading, distributed computing, serialization/deserialization, CI/CD, data version control (DVC), MLOps. Image Processing: PCA, interactive segmentation, connected components, image stitching, Direct Linear Transformation, Kalman filter (tracking), SIFT, Hough, HOG, image deblurring, depth estimation, ISP. Machine Learning: framework transformation, data augmentation, transfer learning, model pruning, model quantization, performance evaluation, parameter fine-tuning. Models: SVM, LeNet, AlexNet, VGG, GoogLeNet, SSD, YOLO, MobileNet, ShuffleNet, FaceNet, Xception, MatrixNet, CenterNet, CSPNet, M2Det, EfficientNet/Det. Projects: HPC, keyword spotting, automatic speech recognition, distributed inference system, social distancing estimation (Lidar
Python
Machine Learning
C++
Unemployed
Full-time / Interested in remote work
6-10 years
National Taiwan University
Communication Engineering
Machine Learning Engineer @日新軟體股份有限公司
2021 ~ Present
AI Engineer, Machine Learning Engineer, Deep Learning Engineer, Data Scientist
Within six months
Focus mainly on object detection, sensor fusion, and sensor calibration. 3+ years of experience in machine learning. Previously conducted research in medical AI projects, with experience implementing real-world, end-to-end ML projects. Skills: Programming languages: Python, C++, SQL. Machine learning: TensorFlow, PyTorch, scikit-learn, Matplotlib. Others: FastAPI, Docker, Redis, Linux. Certifications: Taiwan AI Academy - AI Technical Professionals Program; NVIDIA DLI Certificate - Applications of AI for Anomaly Detection. Work Experience: Machine Learning Engineer, NEUTEC • May - Present. Develop, optimize, and maintain machine learning algorithms for internal process automation. Deploy machine
AI & Machine Learning
Image Processing
python
Employed
Full-time / Interested in remote work
4-6 years
National Taipei University of Technology (國立臺北科技大學)
Graduate Institute of Mechanical and Electrical Engineering
AI & Embedded Systems Consultant @Self Employed
2021 ~ Present
Deep Learning Engineer
More than one year
Kishan Gondaliya. Experienced embedded software engineer working on embedded systems and deep learning to enable vision- and voice-based machine learning algorithms on low-power FPGAs and edge embedded devices. ~8 years of experience writing, debugging, and optimizing software/firmware for embedded [email protected] Ahmedabad, Gujarat, India. Skillset: Languages: C, Python, C++. Frameworks: TensorFlow (TFLite, TFmicro), Keras, Caffe, Darknet. Dev Tools: Anaconda, Git, Gerrit, Perforce, PyCharm, CVS, Jira, Confluence. HW Platform: Google Coral TPU, Lattice ECP5, U+, CrossLink-NX FPGA, Raspberry Pi, Intel Movidius, NVIDIA
Deep Learning
machine learning
aws
Employed
Full-time / Interested in remote work
4-6 years
Charotar University of Science & Technology
Electronics & Communication
Data Scientist | Associate Researcher @China Engineering Consultants, Inc.
2020 ~ Present
Data Analyst
More than one year
Python
SQL/MySQL
SQL Server
Employed
Full-time / Interested in remote work
6-10 years
National Dong Hwa University (國立東華大學)
Graduate Institute of Applied Mathematics

The lightest and most effective recruitment plan

Search resumes and proactively reach out to job seekers to hire more efficiently. The choice of hundreds of companies.

  • Browse all search results
  • Unlimited access to start new conversations
  • Resumes accessible only to paying companies
  • View users' email addresses and phone numbers
Search tips
1
Search a precise keyword combination
senior backend php
If there are too few results, remove the less important keywords
2
Use quotes to search for an exact phrase
"business development"
3
Use the minus sign to eliminate results containing certain words
UI designer -UX
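Here is a minimal Python sketch that makes the three tips concrete by showing how a query using them could be parsed and matched against resume text. It is an illustration under stated assumptions, not CakeResume's actual search engine. The helper names `parse_query` and `matches` are hypothetical. Plain keywords and minus-prefixed terms are matched as whole words, and quoted phrases as case-insensitive substrings. Every keyword and phrase must be present.

```python
import re
from typing import List, Tuple

# Hypothetical helpers illustrating the search tips; not CakeResume's real engine.

def parse_query(query: str) -> Tuple[List[str], List[str], List[str]]:
    """Split a query into (keywords, exact phrases, excluded terms), lowercased."""
    # Text inside double quotes is an exact phrase.
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    # Drop the quoted parts, then classify the remaining tokens.
    remainder = re.sub(r'"[^"]+"', " ", query)
    keywords: List[str] = []
    excluded: List[str] = []
    for token in remainder.split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())  # minus sign: exclude this word
        else:
            keywords.append(token.lower())
    return keywords, phrases, excluded


def matches(query: str, text: str) -> bool:
    """True if text has every keyword and phrase and none of the excluded words."""
    keywords, phrases, excluded = parse_query(query)
    lowered = text.lower()
    # Whole-word tokens; '+' and '#' are kept so terms like c++ or c# survive.
    words = set(re.findall(r"[\w+#]+", lowered))
    return (
        all(k in words for k in keywords)
        and all(p in lowered for p in phrases)
        and not any(e in words for e in excluded)
    )


print(matches("UI designer -UX", "Senior UI designer"))            # True
print(matches("UI designer -UX", "UI designer with UX research"))  # False
```

Removing a less important keyword, as the first tip suggests, simply shortens the `keywords` list, so more texts pass the `all(...)` check.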
Only public resumes are available with the free plan.
Upgrade to an advanced plan to view all search results, including tens of thousands of resumes exclusive to CakeResume.

Definition of Reputation Credits

Technical Skills
Specialized knowledge and expertise within the profession (e.g. familiar with SEO and use of related tools).
Problem-Solving
Ability to identify, analyze, and prepare solutions to problems.
Adaptability
Ability to navigate unexpected situations and keep up with shifting priorities, projects, clients, and technology.
Communication
Ability to convey information effectively and willingness to give and receive feedback.
Time Management
Ability to prioritize tasks by importance and complete them within the assigned timeline.
Teamwork
Ability to work cooperatively, communicate effectively, and anticipate each other's demands, resulting in coordinated collective action.
Leadership
Ability to coach, guide, and inspire a team to achieve a shared goal or outcome effectively.
More than one year
HIPR Pacsoft Technologies
2020 ~ 2021
Taipei, Taiwan
Professional Background
Current status
Employed
Job search progress
Professions
Data Scientist
Fields of Employment
Work experience
Less than 1 year
Management
None
Skills
Python
C++
Java
TensorFlow
Languages
English
Professional
Chinese
Native or bilingual
Job search preferences
Position
Machine Learning Engineer
Job type
Full-time
Location
Taipei, Taiwan; Kaohsiung City, Taiwan
Remote
Interested in remote work
Freelance
No
Education
School
Tsinghua University
Major
Computer Science

Hsuehkuan Lu

Machine Learning Engineer

Highly focused, cooperative, and a strong learner. Passionate about machine learning, natural language processing/understanding, and data science. Practical experience with ML projects in Python and with GraphQL API design. Proficient in implementing algorithms and conducting research.

  Taipei, Taiwan      

[email protected]

Education

2016 - 2019

Tsinghua University

Master of Science, Computer Science

Knowledge Engineering Group (KEG) Lab

2012 - 2016

National Central University

Bachelor of Science, Computer Science and Information Engineering

Work Experience

Algorithm Engineer  •  HIPR PacSoft Technologies

August 2020 - Present

  • Model 30+ MySQL tables and 10+ Elasticsearch indices.
  • Design GraphQL APIs with Flask and write unit tests.
  • Design NLP data processing pipelines (500M+ records) covering Google Scholar, world university rankings, journal rankings, and institution rankings.
  • Design distributed crawling and computing systems with Dask, plus asynchronous data processing methods (3-4x faster).
  • Introduce object-relational mapping for Elasticsearch, improving the organization of search-engine index mappings and simplifying queries.
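The bullet on asynchronous data processing does not say how the speed-up was obtained. For I/O-bound work in Python, one common pattern is bounded concurrency with `asyncio`. The sketch below rests on a few assumptions. `fetch_record` is a hypothetical stand-in for a network or database call and is simulated with `asyncio.sleep`. The concurrency limit of 10 is arbitrary. This is not the author's pipeline, and it does not reproduce the 3-4x figure.

```python
import asyncio
from typing import Dict, List


async def fetch_record(record_id: int) -> Dict[str, int]:
    """Hypothetical I/O-bound step, simulated with a short sleep."""
    await asyncio.sleep(0.01)
    return {"id": record_id, "value": record_id * 2}


async def process_all(record_ids: List[int], max_concurrency: int = 10) -> List[Dict[str, int]]:
    """Fetch all records concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(record_id: int) -> Dict[str, int]:
        # The semaphore caps simultaneous requests (e.g. to respect rate limits).
        async with semaphore:
            return await fetch_record(record_id)

    # gather() runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(bounded(r) for r in record_ids)))


if __name__ == "__main__":
    results = asyncio.run(process_all(list(range(100))))
    print(len(results), results[0])  # 100 {'id': 0, 'value': 0}
```

Up to ten simulated waits overlap here, so the 100 calls finish in roughly a tenth of the sequential time. Real gains depend on how I/O-bound the work actually is.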

Projects

Impactio

  • Develop social network applications for academic researchers.
  • Design large-scale data pipelines for institutions (750K+), journals (35K+), authors (250M+), and publications (230M+).
  • Design distributed crawling and on-the-fly merging systems for Google Scholar profiles with Dask (avg. 3-5 min per user).
  • Design a cursor-based pagination method with GraphQL, improving efficiency by 10%+ over the built-in offset-based pagination.
  • Main backend API developer on the project (70%+).
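The Python sketch below contrasts the two pagination styles mentioned above. It is a minimal illustration, not the project's GraphQL resolver. The opaque base64 cursor format and the `paginate`, `encode_cursor`, and `decode_cursor` helpers are hypothetical, and an in-memory sorted list of ids stands in for a database table. Offset pagination (`LIMIT n OFFSET k`) makes the database walk past the k skipped rows. A cursor instead records the last id seen, so the next page can start with a seek (`WHERE id > :last ORDER BY id LIMIT n`). The sketch mimics that seek with binary search.

```python
import base64
from bisect import bisect_right
from typing import List, Optional, Tuple


def encode_cursor(item_id: int) -> str:
    """Opaque cursor: URL-safe base64 of the last returned item's id."""
    return base64.urlsafe_b64encode(f"id:{item_id}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode().split(":", 1)[1])


def paginate(sorted_ids: List[int], first: int,
             after: Optional[str] = None) -> Tuple[List[int], Optional[str], bool]:
    """Return up to `first` ids strictly after the cursor, the next cursor, and has_next.

    `sorted_ids` must be unique and ascending. bisect_right finds the start in
    O(log n), instead of stepping over every skipped row as OFFSET does.
    """
    start = 0 if after is None else bisect_right(sorted_ids, decode_cursor(after))
    page = sorted_ids[start:start + first]
    has_next = start + first < len(sorted_ids)
    end_cursor = encode_cursor(page[-1]) if page else None
    return page, end_cursor, has_next


ids = [2, 3, 5, 8, 13, 21, 34]
page, cursor, has_next = paginate(ids, first=3)
print(page, has_next)  # [2, 3, 5] True
page, cursor, has_next = paginate(ids, first=3, after=cursor)
print(page, has_next)  # [8, 13, 21] True
```

Cursors also keep pages stable. Inserting a new id smaller than the cursor does not shift the next page, whereas with offsets it would make an item appear twice.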

News Miner

  • Developed cross-media news and trend/topic analysis applications.
  • Designed an English news processing pipeline (5K+ articles per day) and topical news clustering based on semantic representations.
  • Used Word2Vec distributed representations with k-means clustering to produce text features, and designed a single-pass clustering method to merge related news articles.
  • Processed 100K+ news articles in 10 minutes, a 250%-300% performance improvement over k-means clustering.
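The pipeline described above pairs averaged word vectors with a single-pass clustering step. The author's exact method is not given, so the following Python sketch shows a generic single-pass (leader-style) clustering under a few assumptions. The tiny two-dimensional `embeddings` dictionary is hypothetical toy data standing in for trained Word2Vec vectors. A document vector is the mean of its known word vectors. The cosine threshold of 0.8 is arbitrary. Each article is compared once against the current cluster centroids, which is why this kind of method avoids the repeated passes over the data that k-means needs.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def doc_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of tokens found in `embeddings` (zeros if none are)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_pass_cluster(vectors: List[Vector], threshold: float) -> List[int]:
    """Visit each vector once, in order: join the most similar centroid if its
    cosine similarity is >= threshold, otherwise start a new cluster."""
    centroids: List[Vector] = []
    sizes: List[int] = []
    labels: List[int] = []
    for v in vectors:
        best, best_sim = -1, threshold
        for i, c in enumerate(centroids):
            sim = cosine(v, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best == -1:
            centroids.append(list(v))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = sizes[best]
            # Incremental mean keeps the centroid equal to the average member.
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], v)]
            sizes[best] = n + 1
            labels.append(best)
    return labels


embeddings = {"stock": [1.0, 0.0], "market": [0.9, 0.1],
              "football": [0.0, 1.0], "goal": [0.1, 0.9]}
articles = [["stock", "market"], ["football", "goal"], ["market", "stock"], ["goal"]]
vectors = [doc_vector(a, embeddings) for a in articles]
print(single_pass_cluster(vectors, threshold=0.8))  # [0, 1, 0, 1]
```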

POS Tagging and Dependency Parsing

  • Combined POS tagging with dependency parsing to alleviate error propagation and to share sequence-tagging information across tasks.
  • On Universal Dependencies 2.0, the model achieved an average LAS (Labeled Attachment Score) of 81.14 across languages versus 72.14 for the baseline, a gain of 9.0 points (about 12.5% relative to the baseline).
  • Ablation tests indicated that POS tagging information improved dependency-parsing performance by about 10%.
  • The tagging system was implemented in Python with TensorFlow and annotates 1K+ sentences per minute.
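LAS, the metric quoted above, is the percentage of tokens whose predicted head and dependency label both match the gold tree. The minimal Python sketch below computes it under one simplifying assumption: every token counts, including punctuation, whereas official evaluation scripts (for example the CoNLL shared-task scorers) apply their own conventions. The `relative_gain` helper shows the arithmetic behind a percentage improvement. Applied to the reported scores 81.14 and 72.14, the 9.0-point gap is about 12.5% of the baseline.

```python
from typing import List, Tuple

# One (head_index, dependency_label) pair per token; head 0 denotes the root.
Arc = Tuple[int, str]


def labeled_attachment_score(gold: List[List[Arc]], pred: List[List[Arc]]) -> float:
    """Percentage of tokens whose predicted head AND label match the gold ones."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora must have the same number of sentences")
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must align token by token")
        total += len(gold_sent)
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
    return 100.0 * correct / total if total else 0.0


def relative_gain(new: float, baseline: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return 100.0 * (new - baseline) / baseline


# "She reads books": 'reads' is the root; the third arc has the wrong label.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "iobj")]]
print(round(labeled_attachment_score(gold, pred), 2))  # 66.67
print(round(relative_gain(81.14, 72.14), 1))           # 12.5
```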

Wikipedia Data Processing and Joint Representations of Entities and Texts

  • Reduced cross-lingual methods' reliance on parallel corpora and enriched textual information with external knowledge.
  • Designed a weakly supervised algorithm to produce an aligned cross-lingual corpus (avg. 300K+ paragraphs per language pair).
  • Jointly modeled knowledge entities and text with a model derived from Skip-Gram with Negative Sampling, and publicly released the trained 300-dimensional word vectors.
  • Achieved a Pearson r of 44.99 on SemEval-2017 Track 4a (En-Es cross-lingual semantic textual similarity), versus 40.87 for the end-to-end LASER model (about 10% relative improvement).
  • Proposed a cross-lingual information retrieval task, on which the method achieved 80% Top-10 accuracy versus 61% for the strongest baseline (about 30% relative improvement).
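Top-10 accuracy, used above for the retrieval task, counts a query as correct when its true counterpart appears among the 10 highest-ranked candidates. The sketch below is a generic Python illustration, not the paper's evaluation code. It assumes aligned query/candidate pairs (query i's correct match is candidate i) and ranks candidates by the cosine similarity of their embeddings. The toy vectors are made up, and `k` is kept small so the effect is visible.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_accuracy(queries: List[Vector], candidates: List[Vector], k: int) -> float:
    """Fraction of queries whose aligned candidate (same index) ranks in the top k."""
    hits = 0
    for i, q in enumerate(queries):
        scores = [cosine(q, c) for c in candidates]
        # Candidate indices ordered from most to least similar.
        ranked = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidates = [[0.9, 0.1], [0.2, 0.8], [-1.0, -1.0]]
for k in (1, 2, 3):
    print(k, round(top_k_accuracy(queries, candidates, k), 3))
# 1 0.667
# 2 0.667
# 3 1.0
```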

Skills

Programming Languages

Python
C++
Java


Framework and Data Analysis

TensorFlow
PyTorch
Pandas
Jupyter Lab

Dask
GraphQL

Database and Search Engine

MySQL
Elasticsearch
MongoDB

VCS and CI/CD Tools

Git
Bitbucket
Alembic
Jenkins

Publications


  1. Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS, Volume 1058, pages 425-440.
  2. Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.

呂學寬

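Machine Learning Engineer

Highly focused, cooperative, and a fast learner. Passionate about machine learning, natural language processing/understanding, and data science. Skilled at data analysis in Python, implementing machine learning projects, and designing GraphQL APIs. Proficient in implementing algorithmic models and in conducting research.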

  Taipei, Taiwan      

Email: [email protected] 

Phone: (+886) 905193233

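Work Experience

Algorithm Engineer  •  HIPR PacSoft Technologies

August 2020 - Present

  • Build schema models for 30+ MySQL tables and 10+ Elasticsearch indices.
  • Design backend APIs with GraphQL + Flask and write unit tests.
  • Design NLP data processing pipelines (about 500 million records in total), covering Google Scholar, world university rankings, journal rankings, and academic institution rankings.
  • Design Dask-based distributed computing and crawling (16+ workers), plus Asyncio asynchronous data processing pipelines (about 3-4x faster).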

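Projects

Impactio

  • A social network designed for academic researchers.
  • Design large-scale data processing pipelines for academic institutions (750K+), journals (35K+), authors (250M+), and papers (230M+).
  • Use Dask to build a distributed crawler for Google Scholar user profiles, along with a real-time merging system (avg. 3-5 min per user).
  • Design a GraphQL cursor-based pagination method that improves on the built-in offset-based pagination, raising efficiency by 10%+.
  • Main developer of the project's backend API services (70%+).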

News Miner

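  • Developed cross-media news and trend/topic analysis applications.
  • Designed an English news data processing pipeline (5K+ articles per day), together with a news topic clustering method and semantic text representation learning.
  • Used Word2Vec word vectors + k-means clustering to generate text features, and designed a single-pass clustering method to merge similar news articles.
  • The system processes 100K+ news articles in about 10 minutes on average, whereas the k-means approach needs 10x or more that time to converge.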

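POS Tagging and Dependency Parsing

  • Integrated POS tagging into the dependency parsing task to reduce error propagation between the tasks and strengthen cross-task structural information.
  • On the Universal Dependencies 2.0 dataset, the proposed method achieved 81.14 LAS (multilingual average) versus 72.14 LAS for the baseline, a gain of 9.0 points (about 12.5% relative to the baseline).
  • Ablation experiments showed that POS information affects dependency-parsing predictions by about 10%+.
  • The tagging system was implemented in TensorFlow + Python and processes about 1K+ sentences per minute.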

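Wikipedia Data Processing and Joint Knowledge-Text Representation Learning

  • Addressed cross-lingual methods' dependence on parallel corpora, and incorporated knowledge entities into text to enrich textual information.
  • Designed a weakly supervised algorithm to generate a cross-lingual corpus from Wikipedia (avg. 300K+ paragraphs per language pair).
  • The algorithm, based on Skip-Gram with negative sampling, jointly learns knowledge-entity and text representations; the trained word vectors (300d) were released publicly.
  • Achieved a Pearson r of 44.99 on the SemEval-2017 Track 4a cross-lingual (En-Es) semantic textual similarity task, nearly a 10% improvement over the end-to-end LASER model (40.87 Pearson r).
  • Proposed a cross-lingual information retrieval task; our method achieved 80% Top-10 accuracy, a large improvement (30%) over the 61% baseline.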

Skills

Programming Languages

Python
C++
Java


Framework and Data Analysis Tools

TensorFlow
PyTorch
Pandas
Jupyter Lab

Dask
GraphQL

Database and Search Engine

MySQL
Elasticsearch
MongoDB

VCS and CI/CD Tools

Git
Bitbucket
Alembic
Jenkins

Publications


  1. Hsuehkuan Lu, Yixin Cao, Lei Hou, and Juanzi Li. Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity. International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2019. CCIS, Volume 1058, pages 425-440.
  2. Hsuehkuan Lu, Lei Hou, and Juanzi Li. How Important Is POS to Dependency Parsing? Joint POS Tagging and Dependency Parsing Neural Networks. Chinese Computational Linguistics (CCL), 2019. LNCS, Volume 11856, pages 625-637.
