SUMMARY
Aiming to be a data scientist unicorn who excels at applied machine learning in a business/finance-related field.
DevOps Engineer (Python) for Automatic Claim Processor (ACP) - OCR System (Hospital Diagnosis/Receipt).
Applying tree-based algorithms to model Credit Scoring predictions in the finance industry.
Achieving hyperautomation by combining AI and RPA tools to automate routine tasks.
Nov. 2022 - Present
AI Team
DevOps Engineer (Python) for Automatic Claim Processor (ACP) - OCR System (Hospital Diagnosis/Receipt)
1. Receipt Recognition Service (2023) OCR:
Improved OCR model accuracy from 80% to 96% in 2023 by integrating the existing model with Microsoft Azure Form Recognizer output.
Seamlessly integrated Microsoft Azure services into our existing Receipt system's data pipeline.
Engineered data post-processing solutions to handle the differing format requirements of 17 hospitals.
Designed and executed comprehensive unit tests to ensure robustness and reliability.
2. Diagnosis Recognition Service (2022 - 2023) NLP/OCR:
Continuously maintain and monitor the Diagnosis Recognition service, ensuring optimal performance.
Regularly update the synonym table to align with real-world Diagnosis cases.
Run rigorous pytest test suites to verify the stability of every deployment to the production environment.
July 2020 - Sept. 2022
Intelligent Banking Division (智能金融處)
Built machine learning models for risk assessment in banking.
Specialized in credit card and Joint Credit Information Center (JCIC) data.
Performed ETL and built data pipelines for model retraining and production.
Worked in the RPA (Robotic Process Automation) team on web scraping (crawlers) and automation of routine tasks to reduce labor costs.
June 2019 - May 2020
Digital data & Technology (DDT, 數數發)
Researched interpretable machine learning and its existing algorithms. Experimented with LIME and SHAP on open data. (GitHub)
Real estate evaluation model: gathered, cleaned, and engineered features from geographical and credit card data (hit rate improved from 55% to 70%).
Aimed to reduce capital risk through more accurate models, following the IRB (Internal Ratings-Based) approach.
Applied tree-based methods to produce PD/LGD/EAD predictions for computing expected credit loss.
Saved more than 50 billion in capital by reducing the capital requirement for the Capital Adequacy Ratio (CAR).
The wide variety of storage methods used by individual departments made data exploration difficult and caused unnecessary duplication of effort across projects.
Aimed to build a unified database and feature set by extracting and integrating data from various sources through data munging.
Integrated various data sources, resulting in 99% consistency and 80% less effort on data preprocessing.
Monitored data pipelines with Apache Airflow (refactored ETL code into DAGs).
Fetched all Taiwan addresses from the Dept. of Household Registration with Selenium, using doorplate numbers.
Crawled the government TGOS website to get the latitude and longitude of each address.
1. Automated routine tasks on the Bank Trust Dept. AS400 system with Pywinauto, saving 8 labor hours/week.
2. Automated routine Excel and PDF tasks for the Asset Management Dept. with Tabula, saving 16 labor hours/week.
2018 - 2020
2012 - 2016
• 89/1366
E-Sun Credit Card Fraud Detection (E.SUN AI Open Challenge, 玉山人工智慧公開挑戰賽-信用卡盜刷偵測)
• 7/86
Taishin Financial Product Purchase Prediction (2nd Business Model and Big Data Analytics Competition, Taishin Bank)
• 685/2281 (public)
Kaggle Deepfake Detection Challenge
• TOEFL 96/120 (2020)