Connor Hsu

Curious about data and real world, building product to solve problem, making machine learning into product, writing is my interest.

Summary

9 years experience of large scale AI product building, and is capable of building product from scratch.
Extensive problem solving experience for data science/engineering, and familiar with transferring real problem into requirements and solution planning.
A well-rounded engineer in data project who bridges the gap between scientists and engineers.
A pragmatic and ownership driven person, experienced with gap analysis, migration plan and release management.
Leader of documentation and process, mentor of junior engineers.
Product-driven mindset, learning new tech. and use them to build side projects continuously (chatbot, blockchain, ..., etc.)

Expertise fields:

ML Ops / Data Engineering: This is my strength.
Data Science/Modeling: Not using the cutting edge technologies in the work, but it can be swiftly brought up to speed. (it's my master degree).

Experience

Data Infra Engineer, Moneytree, Nov. 2020 - present (~3y)

Senior Software Engineer I, Nov. 2023 ~

Software Engineer II, ~ Nov. 2023

Built a Data Lakehouse solution on a full cloud environment (AWS).

Involved with 3 new data products which made initial revenue stream for a new team with only 2 engineers.

Lead a cross team initiative to democratize company crucial KPI as our dog food for business insight.

Enable unit test on Spark locally, integrated with CI pipeline.

Migrated CDK to version 2, improved CI workflow.

Evangelizing Big Data technics with talks and drive technical improvements discussion every week.

Software Engineer, SmartNews, Jul. 2019 - Nov. 2020 (~1.5y)

Data platform team: build data infrastructure/platform to serve Ads/Data Science needs.

Improve Hive performance to 4x by utilizing partition after 1 month.

Tech lead: Lead team documentation culture, manage the process, and attend as representative in cross-team meeting after 2 months, Mentor 3 new members to equip the new data team with the full speed.

Streaming locational service data for local coupon service for millions of users.

Senior Engineer II, Appier, Jun. 2014 - Jul. 2019 (5y)

Senior Engineer II, ~ Jul. 2019

Data governance chair / Agile coach: Coordinating cross-team product features, burning down company level technical debts and solved them gradually with shipping new feature simultaneously.

AI backend: Building extensive machine learning / analytical services through various approaches with F2E, Data Scientists, Data Infrastructure team.

Mentor new members by not only tech. documentation but also Ads domain knowledge.

Software Engineer, Jun. 2014 - 2018

Build RTB bidding algorithm in a fast-growing, dynamic business environment.

Conduct experiments on real product to make daily improvements and achieve business goals.

Solve critical issues and conduct root cause analysis in uncontrolled, unreproducible and unbalanced data environment with strong time constraints.

Enable product features with petabytes level data by Spark, AWS RDS and Airflow.

Start

Education

National Taiwan University, Taiwan, Sep 2009 - Jun 2011

M.S., Department of Computer Science and Information Engineering

National Chiao Tung University, Taiwan, Sep 2005 - Jun 2009

B.S., Department of Computer Science

Skills

Data Lake/Warehousing: Python, Spark, Scala, Airflow, Presto | Streaming: Kafka, Flink

Machine Learning: Scikit learn, Tensorflow (side project), TensorFlow Serving

AWS Cloud Services: EC2/ECS/ECR/Lambda, EMR (Hive), RDS, DynamoDB, ElastiCache (Redis), Athena, SNS, SQS, Glue, SageMaker, CloudFormation/CDK.

API Services: FastAPI, Flask, Django, Swagger/Flasgger | CI/CD: Jenkins/CircleCI/TravisCI, Ansible

Human Languages: English: Business Level | Japanese: Conversational | Mandarin: Native

Selected Projects

Ad Product Technical Debt Burn Down '18Q4

Co-work with scientists to migrate an legacy machine learning project from python2 to python3
Design, burn down and implement new log patch framework to secure safe patch behavior, also enable abstraction on log patch mechanism.

Data Governance '18Q2 ~

Form a Data Governance committee with tech leads to consistently improve data quality and availability.

The missions are: schema evolving, legacy deprecation and data platform migration planning.

Automatical Refund System '17Q2 ~ '18Q2

Saved more than 10 millions TWD dollars for our business as well as tremendous human effort, milestones include:

Automate process to save support team and CM team's human effort (17'Q2)
Eliminate major data discrepancy (17'Q3)
Support various timezones, formats and make debug efficient. (17'Q4)
Different dimension breakdown and co-work with F2E to build a new UI (18'Q1)

Pipeline Reconstruction and Migration '17Q2 ~ '17Q3

Reconstruct ad-hoc pipeline and improve it by applying unit test, migrating DB, code-refactoring, and migrating to Jenkins.
Co-work with team members to migrate critical production pipelines to Airflow, till 2019Q1, more than 20 data pipelines are operated by Airflow.

Improve ML Model Performance '16Q2

Improve high quality inventory discovery by embedding inventory as vector: precision achieve 79% from 5.3%, volume increased to 12.8x
Extend CPA model to different ads vertical: CPA reduced to 68%, volume increased to 2.4x

Learning & Sharing

Side Projects: I build side project in my leisure time, one of them is a chatbot which is cross platform on Line, Telegram, Discord and Twitch. I always introduce new tech. like ML, CI/CD in this project as my daily life.

I also like writing, I own a blog and here is one of my post: My Machine Learning Engineering in Appier

Publications

Me-link: Link me to the media - fusing audio and visual cues for robust and efficient mobile media interaction

We present a scalable mobile video recognition system, named “Me-link,” based on progressive fusion of light-weight audio visual features. -- 23rd International World Wide Web Conference (WWW 2014)