Connor Hsu

Curious about data and the real world. I build products to solve problems, turn machine learning into products, and enjoy writing.

Summary

  • 9 years of experience building large-scale AI products; capable of building a product from scratch.
  • Extensive problem-solving experience in data science and data engineering; skilled at translating real-world problems into requirements and solution plans.
  • A well-rounded engineer on data projects who bridges the gap between scientists and engineers.
  • Pragmatic and ownership-driven; experienced with gap analysis, migration planning, and release management.
  • Leads documentation and process; mentors junior engineers.
  • Product-driven mindset; continuously learning new technologies and applying them in side projects (chatbot, blockchain, etc.).

Expertise fields:

  • MLOps / Data Engineering: My core strength.
  • Data Science/Modeling: I do not use cutting-edge techniques in my current work, but I can get up to speed quickly (it was the focus of my master's degree).

Language Capability:

  • English: Business Level | Japanese: Conversational | Mandarin: Native

Experience

Data Infra Engineer, Moneytree, Nov. 2020 - present (~2.5y)

  • Built a Data Lakehouse solution in a fully cloud-based environment (AWS).
  • Contributed to 3 new data products, each of which generated an initial revenue stream for the company, on a team of only 2 engineers.
  • Led a cross-team initiative to democratize the company's key KPIs.
  • Enabled local Spark unit testing, integrated with the CI pipeline.
  • Migrated CDK to version 2 and improved the CI workflow.
  • Evangelize big data techniques through talks and drive weekly discussions on technical improvements.

Software Engineer, SmartNews, Jul. 2019 - Nov. 2020 (~1.5y)

  • Data platform team: Built data infrastructure and platforms to serve Ads and Data Science needs.
  • Improved Hive performance 4x through partitioning, one month after joining.
  • Tech lead: Led the team's documentation culture, managed processes, and represented the team in cross-team meetings two months after joining. Mentored 3 new members to bring the new data team up to full speed.
  • Streamed location service data to support a local coupon service for millions of users.

Senior Engineer II, Appier, Jun. 2014 - Jul. 2019 (5y)

    Senior Engineer II, ~ Jul. 2019

  • Data governance chair / Agile coach: Coordinated cross-team product features and gradually burned down company-level technical debt while continuing to ship new features.
  • AI backend: Built machine learning and analytical services, using a range of approaches, together with front-end (F2E) engineers, data scientists, and the data infrastructure team.
  • Mentored new members through technical documentation as well as Ads domain knowledge.

    Software Engineer, Jun. 2014 - 2018

  • Built an RTB bidding algorithm in a fast-growing, dynamic business environment.
  • Conducted experiments on the live product to make daily improvements and achieve business goals.
  • Solved critical issues and conducted root cause analysis in uncontrolled, unreproducible, and unbalanced data environments under tight time constraints.
  • Enabled product features on petabyte-scale data using Spark, AWS RDS, and Airflow.

Selected Skills


    Data Lake/Warehousing: Python, Spark, Scala, Airflow, Presto | Streaming: Kafka, Flink

    Machine Learning: scikit-learn, TensorFlow (side project), TensorFlow Serving

    AWS Cloud Services: EC2/ECS/ECR/Lambda, EMR (Hive), RDS, DynamoDB, ElastiCache (Redis), Athena, SNS, SQS, Glue, SageMaker, CloudFormation/CDK.

    API Services: FastAPI, Flask, Django, Swagger/Flasgger | CI/CD: Jenkins/CircleCI/Travis CI, Ansible

Selected Projects


    Ad Product Technical Debt Burn Down '18Q4

    • Worked with scientists to migrate a legacy machine learning project from Python 2 to Python 3.
    • Designed and implemented a new log patch framework to ensure safe patch behavior and to abstract the log patch mechanism.

    Data Governance '18Q2 ~

    Formed a Data Governance committee with tech leads to continuously improve data quality and availability.
    Its missions are schema evolution, legacy deprecation, and data platform migration planning.

    Automatic Refund System '17Q2 ~ '18Q2

    Saved the business more than 10 million TWD as well as substantial manual effort. Milestones include:

    • Automated the process to reduce manual work for the support team and CM team ('17Q2)
    • Eliminated major data discrepancies ('17Q3)
    • Supported multiple time zones and formats, and made debugging more efficient ('17Q4)
    • Added breakdowns by different dimensions and worked with F2E to build a new UI ('18Q1)

    Pipeline Reconstruction and Migration '17Q2 ~ '17Q3

    • Rebuilt an ad-hoc pipeline and improved it by adding unit tests, migrating the database, refactoring code, and moving it to Jenkins.
    • Worked with team members to migrate critical production pipelines to Airflow; as of 2019Q1, more than 20 data pipelines ran on Airflow.

    Improve ML Model Performance '16Q2

    • Improved high-quality inventory discovery by embedding inventory as vectors: precision rose from 5.3% to 79%, and volume increased to 12.8x.
    • Extended the CPA model to different ad verticals: CPA reduced to 68%, and volume increased to 2.4x.

Learning & Sharing


    • Side Projects: I build side projects in my spare time. One of them is a cross-platform chatbot running on LINE, Telegram, Discord, and Twitch, which I regularly use to try out new technologies such as ML and CI/CD.

Education

    National Taiwan University, Taiwan, Sep 2009 - Jun 2011

    M.S., Department of Computer Science and Information Engineering

    National Chiao Tung University, Taiwan, Sep 2005 - Jun 2009

    B.S., Department of Computer Science

Publications


    Me-link: Link me to the media - fusing audio and visual cues for robust and efficient mobile media interaction