
陳昭儒(Chao-Ju Chen)


Github
[email protected]


Education

National Taiwan University

Bachelor’s Degree, Electrical Engineering                                                                      2012 ~ 2017

Project Highlights

Aggregating files in a single ETL job, writing about 6.3B rows to the data warehouse

  1. Input: 40,726 gzipped files (200 GB in total)
  2. Task: Populate columns with values parsed from each gzipped file name, then write in parallel to an existing BigQuery table with a fixed schema.
  3. Tool: GCP Dataflow (hosted, serverless Apache Beam)
  4. Result:
  • The job finished in 40 minutes.
  • Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Autoscaled up to 122 workers at peak.
  • Data inserted into BigQuery:
      • Table size: 344.95 GB
      • Number of rows: 6,268,519,176

 

Qudowe, Project Lead & Software Engineer

Built for Pixnet Travel Hackathon 2019: a trip planner based on Instagram data

Work Experience

Vpon, Data Engineer                                                                                      Aug 2018 ~ Oct 2020

  • Implemented Akka HTTP (Scala) server endpoints for the Vpon Data Platform product
  • Created new ETL pipelines using GCP Spark (Apache Spark) and GCP Dataflow (Apache Beam) to batch-process hundreds of input and output files
  • Migrated existing ETL pipelines from AWS (Hive SQL) to GCP (BigQuery SQL) using Python
  • Migrated the data warehouse from AWS (Hive) to GCP (BigQuery)
  • Set up Prometheus on GKE (Google Kubernetes Engine) to monitor resource usage (CPU and memory of Compute Engine instances)

Largitdata, Data Quality Manager                                                              Sep 2017 ~ Jul 2018

  • Monitored the status of all running web scraping scripts (Flask)
  • Wrote and maintained web scraping scripts on a distributed system (Python + Celery + RabbitMQ/Redis)

Largitdata, Web Scraping Intern                                                                 Jan 2017 ~ Aug 2017

  • Wrote web scraping scripts for a wide variety of websites

Skills

Languages - Python, Scala

Big Data Framework - Apache Spark, Hadoop/HDFS, GCP BigQuery, GCP Dataflow

Cloud Platform - Google Cloud Platform

Version Control - Git

Interests

Basketball

Three years on the NTUEE girls' basketball team.

Captain of the NTUEE girls' basketball team for one year.

Psychology

Took many courses in psychology and cognitive neuroscience.

Language

Interested in learning new languages (learned a little French and German).


Language Skills


Proficient in English (TOEFL iBT: 105)