陳昭儒 (Chao-Ju Chen)
Github
[email protected]

Education
National Taiwan University
Bachelor's Degree, Electrical Engineering, 2012 ~ 2017

Project Highlights
Aggregating files in one ETL job, outputting 60B rows to the data warehouse
Input: gzipped files (200 GB in total)
Task: Loaded columns with values parsed from each gzipped file name and wrote them to an existing BigQuery table (specific schema) in parallel.
Tool: GCP Dataflow (hosted serverless Apache Beam)
Result: The job took 40 min to finish.
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
Autoscaled up to 122 workers at peak.
The data