I enjoy researching new technologies and applying them to projects at work. When building services on open source software, I solve the problems I run into and contribute the fixes upstream. My contributions to projects such as Gin, Tempo, Knative Serving, OpenTelemetry Operator, and Gatus have been merged into their main branches.
09.2023 ~ Present
Taipei, Taiwan
Application Infrastructure & Observability
Maintain the Google Cloud services required by the exchange backend, including VPC, Cloud NAT, GKE, Pub/Sub, GCS, and Load Balancer. Also responsible for high-traffic components on GKE, e.g. RocketMQ (15k QPS), ELK (80k QPS), and Kafka (80k QPS).
Provide backend engineers with a monitoring and alerting system built on Prometheus and Thanos, with Alertmanager integrated with PagerDuty for incident response.
Cost Optimization
Design ELK with a highly available, fault-tolerant architecture and move its GKE nodes to Spot instances. Use Promtail to analyze log data in Kafka in real time and identify applications that emit redundant logs, reducing the overall cost of the ELK log collection system by up to 50%.
Daily Operation
Manage cloud resources on GCP through Terraform and apply best practices, such as storing database passwords and keys in GCP Secret Manager with automatic rotation.
Use ArgoCD to deploy backend applications from Helm/Kustomize repositories on GitLab, and provide related technical consultation to backend engineers.
08.2022 ~ 09.2023
Singapore, Singapore
On-duty and stability operation
Responsible for monitoring and alerting, emergency response, capacity planning, and new IDC deployments for the multi-region media processing platform.
Analyze on-call data and use indicators such as SLI, SLO, and MTTR to make decisions, improve alert rules, shorten fault recovery time, and improve system stability and the on-call experience.
Platform development
Develop an automated incident analysis system that supports decision-making and mitigation during emergency response, such as load shedding, rate limiting, or traffic switching.
Develop an automated server operations platform that handles scaling up/down and recovery of abnormal machines in large computing clusters.
05.2021 ~ 08.2022
Taipei, Taiwan
Serverless Platform
Use Knative Serving to provide each project team with a fully managed, multi-tenant deployment environment on a shared Kubernetes cluster, with a preconfigured ArgoCD as the only deployment entry point, plus deployment notification integration, managed certificates, etc.
Status Page
Develop a system that reports the operating status of each project, built with Nuxt.js, Gin, and MySQL.
Taiwan Observability Platform
Participate in the design and rollout of a dashboard that makes it easy for each dev team to check the status of its applications on Kubernetes.
Build and maintain shared platform services such as Grafana, Grafana Loki, Prometheus, Grafana Tempo, Sentry, and ArgoCD.
Use ArgoCD ApplicationSet to manage common components across most project Kubernetes clusters in Taiwan, such as installing and maintaining ingress controllers.
Build the dependencies required by other platforms, such as Kafka, Redis, Postgres, and ClickHouse.
LHCI Farm
Use Argo Events and Argo Workflows to provide a fully managed platform on Kubernetes that generates Lighthouse reports for frontend web pages.
09.2019 ~ 05.2021
Taipei, Taiwan
LINE MUSIC
Work with Korean colleagues to design and develop new APIs required by the mobile app, and develop the GraphQL API used by the web version of LINE MUSIC.
Design a new architecture to replace the legacy system, including refactoring a monolithic application into gRPC-based microservices and using Kafka to build a data pipeline that collects data for the data team to analyze.
Self-host Elasticsearch, Filebeat, and APM for logging and application monitoring.
Help outsourced teams migrate from a VM-based architecture to containers, from building CI/CD pipelines to using Kustomize, Helm, and ArgoCD to organize a large number of manifest files.
LINE MUSIC LIVE
Design and develop a ticketing system. Handled the flash-sale ticket purchases for a New Year's Eve concert that served 100k people.
06.2018 ~ 08.2019
Taipei, Taiwan
LINE Today MRT Bus Info
Build a single-page application in Vue.js that lets users query Taipei bus timetables; write unit tests and deploy with PM2.
LINE NOW Official website
Develop a content management system for editors to manage official website content: build a RESTful API with Koa.js and MongoDB, deploy the entire system with docker-compose, and integrate ELK to collect logs.
TechPulse 2018 Event LINE Bot
A LINE Bot developed with Java Spring Boot, MongoDB, and Redis.
CNY 2019 Event Main Page
Develop the SPA event page in Vue.js and TypeScript, with unit tests written in Jest.