Developed AWS infrastructure using EKS, EC2, load balancers, S3, and other services to support high-traffic applications and services.
Configured and maintained monitoring services to provide insights into system performance and health, enabling proactive issue resolution and continuous improvement.
Implemented infrastructure as code using CloudFormation, streamlining deployment processes and enabling greater scalability and reliability.
Established and automated CI/CD processes using Screwdriver, resulting in higher-quality releases.
Automated daily tasks to improve team productivity and efficiency.
Built Splunk infrastructure to support high-traffic applications and services, providing zero downtime and log analysis.
Developed and maintained automated tracking tools for internal users using Python.
Successfully managed instances across multiple regions worldwide.
Site Reliability Engineer, Shopee Aug 2021 ~ Jun 2022
Deployed Google Cloud Platform infrastructure, including GKE, load balancers, SQL, and NAT, to support Shopee services and enable greater scalability and reliability.
Configured and maintained Google Cloud Monitoring to provide real-time insights into system performance and health, enabling proactive issue resolution and continuous improvement.
Implemented Terraform to streamline infrastructure as code and enable greater efficiency and consistency in deployment processes.
Analyzed logs to identify opportunities to reduce storage costs, resulting in a 20% reduction in overall storage expenses.
Built a CI/CD process using Kustomize, resulting in higher-quality releases, and componentized the process to enable greater modularity and scalability.
Automated various tasks, e.g., creating database users, to improve team productivity and reduce time spent on repetitive manual work.
Successfully handled high traffic volumes and implemented strategies to optimize system performance and ensure uptime.
Troubleshot database deadlocks using SQL Inspect to identify and resolve issues and improve overall system performance and stability.
Conducted stress testing to identify potential bottlenecks or weaknesses in the system and enable proactive issue resolution before deployment.