Over the past 4 years, I have worked as a DevOps engineer. At each company, I helped migrate the infrastructure to Kubernetes (EKS and Kops) and integrated it with CI/CD tools chosen to fit the team's requirements. At HTC I used Azure DevOps pipelines, while at Linc Global we mainly used Jenkins as our build/test tool before deploying to production. I know how to maintain and upgrade a K8s cluster (e.g. EKS) without downtime and in line with business needs. At Amazon, I have sharpened my knowledge of Linux administration, including how networking works in Docker/K8s, and how to combine AWS services to meet requirements at minimal cost. I have also built solid knowledge of troubleshooting EKS/ECS issues.
Taiwan
• Helped to reproduce and resolve customer issues for AWS services including VPC, EC2, EKS, ECS, aws-cdk, etc.
• Helped to identify issues/bugs in Amazon-managed open source projects, e.g. aws-cdk, and provided PRs when applicable.
• Provided feedback to internal AWS teams on how to improve services.
• Wrote tutorials (technical articles) for internal knowledge sharing, e.g. how to provision a Kafka cluster with SSL enabled on Docker.
Oct 2021 - Now
• Upgraded the k8s cluster to the latest version while migrating from Kops to AWS EKS, using Terraform and Helm.
• Built a Prometheus-based monitoring stack for k8s clusters using the Helm chart kube-prometheus-stack, with a set of official/community-built dashboards.
• Built a test management tool with TestLink and MySQL that lets the QA team create test projects and document test cases.
• Built CI/CD for multiple projects using GitHub Actions, Jenkins, Serverless, and Pytest.
• Established/enhanced the local Python development environment to narrow the gap between it and the dev integration environment.
• Identified and resolved multiple low-hanging-fruit issues to reduce AWS costs. For example, converting EBS volumes from gp2 to gp3 saved $280+/month.
• Analyzed server access logs for the Linc platform using AWS Athena to investigate a security issue.
• Ran load tests using k6 to investigate a memory leak in Node.js running inside K8s/Docker.
Jan 2021 - Oct 2021
• Reorganized monolithic Terraform configurations into modules and shell scripts. Modules were defined for networking, eks, db, security, etc.
• Improved k8s resource management using Namespaces, Labels, Affinity, NodeSelector, and Taints.
• Developed Helm charts for backend applications, including Deployment, Service, Ingress, Job, and/or RBAC.
• Deployed backend resources on AWS services, such as EKS, API Gateway, Fargate, Lambda, etc.
• Developed CI with CircleCI and GitHub Actions, from test to build.
• Developed CD with ArgoCD, following GitOps principles and a git branch workflow.
• Reduced backend pod startup time from ~6m to ~2m, by adjusting health check probes.
• Deployed HashiCorp Vault as secret storage, so that secrets can be written and read from inside or outside any k8s cluster.
• Ran load tests using k6 to resolve an issue of insufficient workloads.
Jul 2020 - Jan 2021
• Migrated all Viveport services to Kubernetes-based infrastructure serving 300+ microservices and APIs online, using Kops, Terraform, and Istio.
• Built a deployment pipeline with Azure DevOps to automate the integration of software releases (CI/CD) into the develop, qa, staging, and production environments.
• Spoke at Kubernetes Summit 2019 in Taipei, a public event held by iThome for sharing knowledge about Kubernetes.
• Shared Kubernetes fundamentals with Viveport engineering teams, including how we implemented it with our CI/CD pipeline.
• Developed several monitoring mechanisms to maintain the availability and reliability of all services, using ElastAlert, Prometheus, and Grafana.
• Developed/maintained a logging system covering traffic access logs, database requests, and server/client-side errors, using the ELK Stack.
• Supervised all modifications to the NoSQL databases (MongoDB) while maintaining the integrity and performance of all db clusters.
• Identified changes necessary to maintain optimal system performance, liaising with management to target and deploy solutions efficiently.
Dec 2016 - Jul 2020
• Worked with a consultant to host a workshop for Viveport engineering teams on the basics of running Agile development with JIRA.
• Maintained the SLA for all Viveport services, using tools including ElastAlert, Nagios, and New Relic.
• Developed a scheduling system with Python (Django) and MySQL so each NOC team member could easily maintain the shift timetable.
• Triaged and troubleshot issues as they occurred, coordinating with third-party vendors, customer contacts, or other IT teams to mitigate them.
• Tracked and documented issues' root causes and resolutions in detail.
• Administered the ticketing system across the organization using JIRA/Confluence.
Dec 2015 - Dec 2016
Sep 2008 - Jun 2012
• Language: Python, Bash.
• Kubernetes: EKS, Kops, Minikube, Helm, Istio.
• CI/CD: Azure DevOps Services, GitHub Actions, GitLab CI, CircleCI, ArgoCD, Jenkins.
• Infrastructure: AWS, GCP, Terraform, docker-compose.
• Monitoring: Prometheus, Grafana, New Relic, ElastAlert, Nagios.
• Database: MongoDB, MySQL.
• Logging: ELK Stack, Loki.
• Load Test: k6.
• Others: Akamai, Packer, Ansible, HashiCorp Vault, Serverless.