Cloud, including VPC, Cloud NAT, GKE, Pub/Sub, GCS and Load Balancer. Also responsible for maintaining high-traffic components in GKE, e.g. RocketMQ (15k QPS), ELK (80k QPS) and Kafka (80k QPS). Enabled backend engineers to monitor alerts through Prometheus & Thanos, and chained Alertmanager and PagerDuty for incident response.
Cost Optimization
Designed ELK with a highly available, fault-tolerant architecture and replaced GKE machines with Spot instances. Used Promtail to analyze log data in Kafka in real time, identifying applications that emit redundant logs, and reduced the overall cost of the ELK log collection system.
Full-time / Interested in remote work
National Taipei University of Technology
Department of Computer Science and Information Engineering