Taiwan Province, Taiwan
Company nature: Provides a cloud game-streaming solution for customers. No game development is involved; the solution gives customers one more platform and channel for playing games in the cloud, so players can run AAA titles without purchasing a very powerful machine.
Because the service runs 24/7, the work requires shift rotation.
1. The department is responsible for basic operations: maintaining the status of all customer projects running on our platform, as well as the platform itself.
2. Responsible for monitoring Windows/Linux hosts (CentOS 6 / CentOS 7); monitoring items: PING/CPU/MEM/DISK.
3. Monitoring items also include regular tests of whether the games can actually be played.
4. Cooperate with customers on regular platform updates and game patches.
5. Monitored computer rooms: Japan, Korea, and cloud (AWS + GCP + Tencent Cloud).
6. Monitoring tools: Grafana + Nagios.
7. Judge whether the machine hardware is the cause of a failure and ask colleagues in the computer room to scan the hardware for damage or reinstall the OS. On Windows, pass along the blue-screen error code so they can trace the cause of the hardware fault.
When a major incident occurs in a physical computer room, determine whether it is a network problem or whether the service cannot start normally because a component service in the architecture is down. Then describe the situation clearly to the back-end personnel so they can investigate and solve the problem.
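The triage described above can be sketched roughly as follows. This is only an illustration, not the team's actual tooling: the Observation type, the component names, and the wording of the returned messages are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Observation:
    """What the operator can see at the time of the incident (hypothetical model)."""
    host_reachable: bool            # e.g. the result of the PING check
    components_up: Dict[str, bool]  # hypothetical component name -> is it running?


def triage(obs: Observation) -> str:
    """Return a short, clear description to hand to back-end personnel."""
    # Rule out the network first: if the host does not answer,
    # the state of the services on it cannot be trusted.
    if not obs.host_reachable:
        return "network: host does not answer PING; check the network path first"

    # Host is reachable, so look for a missing component service.
    down = sorted(name for name, up in obs.components_up.items() if not up)
    if down:
        return "component: service(s) down: " + ", ".join(down)

    # Nothing obvious: escalate with whatever evidence is available.
    return "unknown: host reachable and all components up; escalate with logs"
```

For example, triage(Observation(True, {"streamer": True, "auth": False})) reports that the hypothetical "auth" component is down, which is the kind of one-line summary that can be passed on.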
The virtualization technology used is VMware ESXi. VMs are created according to the specification, assigned public/private IPs, and delivered to RD. After the deployment is completed, we take over management and monitor the VMs with Nagios.
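As an illustration of the kind of check Nagios runs against a delivered VM, here is a minimal sketch in the style of a Nagios plugin. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention; the 80%/90% thresholds and the function names are assumptions for the example, not the values used on the job.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_usage(name, percent, warn=80.0, crit=90.0):
    """Map a usage percentage (CPU/MEM/DISK) to a Nagios status code and message.

    warn and crit are assumed thresholds chosen for this example.
    """
    if not 0.0 <= percent <= 100.0:
        return UNKNOWN, f"{name} UNKNOWN - invalid value {percent}"
    if percent >= crit:
        code = CRITICAL
    elif percent >= warn:
        code = WARNING
    else:
        code = OK
    # Text after the pipe is performance data: label=value;warn;crit
    perf = f"{name.lower()}={percent:.1f}%;{warn:g};{crit:g}"
    return code, f"{name} {LABELS[code]} - {percent:.1f}% used | {perf}"


def disk_percent(path="/"):
    """Return the used percentage of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0
```

A real plugin would print the message and exit with the returned code, for example by calling check_usage("DISK", disk_percent("/")) and passing the code to sys.exit.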
Working here has taught me the importance of componentization. When a system under maintenance shows a major abnormality, first determine which component caused it, then restart that service.
Thanks to this job, I learned how to ask questions and how to describe a problem with debugging details that point people in the right direction.
Thanks to this job, I learned the concepts of computer room networking. During a major incident, I worked with my supervisor to check the network of each computer room cabinet one by one, and learned how to troubleshoot by elimination.
Gained exposure to related concepts such as YAML, Grafana, Jenkins, GitLab, and CI/CD.
January 2019 - Present