Site Reliability Engineer of AI Infrastructure Operations (IMC)

Job updated 2 days ago

Job Description

Manage and lead the design, implementation, and maintenance of AI infrastructure systems to ensure reliable operation of VNAP's AI prediction services and training environments.

  1. Work with the IT/CIM infrastructure teams, which host the CPU/GPU application servers and supporting services for VNAP (VM/K8S, Kafka, MongoDB, Oracle middleware), to ensure high availability and reliability through well-established monitoring metrics and alarms.
  2. Design and implement auto-recovery mechanisms with infrastructure-as-code tools such as Ansible and Terraform to minimize tool idle and hold lot impacts caused by system issues.
  3. Develop and maintain applications using C#/Delphi/Python on top of those infrastructure systems.
  4. Work location: Hsinchu or Taoyuan
  5. Hiring Organization: IMC

Requirements

  1. Master's degree in Computer Science, Information Technology, or related field.
  2. Minimum 3 years of experience in infrastructure and system administration/operations.
  3. Strong understanding of and hands-on experience with message queuing systems and SQL/NoSQL databases, such as Kafka, MongoDB, Oracle, and MariaDB.
  4. Experience in operating system administration, such as Windows Server and Linux distributions.
  5. Strong experience in networking technologies including firewalls, nginx load balancing, and virtual IP setup.
  6. Experience with operations monitoring systems such as Zabbix, Prometheus, and Grafana.
  7. Experience in infra-as-code tools like Ansible and Terraform.
  8. Experience in application development using C#/Delphi/Python on top of AI infrastructure system components for auto-recovery.
  9. Excellent communication and interpersonal skills for cross-division and cross-department cooperation.
3 years of experience required
40,000+ TWD / month

About us

Established in 1987, TSMC is the world's first dedicated semiconductor foundry. As the founder and a leader of the Dedicated IC Foundry segment, TSMC has built its reputation by offering advanced and "More-than-Moore" wafer production processes and unparalleled manufacturing efficiency. From its inception, TSMC has consistently offered the foundry segment's leading technologies and TSMC COMPATIBLE® design services.

TSMC has consistently experienced strong growth by building solid partnerships with its customers, large and small. IC suppliers from around the world trust TSMC with their manufacturing needs, thanks to its unique integration of cutting-edge process technologies, pioneering design services, manufacturing productivity and product quality.

The company's total managed capacity exceeded 9 million 12-inch equivalent wafers in 2015. TSMC operates three advanced 12-inch wafer fabs, four eight-inch wafer fabs, one six-inch wafer fab (Fab 2), and two backend fabs (Advanced Backend Fab 1 and 2). TSMC also manages two eight-inch fabs at wholly owned subsidiaries: WaferTech in the United States and TSMC China Company Limited. TSMC also obtains eight-inch wafer capacity from other companies in which the Company has an equity interest.

TSMC is listed on the Taiwan Stock Exchange (TWSE) under ticker number 2330, and its American Depositary Shares trade on the New York Stock Exchange (NYSE) under the symbol "TSM".

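TSMC is the world's largest dedicated integrated circuit (IC) foundry. Founded in 1987 in the Hsinchu Science Park, Taiwan, TSMC pioneered the dedicated IC foundry business model.

Leading technology, manufacturing excellence, and a sustained commitment to investment in R&D and capacity enable us to capture business opportunities in mobile devices, high-performance computing, the Internet of Things, and automotive semiconductors.

TSMC's global headquarters is located in the Hsinchu Science Park, with subsidiaries or offices in North America, Europe, Japan, mainland China, South Korea, India, and other locations, providing customers worldwide with timely business and technical services.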

