Data Centre System Operation Engineer (Johor Bahru)

Neuron Solutions

We are seeking an L1 Data Center Operations Engineer to support 24x7 daily operations of large-scale GPU clusters and supporting data center infrastructure. This role provides first-line monitoring, incident triage, ticket handling, service coordination, and physical break-fix support to ensure stable, reliable, and efficient operation of high-performance GPU platforms supporting AI workloads.

Key Responsibilities

  • Oversee daily operations of GPU clusters and data center systems in a 24x7 shift-based environment, ensuring services remain stable, available, and operating within defined SLAs.
  • Monitor system health, performance, and capacity using monitoring and alerting tools, and proactively identify abnormal conditions or potential risks.
  • Acknowledge, triage, and respond to operational incidents, perform first-level troubleshooting based on documented SOPs, and escalate to L2/L3 teams when issues cannot be resolved at L1, ensuring timely service restoration.
  • Own ticket lifecycle in the ITSM system, including creation, categorization (Incident, Service Request, Change), prioritization, regular updates, and closure with proper evidence.
  • Coordinate with hardware vendors for GPU server break-fix, including opening cases, providing logs, tracking progress, and validating restoration.
  • Perform physical data center tasks such as cabling (fiber/copper), optics replacement, PDU visual inspection, labeling, basic rack checks, and environmental inspections, following approved work orders.
  • Support deployment and commissioning activities for racks and infrastructure under guidance from senior engineers, including racking, cabling, and basic validation.
  • Collect logs, screenshots, and diagnostic outputs from systems and monitoring tools to support troubleshooting and vendor cases.
  • Work closely with Network, Systems, Platform, Facilities, and Vendors to support AI workloads and coordinate incidents and maintenance activities.
  • Participate in shift handovers, clearly documenting open issues, risks, pending vendor actions, and planned activities.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Equivalent practical experience will be considered.
  • 2+ years of experience in IT infrastructure operations, data center operations, NOC, or similar environment.
  • Familiarity with GPU hardware platforms (e.g., NVIDIA GPUs) and basic awareness of AI / HPC environments.
  • Basic Linux operating system skills (command line usage, log review, service status, file systems).
  • Experience using monitoring and alerting tools (e.g., Prometheus, Grafana, Zabbix, or similar).
  • Experience working with ticketing and IT service management tools (e.g., Jira Service Management or similar).
  • Hands-on experience performing IT hardware replacement and basic break-fix tasks.
  • Experience working in data center operations, system operations, or technical support roles.

Job Type: Permanent

Pay: From RM5,000.00 per month

Application Question(s)

  • This role requires working 24/7 shifts. Are you comfortable with this?

Experience

  • System Operation Engineer: 2 years (Required)
  • Data Centre IT operations: 2 years (Required)

Work Location: In person

Job Type: Full Time

Location: Kulai, Johor
