Site Reliability Engineer I (SRE-I) (1+)
| 99 days ago | NM

What You’ll Take On

  • Windows Administration

    • Manage and maintain Windows servers, ensuring their stability, security, and performance.

  • CheckMK

    • Utilize CheckMK for comprehensive monitoring and alerting, ensuring all systems are functioning optimally.

  • Linux Administration

    • Diagnose and resolve issues on Linux systems, ensuring minimal downtime and maximum efficiency.

  • VMware

    • Manage virtual environments using VMware, ensuring resources are optimized and available.

  • vSAN Understanding

    • Demonstrate a solid understanding of vSAN for effective storage management and troubleshooting.

  • Cloud Administration

    • Administer and manage cloud services across AWS, Azure, Splunk, and GCP, ensuring seamless integration and operation.

  • Risk Assessment

    • Assess potential risks and impacts on game services and revenue, taking proactive measures to mitigate them.

  • Issue Identification

    • Identify issues, alerts, and critical service incidents using provided dashboards and monitoring tools.

  • Service Troubleshooting

    • Utilize studio playbooks to troubleshoot and diagnose basic issues across various services.

  • Communication

    • Relay accurate and timely information regarding service impacts to game studios, ensuring effective communication during incidents.

  • Incident Management

    • Spearhead outage management, including communication, triage, and escalation.

  • Daily On Call

    • Triage and troubleshoot critical alerts from critical systems.

What You Bring 

  • Experience:

    • Live Services Knowledge: Understanding of live services and their operational requirements.

    • Change/Crisis Management: Experience in managing change and crisis situations, ensuring minimal disruption to services.

    • Effective Communicator: Able to relay information accurately and promptly to the game studio and other stakeholders.

    • Team Player: Works well in a collaborative environment, sharing knowledge and supporting team members.

  • Proactive Problem-Solving:

    • A commitment to continuous improvement and proactive issue resolution.

    • Proven experience in troubleshooting production problems affecting live services.

    • Able to identify potential issues before they become critical and manage details effectively.

  • Background:

    • At least 1 year of experience in a similar role and/or 3 years of experience in a relevant role.
