Sr Site Reliability Engineer (2+)

geaerospace | 125 days ago | Bengaluru

Essential Responsibilities:

Understand business requirements and collaborate with Product & DevOps teams to implement highly available, scalable, resilient, cost-efficient solutions in Cloud environments.
Deploy observability tools (New Relic, Splunk, ELK, and other open-source o11y tools) in our Cloud infrastructure and applications via Terraform, and be the SME for these tools.
Create and configure alerts, dashboards, and reports that map to the golden signals: latency, errors, traffic, and saturation.
Pioneer the definitions of SLIs, SLOs, and Error Budgets for GE Aerospace Digital Workplace’s products and services, and champion their implementation for large-scale adoption.
Perform Root Cause Analysis (RCA) for SLO breaches, alerts, and incidents. Lead the troubleshooting and debugging sessions.
Solve problems relating to critical products, applications, and services, and create solutions (automations, runbooks, etc.) to prevent recurrence.
Lead the Incident Management and Postmortem processes, and collaborate with the Operations team to develop templates for communications, runbooks, and documents.
Consistently share best practices for reliability, resiliency, and performance, and improve processes within and across teams.
Take a data-driven approach to decisions about capacity needs, Cloud cost optimization, and infrastructure stability.
Prioritize reducing MTTx (Mean Time to Recover/Resolve/Repair) for Production incidents to provide a better user experience.
Propose new designs and develop solutions to complex problems in application resiliency and availability.
Be a strong technical mentor for junior team members and help them realize their full potential.

Qualifications/Requirements:

Bachelor’s degree from a recognized university or college with a minimum of 4 years of professional experience, OR Diploma with a minimum of 5 years of professional experience, OR Higher Secondary Certificate with a minimum of 7 years of professional experience.
A minimum of 2 years of experience in Production Engineering or Site Reliability Engineering roles.
A minimum of 2 years of experience in Cloud environments (e.g., AWS, Azure) is required.
A minimum of 2 years of experience in the DevOps and Infrastructure domains.
