Staff Site Reliability Engineer (10+ years)

gehealthcare | 258 days ago | Bengaluru

Roles and Responsibilities

In this role, you will:
• Establish performance baselines and capacity thresholds, correlate events, and define monitoring and alerting criteria
• Develop automated solutions to address potential problems before they result in a service interruption
• Provide impact assessments and mitigation plans for changes going into the production environment
• Investigate the root causes of severe and systemic outages, identify corrective actions, and apply them across the enterprise
• Develop availability measures that align with consumer experience to accurately assess the usability of crucial services
• Build capacity models that baseline transactional load against resource performance, use that data to predict overall system capacity, and automate load placement to avoid outages
• Identify thresholds for all critical links in the data path to quickly isolate where imbalances may result in potential outages
• Analyze failure points in services to model risk levels and the resolution steps to take if a failure occurs
• Help drive architecture enhancements into the system to mitigate potential failure points
• Programmatically monitor for and remediate configuration drift of critical devices
• Develop response plans for potential failure points and evaluate their effectiveness during planned tests
• Perform comprehensive operational health checks across all services to identify areas of concern, and track activities to drive improvements at all levels of the architecture
• As a tech lead for the SRE group, provide technical coaching and direction to more junior teammates

Required Qualifications:

Bachelor's degree in Computer Science or another STEM (Science, Technology, Engineering, and Math) major, with 10-12 years of experience

Desired Qualifications:

• Excellent knowledge of common operating systems (Unix/Linux, Windows)

• Excellent knowledge of TCP/IP networking and internetworking technologies (routing/switching, proxies, firewalls, load balancing, etc.)

• Demonstrated experience scripting or developing software and services for the cloud (Ruby, Python, Go, Java, Node.js, .NET, etc.)

• Extensive experience with infrastructure automation

• Experience using an automated configuration management system (Terraform, Chef, Puppet, Ansible, Salt, etc.)
• Experience deploying and managing infrastructure on public clouds such as AWS or Azure
• Experience with configuring, customizing, and extending monitoring tools (Datadog, Sensu, Grafana, Splunk, etc.)

We expect all employees to live and breathe our behaviours: to act with humility and build trust; lead with transparency; deliver with focus, and drive ownership – always with unyielding integrity.

Our total rewards are designed to unlock your ambition by giving you the boost and flexibility you need to turn your ideas into world-changing realities. Our salary and benefits are everything you’d expect from an organization with global strength and scale, and you’ll be surrounded by career opportunities in a culture that fosters care, collaboration and support.
