Site Reliability Engineering Manager (8+)

athenahealth | 200 days ago | Bengaluru

Job Responsibilities

Team Leadership & Development

· Lead and mentor a team of SREs, providing guidance, coaching, and support to foster growth and career development.

· Build and grow a high-performing team focused on operational excellence, reliability, and scalability.

· Establish and maintain a strong team culture of collaboration, accountability, and continuous improvement.

· Work with cross-functional teams (Engineering, Product and Project Management) to align priorities and build effective working relationships.

Service Reliability & Performance

· Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical systems.

· Monitor and improve the reliability, availability, and performance of all production services and infrastructure.

· Own and drive efforts to improve incident management, root cause analysis, and postmortem documentation.

· Implement proactive monitoring, alerting, and incident response strategies.

System Automation & Scalability

· Lead efforts to automate and streamline operational processes, reduce manual toil, and improve system reliability.

· Identify and implement best practices for system design, capacity planning, and cost optimization.

· Work closely with engineering teams to build scalable, resilient, and efficient systems that can handle increasing load.

Collaboration & Cross-functional Engagement

· Collaborate with Engineering & Product teams to ensure reliability is baked into the development process, including reviewing code, design, and deployment practices.

· Advocate for reliability improvements across the engineering and product teams, ensuring a balance between speed and reliability.

· Work with other engineering managers to align on long-term goals, technical debt, and infrastructure investments.

Process & Efficiency Improvement

· Drive continuous improvements in incident management, deployment pipelines, and system observability.

· Champion the adoption of tools and processes that improve automation, monitoring, alerting, and reporting.

· Measure and track key operational metrics, using data to inform decision-making and drive improvements.

Qualifications

· 8 years of experience building, scaling, and supporting highly available systems and services

· 3-4 years of experience managing and leading technical teams, including mentoring engineers and fostering team development.

· Strong experience with enterprise-grade middleware, e.g. web servers (Apache) and load balancers (NetScaler), hosted on a virtual machine cluster.

· Strong expertise in configuration management tools such as Puppet.

· Experience with Infrastructure-as-Code, Linux, VMware, and API integration. Familiarity with Terraform is desired.

· Proficiency in at least one scripting or programming language (Ansible, Python, Go, Ruby, etc.).

· Expertise in the delivery, maintenance, and support of Linux systems and infrastructure

· Experience with cloud platforms (AWS), containerization (Docker), and orchestration (Kubernetes).

· Familiarity with observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch, Splunk)

· Experience implementing solutions using SRE and DevOps principles.

· Familiarity with telemetry and the latest monitoring and visualization tools.

· Expertise in promoting and driving system visibility to aid in the rapid detection and resolution of issues

· Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

· Experience in industries with high uptime requirements (e.g., financial services, healthcare, SaaS)
