Site Reliability Engineer (5+ years)
wsa | 131 days ago | Hyderabad

Key Responsibilities:   

  • Ensure System Uptime and Reliability: Monitor and maintain cloud-based applications and infrastructure, ensuring minimal downtime and efficient incident response. 
  • Build and Optimize Monitoring and Alerting Systems: Set up and continuously improve comprehensive monitoring and alerting frameworks to detect and address issues proactively. 
  • Cloud Infrastructure Management: Manage, optimize, and scale systems on the Azure cloud platform, ensuring high performance and cost-effectiveness.
  • Incident Management and Response: Act as the first line of defense in identifying, diagnosing, and resolving technical issues in real time, escalating them to the appropriate teams when needed.
  • Automation and Infrastructure as Code (IaC): Utilize IaC tools to automate infrastructure provisioning and management, promoting reproducibility and reducing manual interventions. 
  • Tooling and Observability: Leverage technologies such as Grafana for observability and Argo for CI/CD automation, enhancing our ability to respond swiftly and effectively to infrastructure needs. 
  • Collaboration: Work closely with cross-functional teams to align on SRE best practices, share insights, and support development and operational goals. 

Requirements: 

  • Experience with Cloud Platforms: 5+ years of experience in cloud environments, with a primary focus on Azure. 
  • Monitoring and Alerting Skills: Strong experience with monitoring tools (e.g., Grafana, Prometheus) and a background in setting up alerts and dashboards. 
  • Incident Management: Proven track record in diagnosing and troubleshooting complex system issues, with a focus on fast incident response and resolution. 
  • Collaboration and Communication: Excellent communication skills, with an ability to work collaboratively with various technical teams and stakeholders. 
  • Kubernetes Expertise: Proficiency with Kubernetes (K8s) for orchestrating and managing containerized applications. 
  • Automation and IaC: Hands-on experience with a scripting language (e.g., Python, shell script, PowerShell) and with Infrastructure as Code tools (e.g., Terraform, Ansible) for automating cloud infrastructure management.

Preferred Qualifications: 

  • Familiarity with CI/CD tools, particularly Argo for workflow automation. 
  • Certification in Azure, AWS, or Kubernetes. 
  • Experience working in an SRE or DevOps capacity in a multi-cloud environment. 