Sr. Site Reliability Engineer (5+)
hashicorp | 238 days ago | Bengaluru

  • Implement best practices for system reliability, including proactive identification of potential failure points and the development of automated mitigations.
  • Design and execute comprehensive load testing strategies to identify performance bottlenecks and scalability limits across our cloud products.
  • Implement best practices and technologies to improve system resilience, ensuring high availability and fault tolerance.
  • Work closely with engineering and product teams to integrate operational readiness into the development lifecycle, enhancing product stability and user satisfaction.
  • Build and refine tools and frameworks for automated testing, environment simulation, and incident reproduction, reducing manual effort and increasing test coverage.
  • Conduct in-depth analysis of testing results, documenting findings and making actionable recommendations for system enhancements.
  • Drive systemic improvements to the products by introducing chaos testing and partnering with product development teams.
  • Share your knowledge and expertise with team members, fostering a culture of learning and continuous improvement.
  • Develop and implement disaster recovery and backup strategies to ensure data integrity and system resilience.

Ideal Candidate

  • 5+ years of experience in SRE, systems engineering, or non-functional testing roles with a focus on operational readiness, performance testing, or system scalability.
  • Experience in driving systemic improvements through chaos engineering practices.
  • Programming skills in a high-level or scripting language.
  • Proven track record of leading successful load testing and performance optimization initiatives in cloud and on-prem environments.
  • Experience in creating and managing test environments for automated testing.
  • Strong fundamentals in CI/CD processes and in maintaining quality pipelines.
  • Experience with version control systems (e.g., Git) and agile project management methodologies.
  • Understanding of monitoring and alerting systems, with the ability to develop metrics and alarms that accurately reflect system health and operational risks.
  • Strong technical foundation in cloud technologies (AWS, Azure, or GCP) and container technologies such as Nomad or Kubernetes.
  • Strong experience with performance testing tools such as k6, Artillery, Vegeta, or Locust.
  • Effective communication and collaboration skills, capable of working with cross-functional teams and articulating technical concepts to diverse audiences.
  • Familiarity with HashiCorp products and tools is a plus.
  • Exposure to the disaster recovery domain is a plus.