Senior Site Reliability/DevOps Engineer (7+)
Equifax | 164 days ago | Trivandrum

What you’ll do

  • Architecture and Design: Participate in the design and architecture of highly scalable, resilient, and secure systems on Kubernetes. Contribute to the definition of SRE principles and best practices.

  • Automation: Develop and maintain automation frameworks for infrastructure provisioning, deployment, monitoring, and incident response using tools like Terraform, Ansible, Puppet, Chef, or similar.

  • Monitoring and Alerting: Design and implement comprehensive monitoring and alerting systems to proactively identify and resolve issues. Develop and maintain dashboards to track key performance indicators (KPIs).

  • Incident Management: Lead incident response efforts and conduct thorough post-incident reviews to identify root causes and implement preventative measures.

  • Capacity Planning: Proactively identify and address capacity constraints to ensure optimal system performance and availability.

  • Collaboration: Work closely with engineering, product, and security teams to stay aligned on system requirements and priorities.

  • Mentorship: Mentor and guide junior SRE/DevOps engineers, fostering a culture of continuous learning and improvement.

  • On-call Rotation: Participate in a rotating on-call schedule to provide 24/7 support for critical systems.

  • Security: Contribute to the security posture of our systems by implementing security best practices and participating in security audits and reviews.

  • Performance Optimization: Identify and resolve performance bottlenecks, optimizing system performance and resource utilization.

What experience you need

  • 7+ years of experience as an SRE, DevOps Engineer, or in a similar role.

  • Deep understanding of cloud platforms such as GCP (AWS and Azure are a plus).

  • Extensive experience with containerization technologies like Docker and Kubernetes.

  • Proven experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Ansible, Puppet, Chef).

  • Strong scripting skills (e.g., Python, Go, Bash or other shell scripting).

  • Experience with monitoring and logging tools (e.g., Datadog, Prometheus, Grafana, ELK stack).

  • Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI).

  • Experience with incident management and post-incident reviews.

  • Excellent problem-solving and troubleshooting skills.

  • Strong communication and collaboration skills.

  • Bachelor's degree in Computer Science or a related field; equivalent experience considered.
