Develop, deploy, and manage scalable, reliable, and secure infrastructures across on-premises environments and cloud platforms such as AWS, Azure, and Google Cloud Platform, including multi-cluster and multi-regional Kubernetes environments.
Develop and maintain automation scripts (Python, Bash, etc.) and automation tools (GitLab, HashiCorp Terraform, HashiCorp Vault, etc.) to streamline and improve deployment, monitoring, and management processes, using Infrastructure as Code (IaC).
Define and maintain infrastructure automation principles, collaborating with infrastructure teams to embrace and cultivate continuous integration and continuous delivery/deployment (CI/CD).
Implement and integrate monitoring and observability solutions, such as AIOps, to proactively detect and respond to system issues.
Analyze system performance and implement improvements to enhance cost efficiency and user experience.
Participate in on-call rotations to ensure 24/7 system availability.
Maintain detailed documentation (high-level and low-level designs, HLDs and LLDs) of infrastructure, processes, and procedures to facilitate learning and operational continuity.
As a Staff Engineer, you will act as a Technical Lead on quarterly prioritized features, supporting the project managers and coordinating with IT teams, scrum masters, and the wider business to deliver projects.
Maintain a continuous-learning mindset, staying current with industry trends and new technologies to improve operational performance.
Required Skills and Experience:
Extensive knowledge of cloud platforms (AWS, Azure, or GCP), containerization technologies (Docker, Kubernetes, Rancher, CloudBees, etc.), automation tools (Terraform, Ansible), and monitoring solutions (Prometheus, Grafana).
Strong scripting and programming skills (Bash, Python, and Go).
Experience in deploying, maintaining, and integrating HashiCorp Vault, GitLab, Jenkins, Ansible, and Terraform Enterprise platforms with automation pipelines.
Excellent analytical and problem-solving abilities with a proactive approach to identifying and resolving issues.
Experience in a DevOps, SRE, or Platform Engineering role, with a demonstrated focus on hybrid infrastructure.
Good communication and collaboration skills, with the ability to work efficiently in a team-oriented environment.
Experience working in an Agile delivery environment integrated with Atlassian Jira and Confluence.