A typical day will involve:
- Providing structure for and supporting release processes, suggesting and making improvements where possible
- Supporting clear communication and frequent updates on incident status to other teams and customers
- Providing technical expertise and input to establish the risk tolerance of products and services
- Supporting the maintenance of services once they are live by measuring and monitoring availability, latency, and overall system health
The skills you'll need
We’re looking for someone with at least five years of experience as a Site Reliability Engineer or in a DevOps role. You’ll need experience of using a data-driven and scientific approach to fact-finding. We’ll also look for financial services knowledge, and the ability to identify wider business impact, risk and opportunity, and make connections across key outputs and processes.
You'll also need:
- Good knowledge and experience in scripting and programming languages such as Python and Bash
- Experience with cloud platforms such as AWS, Azure, and Google Cloud, and with containerization technologies such as Docker and Kubernetes
- Strong experience with automation and configuration management tools such as Ansible, Terraform, and Chef
- Strong knowledge of deployment and release services, automation, and troubleshooting
- Strong communication skills with the ability to proactively engage with a wide range of stakeholders