We are looking for an Ops / Site Reliability Engineer (SRE) for a hybrid cloud container platform running on OpenShift 4.x. The person in this role will develop SRE tools and automation for day-to-day proactive maintenance and operation of the hybrid cloud container platform.
The role provides end-to-end support coverage for the platform and includes building, upgrading, and maintaining OCP clusters. Candidates should have an understanding of, or exposure to, agile practices as well as ITSM incident, change, and request management processes. Experience implementing platform resiliency, self-healing, health and compliance dashboards, and automation of day-to-day operational tasks across hybrid cloud in enterprise-class, production-grade environments is desired.
Responsibilities
- Provide SRE support for container platforms and apply SRE knowledge to identify potential gaps in the observability design or implementation.
- Work with clients, application teams, and development teams to onboard applications and integrate them with the CI/CD platform.
- Provide technical expertise to configure, deploy, and support Bank workloads so that they run and operate securely on container infrastructure (K8s/Red Hat OpenShift/AKS).
- Engineer new capabilities for the OpenShift/container platforms and deliver them in a fully automated and supportable fashion.
- Implement cluster services to manage on-prem bare-metal OpenShift cluster deployments and off-prem deployments.
- Work with monitoring tools and application development teams to enhance monitoring capabilities and update monitoring dashboards for new observability plans created in support of initiatives or continuous improvement efforts.
- Develop software or system scripts to reduce or eliminate dependence on human intervention for recurring tasks.
- Work with Production Support teams on knowledge transfer, playbook updates, and training for new monitoring capabilities.
- Identify vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and help define solutions to improve system reliability.
- Develop and maintain a catalog of extensible reliability scripts, tools, and libraries that can be leveraged for common instrumentation, automation, and operational needs.