Site Reliability Engineer II (8+)
bankofamerica | 239 days ago | Hyderabad

We are looking for an Ops Site Reliability Engineer (SRE) for the Hybrid Cloud Container platform running on OpenShift 4.x. The individual in this role will develop SRE tools and automation for day-to-day proactive maintenance and operations of the hybrid cloud container platform.

The candidate should provide end-to-end support coverage for the platform and work on building, upgrading, and maintaining OCP clusters. An understanding of, or exposure to, agile as well as ITSM incident/change/request management processes is expected. Experience implementing platform resiliency, self-healing, health and compliance dashboards, and automation of day-to-day operational tasks across hybrid cloud for enterprise-class, production-grade environments is desired.

Responsibilities

  • Provide SRE support for container platforms and apply SRE knowledge to identify potential gaps in the observability design or implementation.
  • Work with clients and with application and development teams to onboard applications and integrate them with the CI/CD platform.
  • Provide technical expertise to configure, deploy, and support Bank workloads so that they run and operate securely on container infrastructure (K8s/Red Hat OpenShift/AKS).
  • Engineer new capabilities for the OpenShift/container platforms and deliver those capabilities in a fully automated and supportable fashion.
  • Implement cluster services to manage on-prem bare-metal OpenShift cluster deployments and off-prem deployments.
  • Work with monitoring tools and Application Development teams to enhance monitoring capabilities and modify monitoring dashboards for new observability plans created in support of initiatives or continuous improvement efforts.
  • Develop software or system scripts to simplify or eliminate the dependence on human intervention for recurring tasks.
  • Work with Production Support teams to perform knowledge transfer, playbook updates and training for new monitoring capabilities.
  • Identify vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and help define solutions to improve system reliability.
  • Develop and maintain a catalog of extensible reliability scripts, tools, and libraries that can be leveraged for common instrumentation, automation and operational needs.