Minimum of 3-4 years of work experience as an SRE (not traditional production support), covering integration platforms on cloud-based deployments.
Coding experience in any programming language, particularly for integration tier and middleware.
Working in a 24x7 operations support model for mission-critical applications and infrastructure, using ServiceNow as the ITSM ticketing tool.
Working with AppDynamics and Splunk for monitoring and setting up observability is key.
Working with CI/CD toolchains: setting up and running deployment pipelines and propagating changes across environments.
Maintaining middleware such as Kafka (open source) and MQ, as well as application servers (Tomcat).
Maintain Hazelcast data storage platform clusters and Control-M job schedulers.
GCP and private-cloud operational support and administration activities such as provisioning, capacity management, reliability management, monitoring, restoration, etc.
Kubernetes cluster management, monitoring and remediation. Knowledge of Docker is important.
Automating deployments and scripting self-healing workflows based on telemetry.
Define SLIs and configure SLOs, respond to threshold alerts and optimize monitoring capability.
Work with code as well as configuration artifacts to debug and fix issues that may arise.
Knowledge of applying SRE practices to daily operations is key.
Must be inclined to work on proof-of-concept solutions that optimize reliability, such as those incorporating AI models for event correlation and assisted triaging.