What you will be doing:
Understanding project KPIs, SLIs, SLOs, MTTD, MTTR, error budgets, and chaos engineering, and eliminating toil through automation
Exploring observability tools and creating/implementing dashboards
Running the production environment by monitoring availability and taking a holistic view of system health
Incident management: handling incidents, participating in blameless postmortems, performing root cause analysis, and implementing post-incident reviews
Developing scripts to reduce toil, automate repetitive tasks, and resolve recurring issues
Measuring and optimizing system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
Implementing development, testing, and automation tools
Setting up tools and required infrastructure
Monitoring and measuring customer experience and KPIs
Striving for continuous improvement and building continuous integration and continuous delivery/deployment (CI/CD) pipelines
What you bring:
Experience supporting Unix-, Linux-, and Windows-based application environments
Knowledge of any RDBMS or NoSQL database
Good knowledge of the application support domain
Experience with system and application monitoring and observability tools such as Splunk, Prometheus, Grafana, and Dynatrace
Experience managing third-party tools
Hands-on experience writing automation scripts in PowerShell, Python, or shell
Exposure to the latest SRE, cloud, and DevOps technologies, plus knowledge of containers, Docker, and Kubernetes/OpenShift
Skill in using tools such as Terraform and Ansible to automate infrastructure management