Understanding project KPIs, SLIs, SLOs, MTTD, MTTR, error budgets, and chaos engineering, and eliminating toil through automation
Exploring observability tools and creating/implementing dashboards
Run the production environment by monitoring availability and taking a holistic view of system health
Incident management: knowledge of handling incidents, participating in blameless postmortems, performing root cause analysis, and implementing post-incident reviews.
Develop scripts to reduce toil and automate repetitive tasks, including issue resolution.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement