Senior Consultant (NM+)
Thoughtworks | 20 days ago | Bangalore

Job responsibilities

  • You will improve site reliability by building mechanisms and architectures that enable fault tolerance and shorten mean time to detect (MTTD) and mean time to respond (MTTR)
  • You will drive the integration of observability automation into the CI/CD pipeline
  • You will handle production incidents, manage incident communication with clients and draft root cause analysis documents
  • You will monitor performance of production systems and improve their scaling to ensure business goals are met within expected SLA and SLO metrics
  • You will work closely with application development teams, advising them on system reliability and helping implement reliability improvements
  • You will improve system observability across multiple facets such as logging and metrics, reducing false alarms to eliminate unnecessary toil and improve process efficiency
  • You will implement chaos engineering practices as necessary to test system reliability, and set up processes so that such testing is done regularly
  • You will understand client goals and business needs clearly and set the direction for site reliability accordingly, e.g., achieving application availability with minimal or no disruption (99.999%) where the business requires it

Job qualifications

Technical Skills

  • You have hands-on experience in programming and scripting languages such as Python, Go or Bash
  • You have a good understanding of at least one public cloud, e.g., AWS, Azure or GCP
  • You have had exposure to observability tools such as Grafana, Datadog, New Relic, ELK Stack, Dynatrace or equivalent, and you are proficient in using data from these tools to dissect and identify root causes of system and infrastructure issues
  • You are familiar with DevOps and GitOps practices
  • You have a good knowledge of container-based architecture and orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
  • You understand technical architecture and modern design patterns, including microservices, serverless functions, NoSQL and RESTful APIs, and you have experience in fixing bugs, analyzing logs, and building metrics and operational dashboards
  • You are familiar with creating infrastructure resources that improve system reliability and follow the principles of the cloud provider’s Well-Architected Framework: reliability, security, cost optimization, performance efficiency and operational excellence