TechOps-DE-CloudOps-CLOUD COMPUTING - AWS Infra (7+)
EY | 107 days ago | Pune

Your key responsibilities

  • Lead incident response and coordination for AWS infrastructure issues, ensuring timely troubleshooting and resolution.
  • Act as the primary escalation point for critical incidents that require in-depth analysis and coordination with engineering teams.
  • Own and execute SOPs and runbooks to manage cloud infrastructure-related requests, issues, and remediation activities.
  • Review and refine incident handling processes to enhance troubleshooting efficiency within the AHD team.
  • Conduct log analysis and system diagnostics using various tools and the work notes recorded in the ITSM tool.
  • Ensure the team provides proper access management and request fulfilment, including IAM role validation, security configurations, and VPC networking support.
  • Monitor and troubleshoot containerized environments and infrastructure components.
  • Provide technical mentorship and training for junior engineers, improving incident handling and automation skills.
  • Work closely with product teams to identify recurring issues, document knowledge base updates, and drive SOP/process standardization.
  • Participate in shift handovers and governance meetings, ensuring knowledge transfer and clear communication of ongoing issues.
  • Provide guidance to junior engineers on handling cloud infrastructure issues and following best practices.
     

Skills and attributes for success

  • Strong technical leadership and escalation management skills.
  • Deep expertise in AWS infrastructure operations, including EC2, IAM, VPC, and security groups.
  • Hands-on experience with Kubernetes (EKS), Helm, and container orchestration.
  • Strong log analysis and troubleshooting experience using AWS CloudWatch and OpenTelemetry (OTEL).
  • Experience working with ITSM tools.
  • Ability to analyse trends, identify recurring issues, and propose automation-driven solutions.
  • Excellent communication and stakeholder coordination skills to work with product teams.
  • Experience in refining SOPs, troubleshooting guides, and runbooks for operational efficiency.

 

To qualify for the role, you must have

  • 7+ years of experience in cloud infrastructure operations, incident management, and technical support.
  • Deep understanding of AWS security principles, IAM policies, and encryption mechanisms.
  • Experience troubleshooting and managing Kubernetes (EKS), Helm, and containerized workloads.
  • Experience working with ITSM tools.
  • Strong problem-solving skills with experience in handling major incidents and leading root cause analysis (RCA).
  • Willingness to work in a 24x7 rotational shift-based support environment.
  • No location constraints; ability to collaborate with global teams.