Key Responsibilities
Platform Engineering & Leadership
- Own the Azure platform architecture, roadmap, and engineering standards.
- Define golden paths, reusable platform services, and self‑service capabilities.
- Act as the technical authority and escalation point for platform and CloudOps integrations.
Azure Infrastructure & Infrastructure‑as‑Code
- Architect and govern enterprise‑scale Azure infrastructure using Terraform (IaC‑first).
- Ensure consistent implementation of landing zones, networking, identity, and governance.
- Review Terraform modules for security, scalability, and operational readiness.
CI/CD, Automation & Python Engineering
- Define and lead Azure DevOps CI/CD standards using YAML pipelines.
- Build Python‑based automation, including:
- Orchestration and workflow engines
- Platform utilities, health checks, and guardrails
- Alert‑ and event‑driven remediation
- Integrate automation into CI/CD and operational workflows.
API Integration & Platform Connectivity (Core Focus)
- Serve as the API Integration Specialist, owning integrations across:
- Azure services (ARM, Azure Monitor, Log Analytics, AKS, Key Vault)
- Azure DevOps (pipelines, repos, artifacts, boards)
- Observability platforms (Grafana, Prometheus, Loki, Azure Monitor)
- ITSM / Operations tools (ServiceNow, incident and ticketing systems)
- Design API‑driven workflows using Python (REST, webhooks, event‑driven).
- Standardize authentication, authorization, and secrets management.
- Ensure integrations are secure, resilient, observable, and automation‑ready.
Kubernetes & Cloud‑Native Platforms
- Provide technical leadership for AKS platform architecture and operations.
- Define standards for networking, security, scaling, and lifecycle management.
- Guide teams on Helm‑based deployments and cloud‑native patterns.
AIOps & Observability
- Own the AIOps strategy for Azure CloudOps, including:
- Alert correlation and noise reduction
- Anomaly detection and predictive insights
- API‑driven and Python‑based automated remediation
- Standardize observability data for AIOps consumption.
- Continuously improve MTTR, availability, and operational efficiency.
Cross‑Team Leadership
- Mentor and lead platform and Azure infrastructure engineers.
- Partner with Security, FinOps, SRE, and CloudOps teams.
- Produce SOPs, runbooks, and API contracts suitable for L1/L2 and AIOps automation.
Business‑as‑Usual (BAU) & Rotational Support
- Own BAU platform and CloudOps operations, ensuring stability and reliability.
- Participate in rotational on‑call support, acting as Lead escalation for:
- Production incidents and service degradation
- AKS, Azure infrastructure, and CI/CD failures
- Drive incident triage, RCA, and post‑incident reviews, feeding outcomes into:
- Platform improvements
- Automation and AIOps use cases
- SOPs and runbooks
- Progressively automate BAU operations using Python, APIs, and CI/CD workflows.
- Partner with L1/L2 teams to shift left via automation and AIOps.
Required Skills & Experience
- Deep expertise in Microsoft Azure (compute, networking, identity, governance).
- Strong hands‑on experience with Terraform at enterprise scale.
- Advanced experience with Azure DevOps CI/CD and YAML pipelines.
- Strong experience with AKS / Kubernetes.
- Strong Python proficiency for automation and integrations.
- Strong experience with REST APIs, webhooks, and event‑driven architecture Official notification