Senior DevOps Engineer (NM+)

Cognizant | 8 days ago | Hyderabad

Responsibilities

Apply technical knowledge and problem-solving methodologies to projects of moderate scope, with a focus on improving the data and systems running at scale, and ensure end-to-end monitoring of applications
Resolves most nuances and determines the appropriate escalation path
Build, support, monitor, and automate web products on private cloud infrastructure
Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
Drive initiatives to improve the reliability and stability of web hosting platforms, using data-driven analytics to improve service levels
Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Provides comprehensive and ongoing guidance, tools, and solutions to support the firm's growth
Works toward becoming an expert on the applications and platforms under your influence while understanding their interdependencies and limitations
Documents and shares knowledge within your organization via internal forums and communities of practice
Strong knowledge of one or more infrastructure disciplines such as hardware, networking terminology, databases, storage engineering, deployment practices, integration, automation, scaling, resilience, and performance assessments
Experience with multiple cloud technologies with the ability to operate in and migrate across public and private clouds
Drive to develop infrastructure engineering knowledge in additional domains, along with data fluency and automation knowledge
Cloud exposure: understanding of and working experience with resiliency, scalability, observability, monitoring, etc.
Understanding of data objects and structures, and the ability to write SQL queries based on tickets as needed
Experience as an SRE in complex, mission-critical applications involving a multitude of components of varying technical generations
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform
Strong knowledge of site reliability culture and principles, with demonstrated ability to implement site reliability within an application or platform
Strong knowledge and experience in observability, monitoring, alerting, and telemetry collection using tools such as CloudWatch, Grafana, Dynatrace, Prometheus, Splunk, etc.
Fluency in at least one programming language or automation tool, such as Python, Terraform, Ansible, Java Spring Boot, shell scripting, .NET, etc.