SRE, Staff (8+)

Synopsys | 213 days ago | Hyderabad

Key Roles & Responsibilities

Discover, design, and implement changes to existing IT infrastructure with a focus on improved reliability, performance, and standardization.
Collaborate with Engineering and business units to translate customer, business, and technical requirements into SRE practices and enhancements.
Ensure efficient resource utilization and continuously improve processes by leveraging automation and internal tools, resulting in enhanced service delivery, maturity, and scalability.
Troubleshoot production issues, provide root cause analysis, and design solutions to prevent future occurrences.
Monitor services and create intelligent alerting for quicker incident detection and resolution.
Maintain vulnerability management processes and policies using a risk-based priority methodology.
Collaborate with teams and platform owners on all vulnerability management and reporting.
Strategically apply architectural and infrastructure disciplines to solve business problems.
Participate in off-hours maintenance activities and be part of the on-call rotation.

Required Skills

Extensive experience with a wide range of infrastructure technologies, including but not limited to Linux, Windows, high-performance computing, storage platforms, networking, cloud computing, cloud services (IaaS, PaaS, SaaS, etc.), virtualization, OpenStack, and containerization and orchestration technologies (e.g., Docker, Kubernetes).
Deep understanding of IT infrastructure-related services and their dependencies, as required to troubleshoot issues and define mitigations.
Solid experience with the administration, security hardening, and performance tuning of Linux and Windows OS. In-depth knowledge of CIS benchmarking standards.
Experience with developing service level indicators and objectives, instrumenting software, and building alerts.
An understanding of software engineering fundamentals with experience developing software with a team of engineers. Strong experience in the practice of testing.
Experience with the operations, administration, and development of orchestration systems such as Kubernetes, ECS, Mesos.
Passion for tracking down the technical root causes of issues in distributed systems and software.
Experience with ITAM, Service Mapping, and CMDB (ServiceNow).
Strong technical foundation, with the ability to engage deeply on technical topics related to data centre and cloud infrastructure, software reliability, and operational practices.
Proficiency in ITIL (Information Technology Infrastructure Library) processes and frameworks.
Service availability-oriented mindset with a proactive approach to problem solving. An ideal candidate should be able to develop automated solutions to prevent recurring problems.
Ability and willingness to challenge the status quo and optimize current processes and procedures.

Experience & Education

Master’s or bachelor’s degree with a minimum of 8 years of experience in IT infrastructure and operations, including 4+ years in an SRE role.
Implementation experience with infrastructure automation tools and frameworks such as GitHub, Jenkins, Terraform (IaC), Ansible, and shell scripting.
Hands-on experience with one or more of the following: Java, Python, Go, Node.js.
Knowledge of SDLC, Agile processes, and CI/CD tools.
Well versed in ITIL processes, including incident, request, and change management.
Good understanding of cloud, automation, networking, and SIEM tools.
Excellent verbal and written communication skills.
Excellent problem-solving skills and ability to work through issues and challenges.
