Site Reliability Engineer (1+)
Amgen | 205 days ago | Hyderabad

Roles & Responsibilities:

  • Ensure high system reliability and uptime.
  • Develop and maintain monitoring systems.
  • Lead incident response and root cause analysis.
  • Automate repetitive tasks for efficiency.
  • Perform capacity planning and resource scaling.
  • Lead infrastructure-as-code initiatives (e.g., Terraform, Kubernetes).
  • Collaborate with development and operations teams.
  • Maintain clear documentation and share knowledge.
  • Optimize system and application performance.
  • Ensure security and compliance standards are met.
  • Define, measure, and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to align with business goals.
  • Drive continuous process and system improvements.
  • Define guidelines, standards, strategies, security policies, and organizational change policies to support the Data Lake.

What we expect of you

Basic Qualifications and Experience:

  • Master’s degree in computer science or engineering field and 1 to 3 years of relevant experience OR
  • Bachelor’s degree in computer science or engineering field and 3 to 5 years of relevant experience OR
  • Diploma and a minimum of 8 years of relevant work experience

Must-Have Skills:

  • Proficiency in programming/scripting (Python, Java).
  • Experience in Linux/Unix system administration.
  • Experience with cloud platforms (AWS, Databricks, Azure, Snowflake).
  • Proficiency in containerization and orchestration (Docker, Kubernetes).
  • Knowledge of Infrastructure as Code (Terraform, Ansible).
  • Familiarity with monitoring and logging tools (Prometheus, Grafana).
  • Understanding of CI/CD pipelines (Jenkins, GitLab CI/CD).
  • Strong networking knowledge and troubleshooting skills.
  • Understanding of security principles and compliance.
  • Familiarity with database management (SQL and NoSQL).
  • Strong troubleshooting and debugging skills.
  • Experience in performance optimization.
  • Experience with backup and storage solutions.

Good-to-Have Skills:

  • Familiarity with the use of AI for development productivity, such as GitHub Copilot, Databricks Assistant, Amazon Q Developer or equivalent.
  • Knowledge of Agile and DevOps practices.
  • Skills in disaster recovery planning.
  • Familiarity with load testing tools (JMeter, Gatling).
  • Basic understanding of AI/ML for monitoring.
  • Knowledge of distributed systems and microservices.
  • Data visualization skills (Tableau, Power BI).
  • Strong communication and leadership skills.
  • Understanding of compliance and auditing requirements.

Soft Skills:

  • Excellent analytical and problem-solving skills
  • Excellent written and verbal communication skills (English), including translating technology content into business language for audiences at various levels
  • Ability to work effectively with global, virtual teams
  • High degree of initiative and self-motivation
  • Ability to handle multiple priorities successfully
  • Team-oriented, with a focus on achieving team goals
  • Strong problem-solving and analytical skills.
  • Strong time and task management skills to estimate and meet project timelines, with the ability to bring consistency and quality assurance across projects