Senior Site Reliability Engineer (8+)
lilly | 209 days ago | Bangalore

Qualifications:

  • Bachelor's or master's degree in computer science, engineering, or a related field.
  • 8 to 10+ years of relevant experience.

 

Must-Have Skills and Experience

  • Cloud Management: Utilize extensive AWS knowledge to manage and optimize cloud-based solutions, ensuring scalability, security, and cost-effectiveness.
  • Container Orchestration: Deploy and manage Kubernetes clusters to streamline the deployment and scaling of applications. Leverage Kubernetes' capabilities to enhance the reliability and agility of our services.
  • DevOps Practices: Implement and advocate for best DevOps practices, including continuous integration (CI), continuous delivery (CD), and infrastructure as code, to improve the efficiency and reliability of the development lifecycle. Manage code repositories using GitHub and automate workflows with GitHub Actions. Ensure seamless integration and deployment processes through automated pipelines.
  • Monitoring and Observability: Monitor application performance, detect anomalies, and ensure system health using tools like Splunk, AppDynamics, Datadog, New Relic, and open-source tools like Prometheus, Grafana, and Jaeger.
  • Scripting and Automation: Develop and maintain scripts in Python or another scripting language to automate routine tasks, enhance monitoring, and improve system performance and reliability.
  • L1 and L2 Support: Provide expert-level first- and second-level support for applications, ensuring timely and efficient resolution of issues. Maintain and enhance the stability and performance of applications across various environments. Apply ITIL knowledge to streamline processes and improve service management.
  • Incident and Problem Management: Troubleshoot incidents and problems to ensure seamless operations. Quickly resolve technical issues and identify root causes to prevent future incidents.
  • Documentation Skills: Keep thorough records of issues, fixes, and maintenance tasks for future reference. Create and maintain detailed runbooks.
  • Experience with disaster recovery (DR) and business continuity planning (BCP).
  • Expertise in creating and maintaining ServiceNow (SNOW) dashboards and reports.
  • Proven track record of applying Site Reliability Engineering (SRE) principles.
  • Knowledge of Tomcat or another application server.
  • Knowledge of Linux and shell scripting.
  • Effective prioritization skills considering urgency and business impact.
  • Proactive mindset in addressing and resolving issues.
  • Exhibit a strong sense of ownership and accountability for all tasks and responsibilities.
  • Work as an engineer specializing in Kubernetes and Amazon Web Services, on a team of full-stack software developers, to develop and maintain software platforms and DevOps processes
  • Guide and collaborate with internal application teams to deploy solutions on a custom Cloud Deployment Platform
  • Improve and maintain reusable pipeline templates and patterns for automated deployment of cloud infrastructure and code
  • Develop and support high-quality automation workflows inside and outside the cloud platform that are appropriate for business and technology strategies
  • Monitor and troubleshoot the software delivery process
  • Work with software developers and operations engineers to improve the software delivery process
  • Stay up to date on the latest DevOps and ITIL practices and technologies
  • Strive to provide internal customers with excellent customer service
  • Effectively contribute to the communication of platform health, risks, and issues to the program partners, stakeholders, and management teams
  • Resolve most conflicts between prioritization and scope independently, but use sound judgment to escalate complex or consequential issues to senior management
  • Be a self-starter, able to come up with solutions to problems and complete those solutions while coordinating with other teams
  • Work in a modern Agile/Kanban environment to deliver customer value with regular cadence
  • Experience working with teams across organizational and geographic boundaries and at multiple levels within the organization
  • Excellent, proactive oral and written communication skills
  • Experience with multiple common programming languages

 

Good-to-Have Skills and Certifications

  • ITIL Foundation certification