Site Reliability Engineer III (4+ years)
Khoros | 179 days ago | Remote

Responsibilities:

  • Manage cloud environments.
  • Monitor, troubleshoot, and resolve issues related to infrastructure, applications, and services.
  • Monitor availability and maintain the systems in good health.
  • Implement automation tools and processes to improve efficiency and reliability.
  • Participate in on-call rotation and respond to incidents promptly.
  • Continuously evaluate and improve our systems and processes to enhance reliability and performance.
  • Document runbooks and procedures.
  • Work closely with first-level support groups and development groups.
  • Follow departmental change management procedures when defining, planning, and implementing changes, so that service disruption is minimized and Service Level Agreements are met.
  • Perform root cause analysis of incidents.
  • Be able to drive projects and issues independently as well as work in a team environment.
  • Be a Team Player – work in a collaborative, team-oriented environment, share information, respect diverse ideas, interact with customers, and partner with cross-functional and remote teams.
  • Be Curious & Innovative – continuously update yourself with next-generation technology and development tools, and contribute to process development practices. Evaluate new technologies and software products to determine the feasibility and desirability of incorporating capabilities within the company's products.
  • Be Agile – with a strong sense of urgency and a desire to work in a fast-paced, dynamic environment to deliver solutions against strict timelines.

Requirements:

  • 4+ years of experience as an SRE in fast-paced, high-traffic environments.
  • Experience deploying and maintaining applications on at least one cloud platform (AWS is a must-have; Azure or GCP is good to have).
  • Working knowledge of Linux and Windows operating systems.
  • Working knowledge of at least one scripting language: Shell, Bash, Python, or PowerShell.
  • Understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Working knowledge of Jenkins, Ansible, Terraform, and ArgoCD (good to have).
  • Experience administering databases (MS SQL, MongoDB, etc.).
  • Extensive experience with monitoring, logging, and observability tools (Sumo Logic, Datadog, AWS CloudWatch, AWS X-Ray, New Relic, Splunk, etc.).
  • Ability to debug issues and solve problems.
  • Excellent problem-solving and communication skills.
  • Ability to work independently and collaborate effectively in a team environment.
  • Familiarity with agile development methodologies is a plus.