Site Reliability Lead (8+)
Veeva | 88 days ago | Hyderabad

What You’ll Do

  • Lead and mentor a team of engineers, and provide onsite leadership
  • Rapidly build new applications on an existing, robust enterprise platform
  • Build new cloud infrastructure from scratch following the best practices in software development
  • Drive new features and improvements in a fast-changing environment
  • Partner with product management, design, and QA to deliver cutting-edge solutions and direct value to our customers
  • Work on multiple layers of our stack, including backend (primary), frontend, and infrastructure
  • Build tools and automation that eliminate work and reduce the time it takes to resolve an issue
  • Make the system better every day, and be self-driven to learn whatever is necessary to provide full-stack diagnostics and determine the root cause of problems
  • Ensure our platform meets the scalability and reliability needs of our customers
  • Lead the effort to triage and mitigate during an incident. You might need to perform periodic on-call duty if issues are escalated
  • Strategize with engineering teams on complex problems. You know how to support a system used by 3M users and can advise dev teams, before a change ships, on what will work in production
  • Participate in engineering design reviews of new features. Drive focused initiatives that improve operational efficiency and scalability of the platform
  • Communicate effectively with engineering teams, describing problems succinctly and in enough detail to hand off an ongoing problem to another team or a peer for completion. Engage in real-time communication during outages with both technical and non-technical audiences

Requirements

  • 8+ years of experience in Java, preferably at an enterprise cloud software company
  • Proven ability to write clean, testable, readable code in a team environment
  • Hands-on experience with open-source technologies, such as Spring, MySQL, Hibernate, Solr, Maven, Git, Tomcat, Linux, AWS, Vagrant, Docker, Kubernetes
  • 3+ years of experience in relational databases with a mastery of SQL
  • Demonstrated history of incident management and leadership ability
  • Experience in handling production outages and root-cause analysis
  • Hands-on operational experience in a high-volume or critical production service environment
  • Effective communication skills across all levels -- whether talking to individual contributors or executives
  • Solid scripting skills; experience with Shell, Bash, Ansible, Python, Go, Ruby, etc.
  • Ability to handle periodic on-call duty
  • Fluent in English, both written and verbal
  • Strong mentoring skills, with a proven record of making your team better