**Key Responsibilities:**
1. **Leadership and Management:**
Lead, mentor, and manage a team of SREs, fostering a culture of collaboration, innovation, and continuous improvement.
Set strategic direction for the SRE team, aligning with business goals and ensuring the reliability and scalability of Adobe Connect services.
Drive the professional development of team members, providing coaching, feedback, and growth opportunities.
2. **Operational Excellence:**
Oversee the management of Tier-1 monitoring/alerts, incident response, and post-mortem analysis, ensuring timely resolution and learning from incidents.
Develop and implement strategies for improving system reliability, including automation, performance tuning, and capacity planning.
Ensure robust disaster recovery and business continuity plans are in place and regularly tested.
3. **Collaboration and Communication:**
Collaborate closely with engineering, product, and infrastructure teams to ensure seamless integration and deployment of new features and updates.
Act as a key stakeholder in product development, advocating for reliability, scalability, and operational efficiency from the early stages of design.
Communicate effectively with cross-functional teams and executive leadership, providing updates on system performance, reliability metrics, and ongoing projects.
4. **Continuous Improvement and Innovation:**
Drive automation initiatives to reduce manual intervention, improve efficiency, and minimize downtime.
Identify and implement best practices in SRE, staying ahead of industry trends and emerging technologies.
Foster a culture of continuous improvement, encouraging the team to experiment, learn, and iterate on processes and tools.
5. **Resource Planning and Allocation:**
Manage team resources effectively, balancing operational tasks with project work to ensure the team can meet both short-term and long-term objectives.
Participate in hiring, onboarding, and training new team members, ensuring the SRE team is well-equipped to handle the demands of the Adobe Connect platform.
**Qualifications:**
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
10+ years of experience in site reliability engineering, infrastructure engineering, or a related role, with at least 5 years in a leadership or management position.
Proven experience managing large-scale, distributed systems in a cloud environment (AWS, Azure, or GCP).
Strong expertise in automation, monitoring, and incident management tools and practices.
Excellent problem-solving and analytical skills, with a focus on delivering high-quality, reliable services.
Strong communication and interpersonal skills, with the ability to lead and inspire a team.
Experience with Agile methodologies and a solid understanding of DevOps practices.
**Preferred Qualifications:**
Experience working with Adobe Connect or other web conferencing platforms.
Certifications in cloud platforms (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.).
Knowledge of security and compliance standards (ISO 27001, SOC 2, GDPR, etc.).