Deploy and operate security solutions and supporting infrastructure in cloud and datacenter environments in support of internal customer security needs and FedRAMP requirements
Develop and automate security tasks that span from Security Operations to Infrastructure as Code in support of InfoSec initiatives
Manage the availability, capacity and configuration of InfoSec’s mission-critical applications and services
Define, measure and monitor SLAs & SLOs for systems and services with the objective of achieving and exceeding availability and reliability goals
Manage and streamline monitoring systems to enhance observability and enable proactive identification of issues
Coordinate and manage incidents, upgrades and changes for InfoSec’s applications and services
Drive post-incident analysis with partner teams and/or vendors to identify root cause and ensure preventative measures are implemented promptly
Assist in Security Incident investigations
Manage a scalable and highly available solution for security logging and drive efforts of logging onboarding for increased security visibility
Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps
Develop and maintain documentation and runbooks to reduce MTTR and inform future automation development
Work cross-functionally across global time zones, which requires flexible work hours
Participate in 24/7 on-call rotations
Experience You’ll Need:
Bachelor's degree in Computer Science or a related field, or equivalent experience
8+ years of experience in site reliability engineering, including deploying, managing and troubleshooting security systems across the stack (on-prem and cloud)
Strong operational mindset focused on availability, reliability, performance and continuous improvement of systems and services
Operational knowledge of Linux and Windows systems
Experience with Terraform, Ansible, Vault, Prometheus, Grafana and GitHub
Proficiency in any scripting language (Python, PowerShell, Perl, Ruby, shell, etc.)
Working experience in GCP, AWS or Azure
Experience collaborating with internal customers to establish strong requirements and prioritize work based on outcomes that drive operational effectiveness