Job Responsibilities
· Automate the deployment of logging, metrics, and monitoring services using Puppet for configuration management.
· Address and resolve production incidents by applying Linux administration and engineering expertise.
· Lead projects from inception to completion, including designing technical solutions, managing timelines, and executing deliverables.
· Design and implement metrics dashboards and alert criteria to effectively monitor and scale services.
· Participate in a week-long on-call rotation in collaboration with team members.
· Assist development teams in enhancing their logging and metrics collection processes.
· Handle on-call duty on a recurring basis, with a shift coming up every few weeks.
Typical Qualifications
· 5 to 8 years of experience in a production environment, with strong system administration and DevOps skills for managing services on Linux.
· Observability: Strong understanding of observability concepts, including monitoring, logging, tracing, and alerting across distributed systems.
· OpenTelemetry: Hands-on experience in implementing distributed tracing and metrics using OpenTelemetry SDKs and collectors.
· OpenSearch or Elasticsearch: Proficiency in managing, querying, and optimizing search and analytics engines like OpenSearch or Elasticsearch.
· Jaeger: Practical experience in configuring and using Jaeger for distributed tracing in microservices environments.
· Programming proficiency, with experience writing and maintaining scripts in the following languages: Bash, Ruby, Python, Perl, C++, Java, and Golang.
· Experience developing Infrastructure as Code utilizing Terraform and CloudFormation.
· Display adaptability and flexibility in response to changing environmental and business demands.
Additional Qualifications
· Demonstrated experience in managing production server fleets at a scale of thousands.
· Subject matter expertise in relevant technologies, including Fluentd, Kafka, Elasticsearch, Graphite, ClickHouse, Prometheus, Grafana, Graylog, Terraform, CloudFormation, Docker, Jenkins, and Git.
· Exposure to Amazon Web Services (AWS) for deploying, managing, and scaling applications, with a foundational understanding of AWS services, architecture, and best practices.
· Proficient in using protocol analyzers such as tcpdump and Wireshark.