• As an Observability Engineer specializing in the Grafana stack, you will play a critical role in making the internal state of the market's infrastructure and services visible to stakeholders for troubleshooting, performance analysis, capacity planning, and reporting, with a focus on telemetry solutions.
• You will develop platforms and tooling to enable developers and operators to efficiently trace performance problems to their source and map their application performance to business objectives using traces.
• You will assist teams in instrumenting their applications and systems to generate and utilize traces.
• You will engineer the standardization and adoption of observability tools for the Infrastructure departments including Platform, Database, Reliability, and Cloud Operations teams, as well as developer teams.
Key Responsibilities
• Design and build an observability infrastructure for all engineering teams to consume.
• Develop and improve instrumentation for monitoring and logging the health and availability of services.
• Design and develop tools for metric collection, analysis, and reporting.
• Educate and lead efforts to improve observability among all engineering teams.
• Work with teams to enable an effective and pleasant on-call experience.
• Identify and collect the appropriate measurements, and synthesize the correct queries, to show intuitive and insightful visualizations which characterize the behavior of complex systems.
• Build a metrics pipeline with end-to-end latency under 5 minutes.
• Integrate logs with time series data for event correlation.
• Help us unlock the power of distributed tracing.
• Proactively monitor systems, networks, and applications to provide input in improving the stability, security, efficiency, and scalability of systems.
Our ideal candidate would have:
• Familiarity with the Grafana tech stack: Loki, Grafana, Tempo, Mimir/Prometheus
• In-depth experience designing at-scale monitoring and logging for corporate infrastructure services.
• 5 years of experience working in Monitoring / Observability / SRE / DevOps / Performance Tuning.
• Experience working with cloud infrastructures, particularly Kubernetes and AWS.
• Experience with Git/version control solutions
• Experience with programming languages, primarily Go, Rust, Java, Python
• Experience with CI/CD pipelines such as Azure Pipelines or Jenkins
• Expert-level experience in monitoring and logging technologies, both open source