What You’ll Do:
● Architect and develop full-stack solutions for monitoring, logging, and managing Generative AI and machine learning workflows and models.
● Architect, build, and deploy well-managed core APIs and SDKs for observability of LLMs and proprietary Foundation Models, including training, pre-training, fine-tuning, and prompting.
● Work with model and platform teams to build systems that ingest large volumes of model and feature metadata and runtime metrics, powering an observability platform and informing governance decisions that ensure ethical use, data integrity, and compliance with industry standards for Gen-AI.
● Partner with product and design teams to develop and integrate advanced observability tools tailored to Gen-AI.
● Leverage cloud-based architectures and technologies to deliver solutions that give platform users deep insight into model performance, data flow, and system health.
● Collaborate as part of a cross-functional Agile team with data scientists, ML engineers, and other stakeholders to understand requirements and translate them into scalable and maintainable solutions.
● Use programming languages like Python, Scala, or Java.
● Leverage continuous integration and continuous deployment best practices, including test automation and monitoring, to ensure successful deployments of machine learning models and application code.
Basic Qualifications:
● Master's Degree in Computer Science or a related field
● Minimum 12 years of experience in software engineering and solution architecture
● At least 8 years of experience designing and building data-intensive solutions using distributed computing
● At least 8 years of experience programming with Python, Go, or Java
● Proficiency in observability tools such as Prometheus, Grafana, ELK Stack, or similar, with a focus on adapting them for Gen AI systems.
● Excellent knowledge of OpenTelemetry and prior experience building SDKs and APIs.
● Excellent communication skills, capable of articulating complex technical concepts to diverse audiences and driving cross-functional initiatives.
● Experience developing and deploying ML platform solutions in a public cloud such as AWS, Azure, or Google Cloud Platform