What You’ll Do:
Lead the design and implementation of observability tools and dashboards that provide actionable insights into platform performance and health
Leverage and fine-tune Generative AI models to enhance observability capabilities such as anomaly detection, predictive analytics, and a troubleshooting copilot
Build and deploy well-managed core APIs and SDKs for observability of LLMs and proprietary Gen-AI Foundation Models including training, pre-training, fine-tuning and prompting.
Stay abreast of the latest trends in Generative AI, platform observability, and responsible AI, and drive the adoption of emerging technologies and methodologies
Collaborate as part of a cross-functional Agile team to create and enhance software that enables state-of-the-art, next-generation Gen-AI applications
Bring a research mindset and lead proofs of concept that showcase the capabilities of large language models in observability and governance, enabling practical production solutions that improve platform users' productivity.
Basic Qualifications:
Bachelor’s or Master’s degree in Computer Science or Engineering
At least 7 years of experience in machine learning engineering, building data-intensive solutions using distributed computing
At least 5 years of hands-on experience with Generative AI models and their application in observability or related areas
At least 8 years of experience programming with Python, Go, or Java
At least 5 years of experience with an industry-recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
At least 5 years of experience productionizing, monitoring, and maintaining models
At least 5 years of experience with cloud platforms like AWS, Azure, or GCP
At least 7 years of experience in developing performant, resilient, and maintainable code.
Preferred Qualifications:
Master's or doctoral degree in data science, computer science, electrical engineering, or mathematics
8+ years of experience in machine learning, particularly in deploying and operationalizing ML models
8+ years of experience building and evaluating agentic solutions
Familiarity with containerization and orchestration tools like Docker and Kubernetes
Knowledge of data governance and compliance, particularly in the context of machine learning and AI systems
Prior experience with NVIDIA GPU telemetry and CUDA
Contributions to open-source ML software
Authored or co-authored papers or patents on ML techniques, models, or proofs of concept
2+ years of experience developing applications using Generative AI, e.g. open-source or commercial LLMs