Develop and manage deployment processes for machine learning models, ensuring seamless integration into production environments
Design and implement automated CI/CD pipelines for ML workflows, adhering to company standards and best practices
Create and maintain monitoring tools to track model performance, reliability, and accuracy in production
Optimize infrastructure for model training, testing, and deployment, including template scripts and automation that accelerate development
Collaborate with data scientists, data engineers, and platform engineers to streamline ML operations and integrate new AI technologies into the platform ecosystem
Ensure that ML models and workflows are secure and comply with industry standards, regulations, and company governance frameworks
Research and integrate best practices and new technologies in MLOps to improve efficiency and effectiveness
Assist in creating and implementing rigorous evaluation and validation processes for ML models, with a focus on automating the validation scripts used for deployment
Contribute to the development and maintenance of training materials and user guides for the AI platform
Experience and Skills Required:
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field
6+ years of experience in software engineering or DevOps, including at least 2 years of hands-on experience with machine learning operations or AI platform engineering
Demonstrated experience in deploying and maintaining machine learning models in production environments
Strong programming skills in Python and proficiency with shell scripting
Extensive experience with CI/CD tools (e.g., Jenkins, GitLab CI, or Azure DevOps)
In-depth knowledge of containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes)
Familiarity with cloud platforms (e.g., AWS, Azure, or GCP) and their ML-specific services
Practical experience with ML frameworks such as TensorFlow, PyTorch, or scikit-learn
Strong understanding of data pipelines, ETL processes, and data storage solutions
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
Excellent problem-solving skills and the ability to optimize complex systems
Strong communication skills and the ability to work effectively in a collaborative environment
Knowledge of data governance, security best practices, and compliance regulations related to AI/ML
Experience with version control systems (e.g., Git) and ML model versioning tools (e.g., MLflow, DVC)