Responsibilities:
- Develop and implement machine learning models and algorithms for classification, regression, clustering, recommendation, and related tasks.
- Build and maintain data pipelines for training and inference workflows.
- Collaborate with data scientists, product managers, and software engineers to integrate AI models into production systems.
- Optimize model performance and scalability for real-time and batch processing.
- Conduct experiments, evaluate model performance, and iterate based on results.
- Stay up to date with the latest research and advancements in AI/ML and apply them to practical use cases.
- Document code, processes, and model behavior for reproducibility and compliance.
Basic Requirements:
1. Programming Languages
- Python: Core language for AI/ML development. Proficiency in libraries like:
- NumPy, Pandas for data manipulation
- Matplotlib, Seaborn, Plotly for data visualization
- Scikit-learn for classical ML algorithms
- Familiarity with R, Java, or C++ is a plus, especially for performance-critical applications.
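For illustration, here is a minimal sketch of the day-to-day workflow these libraries imply; the file name and column names are hypothetical:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Load a (hypothetical) CSV and drop rows missing the target column.
    df = pd.read_csv("sales.csv")
    df = df.dropna(subset=["revenue"])

    # Vectorized feature engineering with NumPy.
    df["log_revenue"] = np.log1p(df["revenue"])

    # Quick visual sanity check with Matplotlib.
    df["log_revenue"].hist(bins=30)
    plt.xlabel("log(1 + revenue)")
    plt.title("Revenue distribution")
    plt.show()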
2. Machine Learning & Deep Learning Frameworks
Experience building models using the following:
- TensorFlow and Keras for deep learning
- PyTorch for research-grade and production-ready models
- XGBoost, LightGBM, or CatBoost for gradient boosting
- Understanding of model training, validation, hyperparameter tuning, and evaluation metrics (e.g., ROC-AUC, F1-score, precision/recall).
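To make the tuning-and-evaluation loop concrete, here is a hedged sketch using scikit-learn's synthetic data helpers; a gradient-boosting model from XGBoost, LightGBM, or CatBoost would slot into the same pattern:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Synthetic binary-classification data stands in for real features.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    # Hyperparameter tuning via cross-validated grid search.
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=42),
        param_grid={"n_estimators": [100, 200], "max_depth": [2, 3]},
        scoring="roc_auc",
        cv=3,
    )
    search.fit(X_train, y_train)

    # Evaluate with the metrics called out above: ROC-AUC and F1.
    proba = search.predict_proba(X_val)[:, 1]
    print("ROC-AUC:", roc_auc_score(y_val, proba))
    print("F1:", f1_score(y_val, search.predict(X_val)))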
3. Natural Language Processing (NLP)
Familiarity with:
- Text preprocessing (tokenization, stemming, lemmatization)
- Vectorization techniques (TF-IDF, Word2Vec, GloVe)
- Transformer-based models (BERT, GPT, T5) using Hugging Face Transformers
- Experience with text classification, named entity recognition (NER), question answering, or chatbot development.
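As one baseline pattern from the list above, TF-IDF vectorization feeding a linear text classifier; the toy corpus is hypothetical, and transformer-based models expose a similar train/predict shape:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny hypothetical labeled corpus: 1 = positive, 0 = negative.
    texts = [
        "great product, works well",
        "terrible, broke after a day",
        "love it, highly recommend",
        "waste of money",
    ]
    labels = [1, 0, 1, 0]

    # TF-IDF vectorization chained with a linear classifier.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["really great value"]))  # expected: [1]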
4. Computer Vision (CV)
Experience with:
- Image classification, object detection, segmentation
- Libraries like OpenCV, Pillow, and Albumentations
- Pretrained models (e.g., ResNet, YOLO, EfficientNet) and transfer learning
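A minimal transfer-learning sketch with a pretrained ResNet in PyTorch/torchvision (assuming the 0.13+ weights API); the five-class target is hypothetical:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load an ImageNet-pretrained ResNet-18 backbone.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze pretrained weights; only the new head will train.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the classification head for a hypothetical 5-class task.
    model.fc = nn.Linear(model.fc.in_features, 5)

    # Forward pass on a dummy batch to confirm the output shape.
    dummy = torch.randn(2, 3, 224, 224)
    print(model(dummy).shape)  # torch.Size([2, 5])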
5. Data Engineering & Pipelines
- Ability to build and manage data ingestion and preprocessing pipelines.
- Tools: Apache Airflow, Luigi, Pandas, Dask
- Experience with structured (CSV, SQL) and unstructured (text, images, audio) data.
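For example, a skeletal Airflow DAG for an ingest-then-preprocess pipeline (assuming Airflow 2.4+; the DAG name and task bodies are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest():
        ...  # placeholder: pull raw data from a source system

    def preprocess():
        ...  # placeholder: clean and transform the ingested data

    with DAG(
        dag_id="daily_feature_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        preprocess_task = PythonOperator(task_id="preprocess", python_callable=preprocess)
        ingest_task >> preprocess_task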
6. Model Deployment & MLOps
Experience deploying models as:
- REST APIs using Flask, FastAPI, or Django
- Batch jobs or real-time inference services
- Familiarity with:
- Docker for containerization
- Kubernetes for orchestration
- MLflow, Kubeflow, or SageMaker for model tracking and lifecycle management
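As a sketch of the REST-API deployment path, a FastAPI inference endpoint; the serialized model file and feature schema are hypothetical stand-ins:

    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # Hypothetical: a scikit-learn model serialized at build time.
    model = joblib.load("model.joblib")

    class Features(BaseModel):
        values: list[float]

    @app.post("/predict")
    def predict(features: Features):
        # Single-row inference; batch endpoints follow the same pattern.
        prediction = model.predict([features.values])
        return {"prediction": prediction.tolist()}

Served with uvicorn and wrapped in a Docker image, this becomes the unit that Kubernetes would orchestrate.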
7. Cloud Platforms
- Hands-on experience with at least one cloud provider:
- AWS (S3, EC2, SageMaker, Lambda)
- Google Cloud (Vertex AI, BigQuery, Cloud Functions)
- Azure (Machine Learning Studio, Blob Storage)
- Understanding of cloud storage, compute services, and cost optimization.
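For instance, moving model artifacts to and from S3 with boto3; the bucket and key names are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # Upload a trained model artifact to a (hypothetical) bucket.
    s3.upload_file("model.joblib", "my-ml-artifacts", "models/model.joblib")

    # Download it back, e.g., inside an inference container.
    s3.download_file("my-ml-artifacts", "models/model.joblib", "/tmp/model.joblib")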
8. Databases & Data Access
Proficiency in:
- SQL for querying relational databases (e.g., PostgreSQL, MySQL)
- NoSQL databases (e.g., MongoDB, Cassandra)
- Experience with big data tools such as Apache Spark, Hadoop, or Databricks is a plus
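A small sketch of pulling training data from PostgreSQL into Pandas via SQLAlchemy; the connection string and table are hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical PostgreSQL connection string.
    engine = create_engine("postgresql://user:password@localhost:5432/analytics")

    # Pull labeled rows for training into a DataFrame.
    query = "SELECT user_id, feature_a, feature_b, label FROM training_events WHERE label IS NOT NULL"
    df = pd.read_sql(query, engine)
    print(df.shape)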
9. Version Control & Collaboration
- Experience with Git and platforms like GitHub, GitLab, or Bitbucket.
- Familiarity with Agile/Scrum methodologies and tools like JIRA, Trello, or Asana.
10. Testing & Debugging
- Writing unit tests and integration tests for ML code.
- Using tools like pytest, unittest, and debuggers to ensure code reliability.
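For instance, pytest unit tests for a hypothetical preprocessing helper, the kind of coverage this section describes:

    import numpy as np
    import pytest

    def scale_to_unit(x: np.ndarray) -> np.ndarray:
        """Hypothetical helper: min-max scale an array to [0, 1]."""
        span = x.max() - x.min()
        if span == 0:
            raise ValueError("constant input cannot be scaled")
        return (x - x.min()) / span

    def test_scale_to_unit_bounds():
        out = scale_to_unit(np.array([2.0, 4.0, 6.0]))
        assert out.min() == 0.0 and out.max() == 1.0

    def test_scale_to_unit_rejects_constant_input():
        with pytest.raises(ValueError):
            scale_to_unit(np.array([3.0, 3.0]))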