Key Responsibilities:
Design and implement scalable backend services and APIs for generative AI applications using microservices architecture and cloud-native patterns.
Build and maintain model serving infrastructure with load balancing, auto-scaling, caching, and failover capabilities for high-availability AI services.
Deploy and orchestrate containerized AI workloads using Docker, Kubernetes, ECS, and OpenShift across development, staging, and production environments.
Develop serverless AI functions using AWS Lambda, ECS Fargate, and other managed cloud services for scalable, cost-effective inference (a minimal handler sketch follows this list).
Implement robust CI/CD pipelines for automated deployment of AI services, including model versioning and gradual rollout strategies.
Create comprehensive monitoring, logging, and alerting systems for AI service performance, reliability, and cost optimization.
Integrate with various LLM APIs (OpenAI, Anthropic, Google) and open-source models, implementing efficient request batching and other optimization techniques (see the micro-batching sketch after this list).
Build data pipelines for training data preparation, model fine-tuning workflows, and real-time streaming capabilities.
Ensure adherence to security best practices, including authentication, authorization, API rate limiting (see the token-bucket sketch after this list), and data encryption.
Collaborate with AI researchers and product teams to translate AI capabilities into production-ready backend services.
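For illustration of the serverless inference responsibility above, a minimal AWS Lambda handler in Python might look like the sketch below. The model_predict function and the request fields are hypothetical placeholders, not a prescribed interface.

    import json

    def model_predict(prompt: str) -> str:
        # Hypothetical stand-in for a real inference call (an LLM API, or a
        # model loaded outside the handler so warm invocations reuse it).
        return f"echo: {prompt}"

    def handler(event, context):
        # API Gateway proxy integrations deliver the request body as a JSON string.
        body = json.loads(event.get("body") or "{}")
        result = model_predict(body.get("prompt", ""))
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"completion": result}),
        }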
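The micro-batching referenced above buffers concurrent requests briefly and sends them to the model in one call, trading a few milliseconds of latency for throughput. A minimal asyncio sketch, assuming a hypothetical llm_complete_batch coroutine in place of a real provider call:

    import asyncio

    async def llm_complete_batch(prompts):
        # Hypothetical batched model call; a real version would hit an LLM API
        # or a local inference server.
        return [f"completion for: {p}" for p in prompts]

    class MicroBatcher:
        def __init__(self, max_batch: int = 8, max_wait: float = 0.02):
            self.max_batch = max_batch
            self.max_wait = max_wait
            self.queue: asyncio.Queue = asyncio.Queue()

        async def submit(self, prompt: str) -> str:
            fut = asyncio.get_running_loop().create_future()
            await self.queue.put((prompt, fut))
            return await fut

        async def run(self) -> None:
            while True:
                # Wait for the first request, then keep collecting until the
                # batch is full or the wait window closes.
                batch = [await self.queue.get()]
                deadline = asyncio.get_running_loop().time() + self.max_wait
                while len(batch) < self.max_batch:
                    remaining = deadline - asyncio.get_running_loop().time()
                    if remaining <= 0:
                        break
                    try:
                        batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                    except asyncio.TimeoutError:
                        break
                results = await llm_complete_batch([p for p, _ in batch])
                for (_, fut), result in zip(batch, results):
                    fut.set_result(result)

    async def main():
        batcher = MicroBatcher()
        worker = asyncio.create_task(batcher.run())
        answers = await asyncio.gather(*(batcher.submit(f"prompt {i}") for i in range(5)))
        print(answers)
        worker.cancel()

    asyncio.run(main())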
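API rate limiting, also referenced above, is commonly implemented as a token bucket. A minimal single-process sketch follows; a production deployment would usually keep the bucket state in Redis so the limit holds across service instances.

    import time

    class TokenBucket:
        # Allows `rate` requests per second, with bursts up to `capacity`.
        def __init__(self, rate: float, capacity: int):
            self.rate = rate
            self.capacity = capacity
            self.tokens = float(capacity)
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False

    bucket = TokenBucket(rate=5.0, capacity=10)
    for i in range(12):
        print(i, bucket.allow())  # the first 10 pass; then the bucket is empty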
Required Technical Skills:
Strong experience in backend development with Python, plus familiarity with Go, Node.js, or Java, for building scalable web services and APIs.
Hands-on experience with containerization using Docker and orchestration platforms including Kubernetes, OpenShift, and AWS ECS in production environments.
Proficient with cloud infrastructure, particularly AWS services (Lambda, ECS, EKS, S3, RDS, ElastiCache) and serverless architectures.
Experience with CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or similar tools, including Infrastructure as Code with Terraform or CloudFormation.
Strong knowledge of databases including PostgreSQL, MongoDB, and Redis, and experience with vector databases for AI applications (see the nearest-neighbor sketch after this list).
Familiarity with message queues (RabbitMQ, Apache Kafka, AWS SQS/SNS) and event-driven architectures.
Experience with monitoring and observability tools such as Prometheus, Grafana, DataDog, or equivalent platforms.
Knowledge of AI/ML model serving frameworks like MLflow, Kubeflow, TensorFlow Serving, or Triton Inference Server.
Understanding of API design principles, load balancing, caching strategies (see the cache-aside sketch after this list), and performance optimization techniques.
Experience with microservices architecture, distributed systems, and handling high-traffic, low-latency applications.
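At their core, the vector databases referenced in the skills above perform nearest-neighbor search over embeddings. A toy in-memory version with NumPy illustrates the idea; production systems (pgvector, Milvus, and similar) add indexing, persistence, and metadata filtering. The dimensions and data below are made up.

    import numpy as np

    def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
        # Cosine similarity between the query and every stored embedding.
        sims = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))
        return np.argsort(-sims)[:k]  # indices of the k most similar vectors

    rng = np.random.default_rng(0)
    corpus = rng.normal(size=(1000, 384))  # e.g. 384-dim sentence embeddings
    query = rng.normal(size=384)
    print(top_k(query, corpus))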
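The cache-aside pattern is the workhorse of the caching strategies referenced above: check the cache first, fall back to the source of truth on a miss, then populate the cache with a TTL. A minimal sketch using redis-py; the connection details and fetch_from_db are placeholders, not a real schema.

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)  # placeholder connection

    def fetch_from_db(key: str) -> dict:
        # Hypothetical stand-in for the real backing-store query.
        return {"key": key, "value": 42}

    def get_with_cache(key: str, ttl: int = 300) -> dict:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)            # cache hit
        value = fetch_from_db(key)               # cache miss: go to the source
        r.setex(key, ttl, json.dumps(value))     # populate with an expiry
        return value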