What you’ll do:
Platform Maintenance:
Monitor the health and performance of our GCP-based machine learning infrastructure, including compute instances, storage, and networking.
Troubleshoot and resolve issues related to resource allocation, deployment, and configuration of ML models and pipelines.
Collaborate with DevOps teams to implement automated deployment and testing processes for machine learning solutions.
Incident Management:
Triage and resolve support requests related to our ML platform infrastructure.
Perform root cause analysis to identify underlying issues and prevent recurrences.
Develop and maintain documentation on incident resolution procedures.
Performance Optimization:
Investigate and address performance bottlenecks in our ML environment on GCP.
Implement monitoring and alerting systems to proactively identify potential issues.
Collaborate with data science teams to optimize resource utilization and cost efficiency.
Platform Upgrades and Enhancements:
Stay up to date on new GCP services and releases relevant to machine learning.
Plan and execute platform upgrades and enhancements to support evolving ML needs.
Work with data science and engineering teams to assess the impact of new technologies and services on existing workflows.
What experience you need:
BS degree in Computer Science or a related technical field involving coding.
7+ years of experience as a Cloud Platform Engineer, with good knowledge of VI Platform Engineering.
Strong knowledge of GCP services, particularly those related to machine learning (e.g., Compute Engine, Kubernetes Engine, Cloud Storage, BigQuery).
Proficiency in Python and experience with other scripting languages.
Experience with containerization (e.g., Docker) and orchestration technologies (e.g., Kubernetes).
Familiarity with cloud monitoring and logging tools (e.g., Cloud Monitoring, Cloud Logging).
Experience with DevOps practices and tools (e.g., CI/CD pipelines, Git).
Ability to quickly diagnose and resolve complex technical issues.
Strong analytical and troubleshooting skills.
Proactive approach to identifying and preventing potential problems.
Ability to effectively communicate technical concepts to both technical and non-technical stakeholders.
Excellent written and verbal communication skills.
Ability to collaborate effectively with cross-functional teams.