Job Title: Vertex AI Platform Engineer
Location: Alpharetta, GA
Cloud Platform: Google Cloud (Vertex AI)
Job Summary:
We are seeking an experienced Vertex AI Platform Engineer to maintain, optimize, and support our AI/ML infrastructure on Google Cloud. The ideal candidate will have hands-on expertise in Vertex AI services, containerization and orchestration (Docker, Kubernetes), and DevOps automation, and will keep our machine learning operations reliable and scalable.
Key Responsibilities:
- Manage and optimize the Vertex AI platform (notebooks, pipelines, endpoints).
- Monitor and troubleshoot performance, configuration, and scheduling issues.
- Collaborate with DevOps and AI/ML teams to automate deployment and testing.
- Implement monitoring, alerting, and incident response for AI systems.
- Integrate and configure new Vertex AI and GenAI features.
- Perform root cause analysis and maintain documentation for incident resolution.
- Optimize resource usage and cost efficiency across AI workloads.
Required Qualifications:
- Strong hands-on experience with Google Cloud Platform (GCP) and Vertex AI.
- Proficiency in Python and scripting, plus containerization and orchestration (Docker, Kubernetes).
- Familiarity with CI/CD pipelines, cloud monitoring tools (Google Cloud Monitoring, formerly Stackdriver; Prometheus), and DevOps practices.
- Understanding of machine learning frameworks (TensorFlow, PyTorch) and data pipelines.
- Strong analytical, troubleshooting, and communication skills.
Nice to Have:
- Experience with LLMs or GenAI on Vertex AI.
- Background in MLOps, anomaly detection, or AI platform security.
- Google Cloud Professional certification (Professional Machine Learning Engineer or Professional Cloud DevOps Engineer).