Job Description:
Lead Machine Learning Infrastructure Engineer

Location: Mountain View, CA / Dallas, TX / Chicago, IL / New York, NY
Duration: Long-term contract
Experience: 10+ years required
Candidates who can work independently are preferred.
Required Skills: MLOps, TensorFlow, PyTorch, Terraform, Docker, Kubernetes, Prometheus, Grafana, ELK stack, and any cloud certification.

About the Role:
We are seeking a highly skilled Lead ML Infrastructure Engineer to spearhead the development, deployment, and scaling of machine learning infrastructure. This pivotal role involves collaborating closely with data scientists, ML engineers, and operations teams to build robust, scalable, and efficient machine learning pipelines. The ideal candidate will be passionate about pushing the boundaries of ML infrastructure and possess a deep understanding of cloud platforms, containerization, and big data technologies.

Responsibilities:
Lead the design, implementation, and maintenance of scalable ML infrastructure solutions.
Collaborate with data science and ML teams to optimize model deployment workflows.
Develop and manage CI/CD pipelines to automate deployment processes.
Architect and implement containerized environments using Docker and Kubernetes.
Ensure infrastructure security, reliability, and compliance across cloud platforms.
Optimize resource utilization and cost-efficiency in cloud environments.
Drive best practices in Infrastructure as Code (IaC) with tools like Terraform.
Stay current with the latest advancements in ML frameworks, cloud services, and infrastructure tooling.
Mentor junior team members and promote a culture of continuous improvement.

Requirements:
Proven experience as a Machine Learning Engineer or Infrastructure Engineer, with a focus on ML infrastructure.
Strong programming expertise in Python and Java.
Hands-on experience working with cloud platforms, with a strong preference for GCP; AWS and Azure experience is also valuable.
Familiarity with popular machine learning frameworks such as TensorFlow and PyTorch, along with libraries like scikit-learn.
Solid understanding of DevOps principles and experience with CI/CD pipelines.
Experience with Infrastructure as Code tools, especially Terraform.
Proficiency in containerization technologies, including Docker and Kubernetes.
Knowledge of big data processing tools like Apache Spark and Hadoop is highly preferred.
Excellent problem-solving abilities combined with effective communication skills.
Ability to work collaboratively in a fast-paced, dynamic environment.

Preferred Qualifications:
Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
Certifications in cloud platforms (e.g., GCP Professional Cloud Architect, AWS Certified Solutions Architect).
Demonstrated experience leading a team or managing complex infrastructure projects.
Contributions to open-source ML or DevOps projects.
Experience with monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.