AI DevOps Engineer
Wall Street Consulting Services LLC
Warren · On-site · Full-time · Senior
About the role
Job Summary
We are seeking a highly skilled AI DevOps Engineer to support and enhance AI/ML platform operations, cloud infrastructure automation, CI/CD pipelines, and MLOps practices for MSIG. The ideal candidate will have strong expertise in DevOps, cloud platforms, containerization, infrastructure automation, and AI/ML deployment pipelines. In this role, you will collaborate closely with Data Scientists, ML Engineers, Software Developers, and Infrastructure teams to operationalize scalable AI solutions.
Key Responsibilities
AI/ML Platform & MLOps
- Design, implement, and maintain scalable AI/ML infrastructure and MLOps pipelines.
- Automate model deployment, retraining, monitoring, and versioning processes.
- Manage the end-to-end ML lifecycle, including model packaging, deployment, and production support.
- Integrate ML workflows with CI/CD pipelines for seamless deployment.
- Support model governance, monitoring, drift detection, and rollback mechanisms.
DevOps & Cloud Engineering
- Build and manage CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI/CD, or Azure DevOps.
- Automate infrastructure provisioning using Terraform, CloudFormation, or ARM templates.
- Manage containerized applications and Kubernetes clusters using Docker and Kubernetes platforms such as OpenShift, EKS, AKS, or GKE.
- Implement Infrastructure as Code (IaC) and configuration management best practices.
- Ensure high availability, scalability, and reliability of AI applications.
Cloud & Infrastructure
- Work with cloud platforms such as AWS, Azure, or GCP.
- Configure and maintain cloud-native AI services and compute resources.
- Implement monitoring, logging, and alerting using tools such as Prometheus, Grafana, ELK, Datadog, or CloudWatch.
- Optimize infrastructure performance and cloud costs.
Security & Compliance
- Implement DevSecOps best practices for AI environments.
- Ensure compliance with enterprise security standards and regulatory requirements.
- Manage IAM, secrets management, vulnerability scanning, and container security.
Collaboration & Support
- Collaborate with AI/ML teams to productionize machine learning models.
- Troubleshoot deployment and infrastructure issues across environments.
- Participate in architecture discussions and operational planning.
- Provide production support and incident resolution.
Required Skills
Technical Skills
- Strong experience with DevOps and MLOps practices.
- Expertise in:
  - Docker
  - Kubernetes/OpenShift
  - Jenkins / GitHub Actions / GitLab CI
  - Terraform or other IaC tools
  - Linux administration
  - Python or Shell scripting
- Experience with AI/ML deployment frameworks:
  - MLflow
  - Kubeflow
  - SageMaker
  - Vertex AI
  - Azure ML
- Cloud experience in AWS, Azure, or GCP.
- Experience with monitoring/logging tools:
  - Prometheus
  - Grafana
  - ELK Stack
  - Splunk
- Knowledge of networking, security, and cloud architecture.
AI/ML Knowledge
- Understanding of machine learning workflows and model lifecycle.
- Experience deploying AI/ML models into production environments.
- Familiarity with LLMOps / Generative AI deployment is a plus.
- Exposure to vector databases, GPU workloads, and AI inferencing platforms preferred.
Preferred Qualifications
- Experience in Insurance or Financial Services domain.
- Knowledge of Data Engineering pipelines and streaming platforms like Kafka.
- Experience with GPU infrastructure and AI acceleration platforms.
- Familiarity with Responsible AI and AI governance frameworks.
- Relevant certifications in AWS/Azure/GCP or Kubernetes preferred.
Skills
AWS, Azure, CloudFormation, Datadog, Docker, ELK Stack, GCP, GitLab CI, GitHub Actions, Grafana, IaC, Jenkins, Kubernetes, Linux, MLflow, MLOps, OpenShift, Prometheus, Python, Shell scripting, Splunk, Terraform, Vertex AI, AWS Lambda, Azure DevOps, Azure ML, CloudWatch, EKS, GKE, AKS, Kubeflow, SageMaker