Software Engineer
Salesforce, Inc.
About the Role
We are looking for Software Engineers to join the ML Infrastructure focus area and help architect and operate the core systems that power AI at Slack. In this role, you will own foundational infrastructure for large-scale model training and inference, and evolve it into a reliable, secure, self-service platform used across the company.
You will work at the intersection of distributed systems, GPU infrastructure, and modern ML stacks, solving complex scalability and reliability challenges. This role blends deep systems engineering with a strong understanding of the ML lifecycle, and plays a critical part in shaping the long-term technical foundations of Slack's AI capabilities.
What You Will Be Doing
- Design, build, and operate systems to train, serve, and deploy machine learning models at scale, with a focus on reliability, performance, and operational simplicity
- Evolve GPU-backed inference infrastructure to support high-throughput, latency-sensitive workloads, including large-scale model serving
- Architect and optimize distributed training and data processing systems using platforms such as Ray, Airflow, Spark, or similar technologies
- Build and maintain Kubernetes-based platforms and orchestration layers using tools such as KubeRay, vLLM, and internally developed services
- Architect solutions that bridge legacy systems with modern technologies while keeping the monolithic application stable
- Develop robust monitoring, observability, and alerting for production ML workloads to ensure operational excellence
- Partner closely with AI Platform, ML modeling, security, and product engineering teams to design infrastructure that supports evolving AI use cases
- Provide technical leadership through design reviews and mentorship, and by setting engineering standards and long-term architectural direction for ML infrastructure
- Author technical design and architecture documentation, and contribute thought leadership through engineering blog posts
- Build and ship high-quality, production-grade software using modern engineering practices, with AI as a core part of your development workflow, pushing the boundaries of AI development tools to deliver secure, optimized code
- Design and orchestrate complex systems where AI agents integrate seamlessly into human workflows, driving efficiency and innovation at scale
- Contribute to building and maintaining the shared system context, an explicit repository of system designs, constraints, and standards that enables AI to operate accurately and reliably
- Critically evaluate code (human- or AI-generated) for correctness, quality, security, and performance
What You Should Have
- Significant professional experience in software engineering with a strong focus on infrastructure, backend systems, platform engineering, or MLOps
- Deep experience building and operating distributed systems, including expert-level knowledge of Kubernetes and container-based platforms
- Hands-on experience with modern ML infrastructure and serving stacks such as Ray or KubeRay, vLLM, or similar training and inference orchestration frameworks
- Experience working with GPU infrastructure, including performance optimization and operational management at scale
- Strong experience with data infrastructure and orchestration technologies such as Airflow, Spark, or similar systems
- Experience building and operating cloud-native systems on public cloud platforms such as AWS, GCP, or Azure, including infrastructure as code
- A demonstrated ability to drive technical direction for complex systems and balance short-term delivery with long-term architectural goals
- Excellent written communication and the ability to thrive in an asynchronous, globally distributed infrastructure team
- A related technical degree (required)
- A demonstrated, genuine AI-first approach to engineering: using AI to move faster, build fluency across the stack, and contribute well beyond your core specialty
- Experience using AI tools (e.g., Claude Code, GitHub Copilot, Codex, Cursor, etc.) in development workflows
- Advanced prompt engineering skills and the ability to write precise, structured prompts and cultivate the system context that makes AI outputs reliable, secure, and production-ready
Unleash Your Potential
When you join Salesforce, you'll be limitless in all areas of your life. Our benefits and resources support you to find balance and be your best, and our AI agents accelerate your impact so you can do your best. Together, we'll bring the power of Agentforce to organizations of all sizes and deliver amazing experiences that customers love. Apply today not only to shape the future but to redefine what's possible for yourself, for AI, and the world.