Senior DevOps Engineer
Cellebrite
About the Role
We are building a rapidly scaling GenAI-powered SaaS platform that enables investigators to interact with complex case data through a conversational AI interface. Our system leverages RAG architecture and agentic GenAI workflows to deliver advanced AI capabilities in production.
We are looking for a Senior DevOps / Cloud Engineer to own our application services, cloud infrastructure, deployment pipelines, and production reliability in this dynamic AI environment.
This is a hands-on role focused on serverless architecture, LLM-based systems, and agentic workflows, working closely with Engineering and Customer Success to ensure the platform is reliable, scalable, and cost-efficient.
Key Responsibilities
- Own and manage application services running on GCP infrastructure, including serverless and managed services
- Design and maintain robust CI/CD pipelines for rapid, safe deployments
- Operate and optimize GenAI/LLM workloads in production, including RAG pipelines and agentic workflows
- Monitor and improve latency, cost, and reliability of AI-driven systems
- Troubleshoot complex production issues across application, data, and infrastructure layers
- Work with and optimize BigQuery-based data workflows, queries, and performance
- Support and debug multi-step AI pipelines and agent orchestration flows
- Implement and maintain observability (logging, metrics, tracing, alerting), including for AI pipelines
- Collaborate with engineering teams on architecture improvements for evolving GenAI systems
- Partner with Customer Success to investigate and resolve customer-impacting issues (minimal direct customer interaction)
- Enforce security and best practices in a sensitive data environment
What We're Looking For
- A senior engineer who can own production systems end-to-end
- Strong problem-solver with the ability to debug complex, non-deterministic AI systems
- Comfortable working in a rapidly evolving GenAI and agentic architecture
- Pragmatic mindset balancing performance, cost, and reliability
- High ownership and ability to work independently
Why Join Us
- Build and scale a real-world GenAI product with meaningful impact
- Work on cutting-edge challenges involving LLMs, RAG, and agentic systems
- Be part of a small, fast-moving, high-impact innovation team
Office Location:
Remote
Qualifications:
- 5+ years of experience in DevOps / SRE / Cloud Engineering
- Strong hands-on experience with Google Cloud Platform (GCP)
- Proven experience with serverless architectures (Cloud Run, Cloud Functions, or similar)
- Experience working with BigQuery (querying, performance tuning, troubleshooting)
- Experience running and supporting production SaaS applications
- Hands-on experience with GenAI / LLM-based applications in production (including RAG systems, model APIs, or similar)
- Experience supporting or operating multi-step AI pipelines or agentic workflows
- Strong experience with CI/CD pipelines (GitHub Actions, etc.)
- Solid scripting/programming skills (Python, TypeScript, Bash, or similar)
- Experience with observability and monitoring tools
Preferred Qualifications
- Experience optimizing LLM performance, cost, and reliability at scale
- Familiarity with vector databases, embeddings, and retrieval systems
- Experience with infrastructure as code (Terraform or similar)
- Background in secure or regulated environments
- Experience in fast-scaling or experimental product environments