AI DevOps / Infrastructure / Optimisation

Infomaniak Network SA

On-site €60k – €80k/yr 1w ago

About the role

About Infomaniak

Join Infomaniak, a technological leader where you'll be surrounded by top talent to create ethical and sovereign cloud and productivity solutions.

Infomaniak is the company behind SwissTransfer and a trusted partner for leading organizations: institutions like X, media such as France Télévisions, iconic events like X Jazz Festival and the Annecy Festival, as well as central banks, major cities, and security organizations across Europe.

An independent company, B Corp certified and awarded for its data centers that push the boundaries of efficiency and energy performance, Infomaniak is living proof that it's possible to build a different kind of digital world: sovereign, sustainable, and beneficial for the local economy. Here, your passion will become a meaningful job: you will grow with autonomy, take on real responsibilities, and contribute to projects that impact millions.

We are looking for an:

AI DevOps / Infrastructure Engineer

Infomaniak is developing open-source AI hosted on its own Swiss infrastructure. We deploy large-scale language models and build intelligent agents for our products (kMeet, kDrive). We are looking for an AI Engineer to design, implement, and optimize our AI agents, with a focus on quality, reliability, and user experience.

Responsibilities

  • Deployment & Orchestration:
    • Deploy, maintain, and optimize LLMs at scale, maximizing GPU resource efficiency.
    • Improve and industrialize our GitLab CI pipelines for AI models (build, test, deployment, rollback).
    • Drive deployments via Flux CD (GitOps).
  • Monitoring & Observability:
    • Strengthen our Prometheus / Grafana / Victoria Metrics stack for fine-grained visibility on performance, GPU utilization, memory usage, availability, and overall service health.
    • Optimize resource usage.
  • Cost & Performance Efficiency:
    • Work on cost and performance efficiency (autoscaling, scheduling, quota management, resource optimization).
  • Reliability & Security:
    • Ensure the robustness, security, and reproducibility of deployments in a critical environment.

Your Profile

  • Mastery of modern serving frameworks (e.g., vLLM, TGI, Triton).
  • Mastery of GitLab CI (pipelines, runners, variables, etc.) with Kubernetes.
  • Proven experience in Kubernetes (Helm, CRDs, networking, autoscaling).
  • Experience with Flux CD (GitOps, HelmReleases, Kustomize, deployments).
  • Experience with Prometheus / Grafana (dashboards, alerting, exporters).
  • Knowledge of GPU infrastructures (NVIDIA, CUDA, GPU scheduling, monitoring).
  • Aptitude for quality, reliability, and performance.
  • Ability to work in a critical environment (high SLA, high availability).
  • Good ability to collaborate with ML and DevOps teams.

Bonus Points If You Have Knowledge In:

  • LangChain, Pydantic-AI, vLLM, FastAPI.
  • GitLab, Sentry, Qdrant.
  • Technical curiosity, a taste for innovative challenges, and open-source contributions or side projects are appreciated.
  • You enjoy working in a team and demonstrate a positive attitude.
  • Your humor, flexibility, and team spirit are essential for working in a fun environment.

Technical Stack We Use

  • LangChain
  • Pydantic-AI
  • vLLM
  • FastAPI
  • GitLab
  • Sentry
  • Qdrant

Offer

  • Permanent contract
  • Full-time (80-100%)
  • Location: Geneva
  • Availability: As soon as possible

Recruitment Process

  1. A first technical interview to validate your skills.
  2. A second interview at our offices.

Why Infomaniak

  • Be part of a company shaping an ethical cloud that respects privacy, people, and the environment.
  • Work every day in a supportive environment that leaves room for your personal life and is highly stimulating, alongside cutting-edge professionals who are committed, attentive, and passionate about what they do.
  • Make a real difference in the lives of millions worldwide. At Infomaniak, we are all united by the desire to have real responsibilities and contribute to something bigger.
  • Meet diverse people in a friendly atmosphere during frequent company outings (after-work gatherings, ski trips, bike rides, theater, etc.).
  • Evolve in a pleasant workspace with an original setting, where meeting rooms are carefully decorated and foster creativity and collaboration.
  • Enjoy numerous other benefits, such as an annual mobility bonus encouraging soft mobility, a fitness room, electric bikes and scooters at your disposal, friendly relaxation areas (rest room, video and arcade games, pinball, foosball, etc.), excellent accident and loss-of-earnings insurance, and other surprises.

At Infomaniak, we are committed to diversity, equity, and inclusion in the workplace. Our job offers are open to all, and all applications are evaluated on an equal footing, regardless of gender, origin, religion, sexual orientation, or disability.

Requirements

  • Mastery of modern serving frameworks (e.g., vLLM, TGI, Triton).
  • Mastery of GitLab CI (pipelines, runners, variables, secrets).
  • Proven experience with Kubernetes (Helm, CRDs, networking, autoscaling).
  • Experience with Flux CD (GitOps, HelmReleases, Kustomize).
  • Experience with Prometheus / Grafana (dashboards, alerting, exporters).
  • Knowledge of GPU infrastructures (NVIDIA, CUDA, GPU scheduling, monitoring).
  • Appetite for quality, reliability, and performance.
  • Ability to work in a critical environment (high SLA, high availability).
  • Good ability to collaborate with ML teams.

Responsibilities

  • Deploy, maintain, and optimize LLMs while maximizing GPU resource efficiency.
  • Improve and industrialize our GitLab CI pipelines for AI models (build, test, deployment, rollback).
  • Drive deployments via Flux CD (GitOps).
  • Strengthen our Prometheus / Grafana / Victoria Metrics stack for fine-grained visibility into performance, GPU usage, latency, and availability.
  • Work on cost and performance efficiency (autoscaling, scheduling, quota and ecosystem management).
  • Ensure the robustness, security, and reproducibility of deployments in a critical environment.

Benefits

  • Annual X encouraging soft mobility
  • Fitness room
  • Electric bikes and scooters at your disposal
  • Friendly relaxation areas
  • Excellent accident and loss-of-earnings coverage

Skills

  • CUDA
  • FastAPI
  • Flux CD
  • GitLab CI
  • Grafana
  • Helm
  • Kubernetes
  • LangChain
  • NVIDIA
  • Pydantic
  • Prometheus
  • Qdrant
  • Sentry
  • Triton
  • Victoria Metrics
  • vLLM
