
AI Engineer – LLM Specialist

AlpineAI AG

Davos · Hybrid · Mid Level · CHF 100k – CHF 130k/yr · Posted 2 days ago

About the role

AI Engineer – LLM Specialist

Salary: CHF 100’000 - 130’000 per year

At AlpineAI AG, we are looking for an ML/AI engineer!


Tech Stack

  • GitHub
  • Slack
  • Cursor
  • Machine-Learning

Requirements

What You Bring

  • AI / ML Experience

    • At least 3–5 years of experience in machine learning or applied AI.
    • Practical experience working with LLMs in production or advanced prototypes.
  • Model Training & Fine-Tuning

    • Experience with PyTorch or TensorFlow.
    • Familiarity with fine-tuning techniques and training pipelines.
  • Evaluation & Experimentation

    • Strong understanding of experimental design.
    • Experience building evaluation harnesses.
  • Programming Skills

    • Strong Python skills.
    • Familiarity with REST APIs and backend integration.
  • Data Handling & MLOps

    • Experience with dataset preprocessing, labeling pipelines, and versioning.
    • Familiarity with Docker, CI/CD, and model deployment.
  • Analytical Mindset

    • Ability to reason about model behavior and failure modes.
  • Communication

    • Good verbal and written communication in English and German.
  • Startup Mentality

    • Comfortable with ambiguity, fast iteration, and high ownership.
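
The versioning requirement above (datasets, labeling pipelines) is often handled with content hashing; a minimal sketch in Python, assuming a truncated SHA-256 digest is an acceptable version identifier (the `fingerprint` helper is illustrative, not a named tool from this posting):

```python
import hashlib
import json

def fingerprint(obj) -> str:
    """Deterministic content hash usable as a version ID for
    prompts, datasets, or evaluation configs."""
    # Canonical JSON (sorted keys) so logically equal objects hash identically.
    blob = json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

prompt_v1 = {"system": "You are a helpful assistant.", "few_shot": []}
print(fingerprint(prompt_v1))  # same input always yields the same 12-char ID
```

Because the hash depends only on content, any edit to a prompt or dataset produces a new ID, which makes experiment results traceable to the exact inputs that produced them.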

Responsibilities

Key Responsibilities

  • LLM Evaluation & Testing

    • Design and maintain systematic evaluation frameworks for LLMs, including:
      • Automated test suites
      • Golden datasets
      • Regression benchmarks
    • Define quantitative metrics (e.g., accuracy, latency, hallucination rate, task success) and qualitative evaluation protocols.
    • Perform error analysis and root‑cause investigations on model failures.
  • Task Alignment & Optimization

    • Focus on rapid prototyping and operationalization of customer use cases.
    • Improve model performance on specific tasks using a prompt‑first workflow (system prompts, few‑shot examples, tool instructions).
    • Build and iterate evaluation sets; run experiments to measure quality, latency, and cost.
    • Curate high‑signal datasets for automated prompt optimization (cleaning, labeling, filtering, augmentation).
    • Apply lightweight adaptation when beneficial (prompt tuning, parameter‑efficient methods like LoRA/adapters).
    • Use supervised fine‑tuning / instruction tuning when prompting and lightweight methods don’t reach the target.
    • Prepare and curate training datasets (cleaning, labeling, augmentation, filtering).
  • Model Selection & Experimentation

    • Evaluate and compare open‑source and commercial LLMs for specific use cases.
    • Design controlled experiments (A/B tests, offline evaluations).
    • Document results and recommend model choices.
  • Integration into Product

    • Collaborate with full‑stack engineers to integrate prototypes into product, backend services and user‑facing applications.
    • Support API design for model inference and post‑processing.
    • Ensure models behave reliably in real‑time and batch workflows.
  • Quality, Safety & Guardrails

    • Implement mechanisms to:
      • Reduce hallucinations
      • Enforce output formats
      • Apply content filters
      • Detect and handle unsafe or low‑confidence outputs
  • Performance & Cost Optimization

    • Optimize inference latency and throughput.
    • Balance model size, quantization, batching, and caching strategies.
    • Monitor and optimize inference costs.
  • MLOps & Lifecycle Management

    • Version models, datasets, prompts, and evaluation results.
    • Support deployment pipelines for new model versions.
    • Monitor model performance in production and detect drift.
  • Collaboration & Knowledge Sharing

    • Work closely with product managers to translate requirements into model behaviors.
    • Support internal teams with guidance on prompt design and model usage.
    • Contribute to documentation and internal best practices.
  • Dataset Strategy & Governance

    • Define standards for dataset quality, labeling guidelines, and storage.
    • Maintain traceability between datasets, experiments, and deployed models.
  • Synthetic Data Generation

    • Use LLMs or other techniques to generate synthetic training data where real data is scarce.
  • Agentic LLMs & Human‑in‑the‑Loop Workflows

    • Design and test LLM workflows that call tools, functions, or external APIs.
    • Design feedback loops where human reviewers validate or correct model outputs.
  • Research Scouting

    • Track relevant papers, frameworks, and open‑source projects.
    • Prototype promising techniques quickly.
  • Internal Enablement

    • Create internal guidelines for prompt writing and evaluation.
    • Run occasional knowledge‑sharing sessions.
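
An evaluation harness of the kind described above (golden dataset, quantitative metric, regression benchmark) can be sketched in a few lines. This is an illustrative skeleton under assumed names, not AlpineAI's actual framework; `stub_model` stands in for a real LLM call:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenExample:
    prompt: str
    expected: str

def evaluate(model: Callable[[str], str], golden: list[GoldenExample]) -> dict:
    """Run the model over a golden dataset and compute task accuracy."""
    correct = sum(model(ex.prompt) == ex.expected for ex in golden)
    return {"accuracy": correct / len(golden), "n": len(golden)}

def regression_gate(current: dict, baseline: dict, tolerance: float = 0.02) -> bool:
    """Fail the gate if accuracy drops more than `tolerance` below baseline."""
    return current["accuracy"] >= baseline["accuracy"] - tolerance

# Stub standing in for an LLM inference call.
def stub_model(prompt: str) -> str:
    return "4" if prompt == "2+2?" else "unknown"

golden = [GoldenExample("2+2?", "4"), GoldenExample("capital of CH?", "Bern")]
print(evaluate(stub_model, golden))  # accuracy 0.5 over 2 examples
```

In practice the gate would run in CI against the versioned golden set, so a new model or prompt version cannot ship if it regresses past the tolerance.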
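
The guardrail responsibilities (enforce output formats, detect low-confidence outputs) often come down to validating structured model output before it reaches the user. A hedged sketch, assuming the model is prompted to return a JSON object with `answer` and `confidence` fields (field names are illustrative):

```python
import json

REQUIRED_KEYS = {"answer", "confidence"}

def guard_output(raw: str, min_confidence: float = 0.7) -> dict:
    """Validate raw model output: must be a JSON object with the
    required keys and a confidence above the threshold."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"ok": False, "reason": "malformed_json"}
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return {"ok": False, "reason": "wrong_schema"}
    if data["confidence"] < min_confidence:
        return {"ok": False, "reason": "low_confidence"}
    return {"ok": True, "data": data}

print(guard_output('{"answer": "Bern", "confidence": 0.93}'))
```

Rejected outputs can then be retried, routed to a fallback response, or escalated to a human reviewer, matching the human-in-the-loop workflows listed above.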

Benefits & Perks

  • 5 weeks (25 days) vacation
  • Team social events
  • Equity or company stock
  • Home office / Remote 2 days per week
  • Hybrid work

About AlpineAI

  • Opportunity to participate in AlpineAI’s company shares program after an initial period.
  • Dynamic, innovation‑driven culture.
  • High autonomy and real product impact.
  • Close collaboration with experts in speech, NLP, and applied AI.
  • Exposure to cutting‑edge AI technologies.
  • On‑site role in Zurich or Davos.

Learn more about AlpineAI at: https://alpineai.swiss


Application

Ready to help customers succeed with AI?

Apply now with your CV and a short cover letter. We look forward to hearing from you.


Location

Obere Strasse 22b, Chur, Switzerland


Salary

CHF 100’000 - 130’000 per year


Category

ML, AI Developer / Engineer


Don’t Apply If

  • You are not willing to work on‑site in Zurich or Davos.
  • You do not have a work permit for Switzerland.
  • You have never worked in a startup environment.


Skills

  • AI
  • CI/CD
  • Docker
  • GitHub
  • LLM
  • LoRA
  • Machine-Learning
  • MLOps
  • NLP
  • Parameter-efficient methods
  • Python
  • PyTorch
  • REST APIs
  • Speech
  • TensorFlow
