AI Engineer – LLM Specialist
AlpineAI AG
About the role
AI Engineer – LLM Specialist
Salary: CHF 100’000 - 130’000 per year
At AlpineAI AG we are looking for an ML / AI engineer!
Tech Stack
- GitHub
- Slack
- Cursor
- Machine-Learning
Requirements
What You Bring
AI / ML Experience
- At least 3–5 years of experience in machine learning or applied AI.
- Practical experience working with LLMs in production or advanced prototypes.
Model Training & Fine-Tuning
- Experience with PyTorch or TensorFlow.
- Familiarity with fine-tuning techniques and training pipelines.
Evaluation & Experimentation
- Strong understanding of experimental design.
- Experience building evaluation harnesses.
Programming Skills
- Strong Python skills.
- Familiarity with REST APIs and backend integration.
Data Handling & MLOps
- Experience with dataset preprocessing, labeling pipelines, and versioning.
- Familiarity with Docker, CI/CD, and model deployment.
Analytical Mindset
- Ability to reason about model behavior and failure modes.
Communication
- Good verbal and written communication in English and German.
Startup Mentality
- Comfortable with ambiguity, fast iteration, and high ownership.
Responsibilities
Key Responsibilities
LLM Evaluation & Testing
- Design and maintain systematic evaluation frameworks for LLMs, including automated test suites, golden datasets, and regression benchmarks.
- Define quantitative metrics (e.g., accuracy, latency, hallucination rate, task success) and qualitative evaluation protocols.
- Perform error analysis and root‑cause investigations on model failures.
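To give a concrete flavour of this responsibility, here is a minimal Python sketch of an evaluation harness run over a golden dataset; the JSONL format, exact-match scoring, and metric names are illustrative assumptions rather than AlpineAI's actual tooling.

```python
# Minimal evaluation-harness sketch (illustrative only; the actual stack,
# model client, and golden-dataset format may differ).
import json
import time
from typing import Callable

def evaluate(generate: Callable[[str], str], golden_path: str) -> dict:
    """Run a golden dataset through a model and report simple metrics."""
    with open(golden_path) as f:
        cases = [json.loads(line) for line in f]  # {"prompt": ..., "expected": ...}

    correct, latencies = 0, []
    for case in cases:
        start = time.perf_counter()
        output = generate(case["prompt"])
        latencies.append(time.perf_counter() - start)
        # Exact-match scoring is the simplest baseline; task-specific scorers
        # (regex checks, LLM judges, semantic similarity) usually replace it.
        correct += int(output.strip() == case["expected"].strip())

    return {
        "task_success": correct / len(cases),
        "avg_latency_s": sum(latencies) / len(latencies),
        "n_cases": len(cases),
    }
```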
Task Alignment & Optimization
- Focus on rapid prototyping and operationalization of customer use cases.
- Improve model performance on specific tasks using a prompt‑first workflow (system prompts, few‑shot examples, tool instructions).
- Build and iterate evaluation sets; run experiments to measure quality, latency, and cost.
- Curate high‑signal datasets for automated prompt optimization (cleaning, labeling, filtering, augmentation).
- Apply lightweight adaptation when beneficial (prompt tuning, parameter‑efficient methods like LoRA/adapters).
- Use supervised fine‑tuning / instruction tuning when prompting and lightweight methods don’t reach the target.
- Prepare and curate training datasets (cleaning, labeling, augmentation, filtering).
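As an illustration of the parameter-efficient adaptation mentioned above, here is a minimal LoRA setup using Hugging Face transformers and peft; the base model name, rank, and target modules are placeholders, not AlpineAI's actual configuration.

```python
# Illustrative LoRA setup with Hugging Face transformers + peft; model name
# and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of weights train

# From here, a standard supervised fine-tuning loop (e.g. transformers
# Trainer or trl's SFTTrainer) runs on the curated task dataset.
```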
Model Selection & Experimentation
- Evaluate and compare open‑source and commercial LLMs for specific use cases.
- Design controlled experiments (A/B tests, offline evaluations).
- Document results and recommend model choices.
Integration into Product
- Collaborate with full‑stack engineers to integrate prototypes into product, backend services and user‑facing applications.
- Support API design for model inference and post‑processing.
- Ensure models behave reliably in real‑time and batch workflows.
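For illustration, a minimal inference endpoint might look like the sketch below; FastAPI is an assumption (the posting does not name a backend framework), and run_model is a hypothetical placeholder for the selected LLM.

```python
# Illustrative inference endpoint; the framework choice (FastAPI) and the
# run_model() helper are assumptions, not part of the posting.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

class InferenceResponse(BaseModel):
    text: str

def run_model(prompt: str, max_tokens: int) -> str:
    # Placeholder: call the selected LLM (hosted API or self-hosted model).
    raise NotImplementedError

@app.post("/v1/generate", response_model=InferenceResponse)
def generate(req: InferenceRequest) -> InferenceResponse:
    # Post-processing (format enforcement, filtering) would hook in here.
    return InferenceResponse(text=run_model(req.prompt, req.max_tokens))
```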
Quality, Safety & Guardrails
- Implement mechanisms to reduce hallucinations, enforce output formats, apply content filters, and detect and handle unsafe or low-confidence outputs.
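A minimal sketch of output-format enforcement and low-confidence handling is shown below; the schema, the self-reported confidence field, and the threshold are illustrative assumptions.

```python
# Sketch of output-format enforcement and low-confidence handling;
# the schema and the 0.5 threshold are illustrative assumptions.
import json
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    answer: str
    confidence: float  # model is instructed to self-report a value in [0, 1]

def parse_or_reject(raw: str, threshold: float = 0.5) -> Answer | None:
    """Validate raw LLM output against the schema; reject on failure."""
    try:
        parsed = Answer.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        return None  # malformed output -> trigger a retry or fallback
    if parsed.confidence < threshold:
        return None  # low-confidence output -> route to human review
    return parsed
```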
Performance & Cost Optimization
- Optimize inference latency and throughput.
- Balance model size, quantization, batching, and caching strategies.
- Monitor and optimize inference costs.
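As a simple illustration of one caching strategy, the sketch below caches responses per prompt; it assumes deterministic (temperature 0) generation, and the generate callable is a hypothetical placeholder.

```python
# Simple prompt-level response cache; only valid for deterministic
# (temperature=0) generations, which is an assumption here.
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def _key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

def cached_generate(generate: Callable[[str, str], str], model: str, prompt: str) -> str:
    key = _key(model, prompt)
    if key not in _cache:
        _cache[key] = generate(model, prompt)  # pay latency and cost only once
    return _cache[key]
```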
MLOps & Lifecycle Management
- Version models, datasets, prompts, and evaluation results.
- Support deployment pipelines for new model versions.
- Monitor model performance in production and detect drift.
Collaboration & Knowledge Sharing
- Work closely with product managers to translate requirements into model behaviors.
- Support internal teams with guidance on prompt design and model usage.
- Contribute to documentation and internal best practices.
Dataset Strategy & Governance
- Define standards for dataset quality, labeling guidelines, and storage.
- Maintain traceability between datasets, experiments, and deployed models.
Synthetic Data Generation
- Use LLMs or other techniques to generate synthetic training data where real data is scarce.
Agentic LLMs & Human‑in‑the‑Loop Workflows
- Design and test LLM workflows that call tools, functions, or external APIs.
- Design feedback loops where human reviewers validate or correct model outputs.
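To illustrate the combination of tool-calling and human review, here is a minimal Python sketch; call_llm, the tool registry, and the review queue are hypothetical placeholders rather than AlpineAI's actual workflow.

```python
# Minimal tool-calling + human-in-the-loop sketch; call_llm(), the tool
# registry, and the review queue are illustrative placeholders.
import json
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search_docs": lambda query: f"(stub) top passages for: {query}",
}

REVIEW_QUEUE: list[dict] = []  # drafts awaiting human validation

def call_llm(prompt: str) -> str:
    # Placeholder: the model is prompted to answer directly, or to emit
    # {"tool": "<name>", "input": "<string>"} as JSON when it needs a tool.
    raise NotImplementedError

def answer(user_request: str) -> str:
    reply = call_llm(user_request)
    try:
        call = json.loads(reply)
        tool_result = TOOLS[call["tool"]](call["input"])
        reply = call_llm(f"{user_request}\n\nTool result: {tool_result}")
    except (json.JSONDecodeError, KeyError, TypeError):
        pass  # not a tool call; treat the reply as a draft answer
    # Human-in-the-loop: queue the draft so a reviewer can approve or correct it.
    REVIEW_QUEUE.append({"request": user_request, "draft": reply})
    return reply
```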
Research Scouting
- Track relevant papers, frameworks, and open‑source projects.
- Prototype promising techniques quickly.
Internal Enablement
- Create internal guidelines for prompt writing and evaluation.
- Run occasional knowledge‑sharing sessions.
Benefits & Perks
- 5 weeks (25 days) vacation
- Team social events
- Equity or company stock
- Home office / Remote 2 days per week
- Hybrid work
About AlpineAI
- Opportunity to participate in AlpineAI’s company shares program after an initial period.
- Dynamic, innovation‑driven culture.
- High autonomy and real product impact.
- Close collaboration with experts in speech, NLP, and applied AI.
- Exposure to cutting‑edge AI technologies.
- On‑site role in Zurich or Davos
Learn more about AlpineAI at: https://alpineai.swiss
Application
Ready to help customers succeed with AI?
Apply now with your CV and a short cover letter. We look forward to hearing from you.
Location
Obere Strasse 22b, Chur, Switzerland
Salary
CHF 100’000 - 130’000 per year
Category
ML, AI Developer / Engineer
Don’t Apply If
- You are not willing to work on‑site in Zurich or Davos.
- You do not have a valid work permit for Switzerland.
- You have never worked in a startup environment.